Planet Musings

June 20, 2013

Terence Tao: The prime tuples conjecture, sieve theory, and the work of Goldston-Pintz-Yildirim, Motohashi-Pintz, and Zhang

Suppose one is given a {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})} of {k_0} distinct integers for some {k_0 \geq 1}, arranged in increasing order. When is it possible to find infinitely many translates {n + {\mathcal H} =(n+h_1,\ldots,n+h_{k_0})} of {{\mathcal H}} that consist entirely of primes? The case {k_0=1} is just Euclid’s theorem on the infinitude of primes, but the case {k_0=2} is already open in general, with the {{\mathcal H} = (0,2)} case being the notorious twin prime conjecture.

On the other hand, there are some tuples {{\mathcal H}} for which one can easily answer the above question in the negative. For instance, the only translate of {(0,1)} that consists entirely of primes is {(2,3)}, basically because each translate of {(0,1)} must contain an even number, and the only even prime is {2}. More generally, if there is a prime {p} such that {{\mathcal H}} meets each of the {p} residue classes {0 \hbox{ mod } p, 1 \hbox{ mod } p, \ldots, p-1 \hbox{ mod } p}, then every translate of {{\mathcal H}} contains at least one multiple of {p}; since {p} is the only multiple of {p} that is prime, this shows that there are only finitely many translates of {{\mathcal H}} that consist entirely of primes.

To avoid this obstruction, let us call a {k_0}-tuple {{\mathcal H}} admissible if it avoids at least one residue class {\hbox{ mod } p} for each prime {p}. It is easy to check for admissibility in practice, since a {k_0}-tuple is automatically admissible in every prime {p} larger than {k_0}, so one only needs to check a finite number of primes in order to decide on the admissibility of a given tuple. For instance, {(0,2)} or {(0,2,6)} are admissible, but {(0,2,4)} is not (because it covers all the residue classes modulo {3}). We then have the famous Hardy-Littlewood prime tuples conjecture:

Conjecture 1 (Prime tuples conjecture, qualitative form) If {{\mathcal H}} is an admissible {k_0}-tuple, then there exist infinitely many translates of {{\mathcal H}} that consist entirely of primes.

This conjecture is extremely difficult (containing the twin prime conjecture, for instance, as a special case), and in fact there is no explicitly known example of an admissible {k_0}-tuple with {k_0 \geq 2} for which we can verify this conjecture (although, thanks to the recent work of Zhang, we know that {(0,d)} satisfies the conclusion of the prime tuples conjecture for some {0 < d < 70,000,000}, even if we can’t yet say what the precise value of {d} is).
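The admissibility test described above (only primes up to {k_0} need to be examined) can be sketched in a few lines of Python. The function names `primes_up_to` and `is_admissible` are illustrative choices, not taken from any library.

```python
def primes_up_to(n):
    """Return the list of primes <= n via a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def is_admissible(tup):
    """Check that tup misses at least one residue class mod p for every prime p.

    A k-tuple occupies at most k residue classes, so it cannot cover all
    p classes when p > k; only primes p <= k need to be tested.
    """
    k = len(tup)
    for p in primes_up_to(k):
        if len({h % p for h in tup}) == p:
            return False
    return True


print(is_admissible((0, 2)))     # True
print(is_admissible((0, 2, 6)))  # True
print(is_admissible((0, 2, 4)))  # False: covers 0, 1, 2 mod 3
```

The three calls reproduce the examples given in the text.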

Actually, Hardy and Littlewood conjectured a more precise version of Conjecture 1. Given an admissible {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})}, and for each prime {p}, let {\nu_p = \nu_p({\mathcal H}) := |{\mathcal H} \hbox{ mod } p|} denote the number of residue classes modulo {p} that {{\mathcal H}} meets; thus we have {1 \leq \nu_p \leq p-1} for all {p} by admissibility, and also {\nu_p = k_0} for all {p>h_{k_0}-h_1}. We then define the singular series {{\mathfrak G} = {\mathfrak G}({\mathcal H})} associated to {{\mathcal H}} by the formula

\displaystyle  {\mathfrak G} := \prod_{p \in {\mathcal P}} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}}

where {{\mathcal P} = \{2,3,5,\ldots\}} is the set of primes; by the previous discussion we see that the infinite product in {{\mathfrak G}} converges to a finite non-zero number.
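As a rough numerical illustration, the product defining {{\mathfrak G}} can be truncated at a finite bound and evaluated directly. This is only a sketch: the name `truncated_singular_series` and the cutoff parameter `bound` are assumptions made for the example, and the truncated value only approximates the infinite product.

```python
def primes_up_to(n):
    """Return the list of primes <= n via a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def truncated_singular_series(tup, bound):
    """Product over primes p <= bound of (1 - nu_p/p) / (1 - 1/p)^k0."""
    k0 = len(tup)
    value = 1.0
    for p in primes_up_to(bound):
        nu_p = len({h % p for h in tup})  # residue classes mod p met by tup
        value *= (1 - nu_p / p) / (1 - 1 / p) ** k0
    return value


print(truncated_singular_series((0, 2), 10**5))
print(truncated_singular_series((0, 2, 6), 10**5))
```

For a tuple that is not admissible, some factor {1 - \nu_p/p} vanishes and the function returns zero, which matches the obstruction discussed earlier.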

We will also need some asymptotic notation (in the spirit of “cheap nonstandard analysis”). We will need a parameter {x} that one should think of as going to infinity. Some mathematical objects (such as {{\mathcal H}} and {k_0}) will be independent of {x} and referred to as fixed; but unless otherwise specified we allow all mathematical objects under consideration to depend on {x}. If {X} and {Y} are two such quantities, we say that {X = O(Y)} if one has {|X| \leq CY} for some fixed {C}, and {X = o(Y)} if one has {|X| \leq c(x) Y} for some function {c(x)} of {x} (and of any fixed parameters present) that goes to zero as {x \rightarrow \infty} (for each choice of fixed parameters).

Conjecture 2 (Prime tuples conjecture, quantitative form) Let {k_0 \geq 1} be a fixed natural number, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then the number of natural numbers {n < x} such that {n+{\mathcal H}} consists entirely of primes is {({\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}}.

Thus, for instance, if Conjecture 2 holds, then the number of twin primes less than {x} should equal {(2 \Pi_2 + o(1)) \frac{x}{\log^2 x}}, where {\Pi_2} is the twin prime constant

\displaystyle  \Pi_2 := \prod_{p \in {\mathcal P}: p>2} (1 - \frac{1}{(p-1)^2}) = 0.6601618\ldots.
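To get a feel for the twin prime prediction, one can count the {n < x} with {n} and {n+2} both prime and compare with {2 \Pi_2 x / \log^2 x}. The sketch below uses hypothetical helper names (`prime_flags`, `count_twin_starts`) and the value of {\Pi_2} quoted above.

```python
from math import log


def prime_flags(n):
    """flags[i] is True exactly when i is prime, for 0 <= i <= n."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, n + 1, i):
                flags[j] = False
    return flags


def count_twin_starts(x):
    """Number of natural numbers n < x with n and n + 2 both prime."""
    flags = prime_flags(x + 2)
    return sum(1 for n in range(2, x) if flags[n] and flags[n + 2])


PI_2 = 0.6601618  # twin prime constant, as quoted in the text

for x in (10**3, 10**4, 10**5):
    actual = count_twin_starts(x)
    predicted = 2 * PI_2 * x / log(x) ** 2
    print(x, actual, round(predicted), round(actual / predicted, 3))
```

At these small values of {x} the agreement is only rough, which is consistent with the conjecture saying nothing about the rate of decay of the {o(1)} term.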

As this conjecture is stronger than Conjecture 1, it is of course open. However there are a number of partial results on this conjecture. For instance, this conjecture is known to be true if one introduces some additional averaging in {{\mathcal H}}; see for instance this previous post. From the methods of sieve theory, one can obtain an upper bound of {(C_{k_0} {\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}} for the number of {n < x} with {n + {\mathcal H}} all prime, where {C_{k_0}} depends only on {k_0}. Sieve theory can also give analogues of Conjecture 2 if the primes are replaced by a suitable notion of almost prime (or more precisely, by a weight function concentrated on almost primes).

Another type of partial result towards Conjectures 1 and 2 comes from the results of Goldston-Pintz-Yildirim, Motohashi-Pintz, and Zhang. Following the notation of this recent paper of Pintz, for each {k_0 \geq 2}, let {DHL[k_0,2]} denote the following assertion (DHL stands for “Dickson-Hardy-Littlewood”):

Conjecture 3 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there are infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} which contain at least two primes.

This conjecture gets harder as {k_0} gets smaller. Note for instance that {DHL[2,2]} would imply all the {k_0=2} cases of Conjecture 1, including the twin prime conjecture. More generally, if one knew {DHL[k_0,2]} for some {k_0}, then one would immediately conclude that there are an infinite number of pairs of consecutive primes of separation at most {H(k_0)}, where {H(k_0)} is the minimal diameter {h_{k_0}-h_1} amongst all admissible {k_0}-tuples {{\mathcal H}}. Values of {H(k_0)} for small {k_0} can be found at this link (with {H(k_0)} denoted {w} in that page). For large {k_0}, the best upper bounds on {H(k_0)} have been found by using admissible {k_0}-tuples {{\mathcal H}} of the form

\displaystyle  {\mathcal H} = ( - p_{m+\lfloor k_0/2\rfloor - 1}, \ldots, - p_{m+1}, -1, +1, p_{m+1}, \ldots, p_{m+\lfloor (k_0+1)/2\rfloor - 1} )

where {p_n} denotes the {n^{th}} prime and {m} is a parameter to be optimised over (in practice it is an order of magnitude or two smaller than {k_0}); see this blog post for details. The upshot is that one can bound {H(k_0)} for large {k_0} by a quantity slightly smaller than {k_0 \log k_0} (and the large sieve inequality shows that this is sharp up to a factor of two, see e.g. this previous post for more discussion).
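The construction above can be tried out for small {k_0}. The sketch below builds the symmetric tuple for a given {m} and, purely as an illustrative assumption, picks the smallest {m} that makes it admissible; the text only says that {m} is optimised over, so this selection rule and the function names are not taken from the source.

```python
def first_primes(count):
    """Return [p_1, ..., p_count], the first `count` primes (p_1 = 2)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def is_admissible(tup):
    """True if tup misses a residue class mod p for every prime p <= len(tup)."""
    k = len(tup)
    small_primes = [p for p in first_primes(k) if p <= k]
    return all(len({h % p for h in tup}) < p for p in small_primes)


def symmetric_tuple(k0, m):
    """(-p_{m+k0//2-1}, ..., -p_{m+1}, -1, +1, p_{m+1}, ..., p_{m+(k0+1)//2-1})."""
    n_neg = k0 // 2 - 1          # number of negative primes (needs k0 >= 2)
    n_pos = (k0 + 1) // 2 - 1    # number of positive primes
    ps = first_primes(m + max(n_neg, n_pos))  # ps[i - 1] is p_i
    neg = [-ps[m + i] for i in range(n_neg)]
    pos = [ps[m + i] for i in range(n_pos)]
    return tuple(sorted(neg + [-1, 1] + pos))


def smallest_admissible(k0):
    """Smallest m for which the symmetric tuple is admissible (illustrative rule)."""
    m = 0
    while not is_admissible(symmetric_tuple(k0, m)):
        m += 1
    return m, symmetric_tuple(k0, m)


m, tup = smallest_admissible(10)
print(m, tup, tup[-1] - tup[0])
```

The diameter printed at the end is an upper bound for {H(10)} obtained from this particular family; it need not be the optimal value.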

In a key breakthrough, Goldston, Pintz, and Yildirim were able to establish the following conditional result a few years ago:

Theorem 4 (Goldston-Pintz-Yildirim) Suppose that the Elliott-Halberstam conjecture {EH[\theta]} is true for some {1/2 < \theta < 1}. Then {DHL[k_0,2]} is true for some finite {k_0}. In particular, this establishes an infinite number of pairs of consecutive primes of separation {O(1)}.

The dependence of constants between {k_0} and {\theta} given by the Goldston-Pintz-Yildirim argument is basically of the form {k_0 \sim (\theta-1/2)^{-2}}. (UPDATE: as recently observed by Farkas, Pintz, and Revesz, this relationship can be improved to {k_0 \sim (\theta-1/2)^{-3/2}}.)

Unfortunately, the Elliott-Halberstam conjecture (which we will state properly below) is only known for {\theta<1/2}, an important result known as the Bombieri-Vinogradov theorem. If one uses the Bombieri-Vinogradov theorem instead of the Elliott-Halberstam conjecture, Goldston, Pintz, and Yildirim were still able to show the highly non-trivial result that there were infinitely many pairs {p_{n+1},p_n} of consecutive primes with {(p_{n+1}-p_n) / \log p_n \rightarrow 0} (actually they showed more than this; see e.g. this survey of Soundararajan for details).

Actually, the full strength of the Elliott-Halberstam conjecture is not needed for these results. There is a technical specialisation of the Elliott-Halberstam conjecture which does not presently have a commonly accepted name; I will call it the Motohashi-Pintz-Zhang conjecture {MPZ[\varpi]} in this post, where {0 < \varpi < 1/4} is a parameter. We will define this conjecture more precisely later, but let us remark for now that {MPZ[\varpi]} is a consequence of {EH[\frac{1}{2}+2\varpi]}.

We then have the following two theorems. Firstly, we have the following strengthening of Theorem 4:

Theorem 5 (Motohashi-Pintz-Zhang) Suppose that {MPZ[\varpi]} is true for some {0 < \varpi < 1/4}. Then {DHL[k_0,2]} is true for some {k_0}.

A version of this result (with a slightly different formulation of {MPZ[\varpi]}) appears in this paper of Motohashi and Pintz, and in the paper of Zhang, Theorem 5 is proven for the concrete values {\varpi = 1/1168} and {k_0 = 3,500,000}. We will supply a self-contained proof of Theorem 5 below the fold, improving upon the constants in Zhang’s paper (in particular, for {\varpi = 1/1168}, we can take {k_0} as low as {341,640}, with further improvements on the way). As with Theorem 4, we have an inverse quadratic relationship {k_0 \sim \varpi^{-2}}.

In his paper, Zhang obtained for the first time an unconditional advance on {MPZ[\varpi]}:

Theorem 6 (Zhang) {MPZ[\varpi]} is true for all {0 < \varpi \leq 1/1168}.

This is a deep result, building upon the work of Fouvry-Iwaniec, Friedlander-Iwaniec and Bombieri-Friedlander-Iwaniec which established results of a similar nature to {MPZ[\varpi]} but simpler in some key respects. We will not discuss this result further here, except to say that it relies on the (higher-dimensional case of the) Weil conjectures, which were famously proven by Deligne using methods from l-adic cohomology. Also, it was believed among at least some experts that the methods of Bombieri, Fouvry, Friedlander, and Iwaniec were not quite strong enough to obtain results of the form {MPZ[\varpi]}, making Theorem 6 a particularly impressive achievement.

Combining Theorem 6 with Theorem 5 we obtain {DHL[k_0,2]} for some finite {k_0}; Zhang obtains this for {k_0 = 3,500,000} but, as detailed below, this can be lowered to {k_0 = 341,640}. This in turn gives infinitely many pairs of consecutive primes of separation at most {H(k_0)}. Zhang gives a simple argument that bounds {H(3,500,000)} by {70,000,000}, giving his famous result that there are infinitely many pairs of primes of separation at most {70,000,000}; by being a bit more careful (as discussed in this post) one can lower the upper bound on {H(3,500,000)} to {57,554,086}, and with the newer value {k_0 = 341,640} one can instead use the bound {H(341,640) \leq 4,982,086}. (Many thanks to Scott Morrison for these numerics.) UPDATE: These values are now obsolete; see this web page for the latest bounds.

In this post we would like to give a self-contained proof of both Theorem 4 and Theorem 5, which are both sieve-theoretic results that are mainly elementary in nature. (But, as stated earlier, we will not discuss the deepest new result in Zhang’s paper, namely Theorem 6.) Our presentation will deviate a little bit from the traditional sieve-theoretic approach in a few places. Firstly, there is a portion of the argument that is traditionally handled using contour integration and properties of the Riemann zeta function; we will present a “cheaper” approach (which Ben Green and I used in our papers, e.g. in this one) using Fourier analysis, with the only property used about the zeta function {\zeta(s)} being the elementary fact that it blows up like {\frac{1}{s-1}} as one approaches {1} from the right. To deal with the contribution of small primes (which is the source of the singular series {{\mathfrak G}}), it will be convenient to use the “{W}-trick” (introduced in this paper of mine with Ben), passing to a single residue class mod {W} (where {W} is the product of all the small primes) to end up in a situation in which all small primes have been “turned off”, which leads to better pseudorandomness properties (for instance, once one eliminates all multiples of small primes, almost all pairs of remaining numbers will be coprime).

— 1. The {W}-trick —

In this section we introduce the “{W}-trick”, which is a simple but useful device that automatically takes care of local factors arising from small primes, such as the singular series {{\mathfrak G}}. The price one pays for this trick is that the explicit decay rates in various {o(1)} terms can be rather poor, but for the applications here, we will not need to know any information on these decay rates and so the {W}-trick may be freely applied.

Let {w} be a natural number, which should be thought of as either fixed and large, or as a very slowly growing function of {x}. Actually, the two viewpoints are basically equivalent for the purposes of asymptotic analysis (at least at the qualitative level of {o(1)} decay rates), thanks to the following basic principle:

Lemma 7 (Overspill principle) Let {F(w,x)} be a quantity depending on {w} and {x}. Then the following are equivalent:

  • (i) For every fixed {\epsilon>0} there exists a fixed {w_\epsilon > 0} such that

    \displaystyle  |F(w,x)| \leq \epsilon + o(1)

    for all fixed {w \geq w_\epsilon}.

  • (ii) We have

    \displaystyle  F(w,x) = o(1)

    whenever {w = w(x)} is a function of {x} going to infinity that is sufficiently slowly growing. (In other words, there exists a function {w_0: {\bf R}^+ \rightarrow {\bf N}} going to infinity with the property that {F(w,x)=o(1)} whenever {w = w(x)} is a natural number-valued function of {x} such that {w(x) \rightarrow \infty} as {x \rightarrow \infty} and {w(x) \leq w_0(x)} for all sufficiently large {x}.)

This principle is closely related to the overspill principle from nonstandard analysis, though we will not explicitly adopt a nonstandard perspective here. It is also similar in spirit to the diagonalisation trick used to prove the Arzela-Ascoli theorem.

Proof: We first show that (i) implies (ii). By (i), we see that for every natural number {n}, we can find a real number {x_n} with the property that

\displaystyle  |F(w,x)| \leq \frac{2}{m}

whenever {1 \leq m \leq n}, {1 \leq w \leq n}, and {x \geq x_n} are such that {w \geq w_{1/m}}. By increasing the {x_n} as necessary we may assume that they are increasing and go to infinity as {n \rightarrow \infty}. If we then define {w_0(x)} to equal the largest natural number {n} for which {x \geq x_n}, or equal to {1} if no such number exists, then one easily verifies that {F(w,x)=o(1)} whenever {w= w(x)} goes to infinity and is bounded by {w_0} for sufficiently large {x}.

Now we show that (ii) implies (i). Suppose for contradiction that (i) failed, then we can find a fixed {\epsilon>0} with the property that for any natural number {n}, there exist {w_n \geq n} such that {|F(w_n,x_n)| \geq \epsilon} for arbitrarily large {x_n}. We can select the {w_n} to be increasing to infinity, and then we can find a sequence {x_n} increasing to infinity such that {|F(w_n,x_n)| \geq \epsilon} for all {n}; by increasing {x_n} as necessary, we can also ensure that {w_0(x) \geq w_n} for all {x \geq x_n} and {n}. If we then define {w(x)} to be {w_n} when {x_n \leq x < x_{n+1}}, and {w(x)=1} for {x < x_1}, we see that {|F(w,x)| \geq \epsilon} whenever {x=x_n}, contradicting (ii). \Box

Henceforth we will usually think of {w} as a sufficiently slowly growing function of {x}, although we will on occasion take advantage of Lemma 7 to switch to thinking of {w} as a large fixed quantity instead. In either case, we should think of {w} as exceeding the size of fixed quantities such as {k_0} or {h_{k_0}-h_1}, at least in the limit where {x} is large; in particular, for a fixed {k_0}-tuple {{\mathcal H}}, we will have

\displaystyle  \nu_p = k_0 \hbox{ for all } p > w \ \ \ \ \ (1)

if {x} is large enough. A particular consequence of the growing nature of {w} is that

\displaystyle  \sum_{p > w} \frac{1}{p^2} = o(1) \ \ \ \ \ (2)

as this follows from the absolutely convergent nature of the sum {\sum_{n=1}^\infty \frac{1}{n^2}} and hence also {\sum_p \frac{1}{p^2}}. As a consequence of this, once we “turn off” all the primes less than {w}, any errors in our sieve-theoretic analysis which are quadratic or higher in {1/p} can be essentially ignored, which will be very convenient for us. In a similar vein, for any fixed {k_0}-tuple {{\mathcal H}}, one has

\displaystyle  \prod_{p>w} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}} = 1+o(1) \ \ \ \ \ (3)

which allows one to truncate the singular series:

\displaystyle  {\mathfrak G} = \prod_{p \leq w} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}} + o(1). \ \ \ \ \ (4)
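The estimate (3) follows from (1) and (2) by a short expansion, sketched here for completeness. For {p > w} we have {\nu_p = k_0} by (1), and the implied constants are allowed to depend on the fixed quantity {k_0}:

```latex
\frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}}
  = \Big(1-\frac{k_0}{p}\Big)\Big(1+\frac{k_0}{p}+O\Big(\frac{1}{p^2}\Big)\Big)
  = 1 + O\Big(\frac{1}{p^2}\Big) \qquad (p > w),
\qquad\text{hence}\qquad
\prod_{p>w} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}}
  = \exp\Big( O\Big( \sum_{p>w} \frac{1}{p^2} \Big) \Big)
  = \exp(o(1)) = 1+o(1).
```

The terms linear in {1/p} cancel exactly, which is why the normalisation by {(1-\frac{1}{p})^{k_0}} is the natural one.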

In order to “turn off” all the small primes, we introduce the quantity {W}, defined as the product of all the primes up to {w} (i.e. the primorial of {w}):

\displaystyle  W := \prod_{p \leq w} p.

As {w} is going to infinity, {W} is going to infinity also (but as slowly as we please). The idea of the {W}-trick is to search for prime patterns in a single residue class {b \hbox{ mod } W}, which as mentioned earlier will “turn off” all the primes less than {w} in the sieve-theoretic analysis.

Using (4) and the Chinese remainder theorem, we may thus approximate the singular series as

\displaystyle  {\mathfrak G} = \frac{|C(W)|}{W} (\frac{\phi(W)}{W})^{-k_0} + o(1) \ \ \ \ \ (5)

where {\phi(W)} is the Euler totient function of {W}, and {C(W) \subset {\bf Z}/W{\bf Z}} is the set of residue classes {b \hbox{ mod } W} such that all of the shifts {b+h_1,\ldots,b+h_{k_0}} are coprime to {W}. Note that if {n+{\mathcal H}} consists purely of primes and {n} is sufficiently large, then {n} must lie in one of the residue classes in {C(W)}. Thus we can count tuples with {n+{\mathcal H}} all prime by working in each residue class in {C(W)} separately. We conclude that Conjecture 2 is equivalent to the following “{W}-tricked version” in which the singular series is no longer present (or, more precisely, has been replaced by some natural normalisation factors depending on {W}, such as {(\phi(W)/W)^{-k_0}}):

Conjecture 8 (Prime tuples conjecture, W-tricked quantitative form) Let {k_0 \geq 1} be a fixed natural number, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Assume {w} is a sufficiently slowly growing function of {x}. Then for any residue class {b \hbox{ mod } W} in {C(W)}, the number of natural numbers {n < x} with {n=b \hbox{ mod } W} such that {n+{\mathcal H}} consists entirely of primes is {(\frac{1}{W} (\frac{W}{\phi(W)})^{k_0} + o(1)) \frac{x}{\log^{k_0} x}}.

We will work with similarly {W}-tricked asymptotics in the analysis below.
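For a fixed {w}, the Chinese remainder theorem makes the main term in (5) exactly equal to the truncated product in (4), and this can be checked by brute force for small {w}. The following sketch (with the hypothetical name `check_identity`) counts {C(W)} directly and compares the two expressions.

```python
from math import gcd


def primes_up_to(n):
    """Return the list of primes <= n by trial division (n is small here)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]


def check_identity(tup, w):
    """Return (|C(W)|, main term of (5), truncated product in (4))."""
    ps = primes_up_to(w)
    k0 = len(tup)
    W = 1
    phi_W = 1
    for p in ps:
        W *= p
        phi_W *= p - 1  # Euler totient of the squarefree number W
    # C(W): residues b mod W with every shift b + h coprime to W
    c_size = sum(1 for b in range(W)
                 if all(gcd(b + h, W) == 1 for h in tup))
    via_counting = (c_size / W) * (phi_W / W) ** (-k0)
    via_product = 1.0
    for p in ps:
        nu_p = len({h % p for h in tup})
        via_product *= (1 - nu_p / p) / (1 - 1 / p) ** k0
    return c_size, via_counting, via_product


print(check_identity((0, 2, 6), 11))
```

The two floating-point values agree up to rounding, and the count equals {\prod_{p \leq w} (p - \nu_p)}.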

— 2. Sums of multiplicative functions —

As a result of the sieve-theoretic computations to follow, we will frequently need to estimate sums of the form

\displaystyle  S_{0,R,I}( f, g ) := \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d} g( \frac{\log d}{\log R} )

and

\displaystyle  S_{1,R,I}( f, g ) := \sum_{d \in {\mathcal S}_I} \mu(d) \frac{f(d)}{d} g( \frac{\log d}{\log R} )

where {f: {\bf N} \rightarrow {\bf C}} is a multiplicative function, the sieve level {R>0} (also denoted {D} in some literature) is a fixed power of {x} (such as {x^{1/4}} or {x^{1/4+\varpi}}), {\mu} is the Möbius function, {g: {\bf R} \rightarrow {\bf C}} is a fixed smooth compactly supported function, {I} is a (possibly half-infinite) interval in {(w,+\infty)}, and {{\mathcal S}_I} is the set of square-free numbers that are products {p_1 \ldots p_j} of distinct primes {p_1,\ldots,p_j} in {I}. (Actually, in applications {g} won’t quite be smooth, but instead have some high order of differentiability (e.g. {k_0+l_0-1} times continuously differentiable for some {l_0>0}), but we can extend the analysis of smooth {g} to sufficiently differentiable {g} by various standard limiting or approximation arguments which we will not dwell on here.) We will also need to control the more complicated variant

\displaystyle  S_{2,R,I}(f,g_1,g_2) := \sum_{d_1,d_2 \in {\mathcal S}_I} \frac{\mu(d_1) \mu(d_2) f([d_1,d_2])}{[d_1,d_2]} g_1( \frac{\log d_1}{\log R} ) g_2( \frac{\log d_2}{\log R} )

where {g_1,g_2:{\bf R} \rightarrow {\bf C}} are also smooth compactly supported functions. In practice, the interval {I} will be something like {(w, x^{1/4+\varpi})}, {(w, x^\varpi)}, or {[x^\varpi,x^{1/4+\varpi}]}. In particular, thanks to the {W}-trick we will be able to turn off all the primes up to {w}, so that {I} only contains primes larger than {w}, allowing us to take advantage of bounds such as (2).

Once {d} is restricted to {{\mathcal S}_I}, the quantity {f(d)} is determined entirely by the values of the multiplicative function {f} at primes in {I}:

\displaystyle  f(d) = \prod_{p \in {\mathcal P} \cap I: p | d} f(p).

In applications, {f} will have the size bound

\displaystyle  f(p) = k + O( \frac{1}{p} ) \ \ \ \ \ (6)

for all {p \in I} and some fixed positive {k} (note that we allow the implied constants in the {O()} notation to depend on quantities such as {k}); we refer to {k} as the dimension of the multiplicative function {f}. Henceforth we assume that {f} has a fixed dimension {k}. We remark that we could unify the treatment of {S_{0,R,I}} and {S_{1,R,I}} in what follows by allowing multiplicative functions of negative dimension, but we will avoid doing so here. In our applications {k} will be an integer; one could also generalise much of the discussion below to the fractional dimension case, but we will not need to do so here.

Traditionally the above expressions are handled by complex analysis, starting with Perron’s formula. We will instead take a slightly different Fourier-analytic approach. We perform a Fourier expansion of the smooth compactly supported function {e^x g(x)} to obtain a representation

\displaystyle  e^x g(x) = \int_{\bf R} e^{-itx} \hat g(t)\ dt \ \ \ \ \ (7)

for some Schwartz function {\hat g}; in particular, {\hat g} is rapidly decreasing. (Strictly speaking, {\hat g} is the Fourier transform of {g} shifted in the complex domain by {i}, rather than the true Fourier transform of {g}, but we will ignore this distinction for the purposes of this discussion.) In particular we have

\displaystyle  g(\frac{\log d}{\log R}) = \int_{\bf R} \frac{1}{d^{\frac{1+it}{\log R}}} \hat g(t)\ dt

for any {d}. By Fubini’s theorem, we can thus write {S_{0,R,I}} as

\displaystyle  S_{0,R,I}(f,g) = \int_{\bf R} \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d^{1+\frac{1+it}{\log R}}} \hat g(t)\ dt,

which factorises as

\displaystyle  S_{0,R,I}(f,g) = \int_{\bf R} (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt.

Similarly one has

\displaystyle  S_{1,R,I}(f,g) = \int_{\bf R} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt.

and

\displaystyle  S_{2,R,I}(f,g_1,g_2) = \int_{\bf R} \int_{\bf R} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2.
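The three Euler product factorisations used above are finite algebraic identities when {I} contains only finitely many primes, so they can be sanity-checked numerically. In the sketch below, the prime set, the values `f_at[p]`, and the exponents `a`, `b` (playing the roles of {\frac{1+it_1}{\log R}} and {\frac{1+it_2}{\log R}}) are arbitrary choices made for illustration.

```python
from itertools import combinations
from math import log, prod


def compare(primes, f_at, a, b):
    """Return (sum, product) pairs for the S_0, S_1 and S_2 type expressions."""
    subsets = [c for r in range(len(primes) + 1)
               for c in combinations(primes, r)]  # squarefree d as prime tuples

    def f(c):   # multiplicative f evaluated at the product of the primes in c
        return prod(f_at[p] for p in c)

    def mu(c):  # Moebius function of a squarefree number
        return (-1) ** len(c)

    s0 = sum(f(c) / prod(c) ** (1 + a) for c in subsets)
    e0 = prod(1 + f_at[p] / p ** (1 + a) for p in primes)

    s1 = sum(mu(c) * f(c) / prod(c) ** (1 + a) for c in subsets)
    e1 = prod(1 - f_at[p] / p ** (1 + a) for p in primes)

    s2 = 0
    for c1 in subsets:
        for c2 in subsets:
            l = tuple(set(c1) | set(c2))  # primes dividing lcm [d_1, d_2]
            s2 += (mu(c1) * mu(c2) * f(l)
                   / (prod(l) * prod(c1) ** a * prod(c2) ** b))
    e2 = prod(1 - f_at[p] / p ** (1 + a) - f_at[p] / p ** (1 + b)
              + f_at[p] / p ** (1 + a + b) for p in primes)
    return (s0, e0), (s1, e1), (s2, e2)


primes = [11, 13, 17, 19, 23]
f_at = {p: 3 + 1 / p for p in primes}  # a dimension-3 example
log_R = log(10**4)
a = (1 + 0.7j) / log_R
b = (1 - 1.3j) / log_R
for total, euler in compare(primes, f_at, a, b):
    print(abs(total - euler))
```

Each printed difference should be at the level of floating-point rounding error.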

In order to use asymptotics of the Riemann zeta function near the pole {s=1}, it is convenient to temporarily truncate the above integrals to the region {|t| \leq \sqrt{\log R}} or {|t_1|, |t_2| \leq \sqrt{\log R}}:

Lemma 9 For any fixed {A>0}, we have

\displaystyle  S_{0,R,I}(f,g) = \int_{|t| \leq \sqrt{\log R}} (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt + O( \log^{-A} R)

and

\displaystyle  S_{1,R,I}(f,g) = \int_{|t| \leq \sqrt{\log R}} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt + O( \log^{-A} R)

and

\displaystyle  S_{2,R,I}(f,g_1,g_2) = \int_{|t_1|, |t_2| \leq \sqrt{\log R}}

\displaystyle  (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle  \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2 + O(\log^{-A} R).

Also we have the crude bound

\displaystyle  S_{0,R,I}(f,g) = O( \log^k R ).

Proof: We begin with the bounds on {S_{0,R,I}}. From (6) we have

\displaystyle  \log |1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}| \leq k p^{-1-\frac{1}{\log R}} + O( p^{-2} )

for {p \in I} (which forces {p>w}, so there is no issue with the singularity of the logarithm) and thus

\displaystyle  \prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}) = O( \exp( k \sum_p p^{-1-\frac{1}{\log R}} ) ).

Since

\displaystyle  \prod_p (1-\frac{1}{p^{1+1/\log R}})^{-1} = \zeta(1+1/\log R) = \log R + O(1)

we see on taking logarithms that

\displaystyle  \sum_p p^{-1-\frac{1}{\log R}} = \log\log R + O(1)

and thus

\displaystyle  \prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}) = O( \log^k R ).

The bounds on {S_{0,R,I}(f,g)} then follow from the rapid decrease of {\hat g}. The bounds for {S_{1,R,I}} and {S_{2,R,I}} are proven similarly. \Box

From (6) and the restriction of {I} to quantities larger than {w}, we see that

\displaystyle  (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) = (1+o(1)) \zeta_I(1+\frac{1+it}{\log R})^k

and

\displaystyle  (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) = (1+o(1)) \zeta_I(1+\frac{1+it}{\log R})^{-k}

and

\displaystyle  (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle  = (1+o(1)) \zeta_I(1+\frac{1+it_1}{\log R})^{-k} \zeta_I(1+\frac{1+it_2}{\log R})^{-k}

\displaystyle  \zeta_I(1+\frac{1+it_1+1+it_2}{\log R})^{k}

where {\zeta_I} is the restricted Euler product

\displaystyle  \zeta_I(s) := \prod_{p \in I} (1-\frac{1}{p^s})^{-1},

which is well-defined for {\hbox{Re}(s)>1} at least (and this is the only region of {s} for which we will need {\zeta_I}).

We now specialise to the model case {I = (w,+\infty)}, in which case

\displaystyle  \zeta_I(s) = \zeta(s) \prod_{p \leq w} (1 - \frac{1}{p^s})

where {\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}} is the Riemann zeta function. Using the basic (and easily proven) asymptotic {\zeta(s) = \frac{1}{s-1} + O(1)} for {s} near {1}, we conclude that

\displaystyle  \zeta_I(s) = (1+o(1)) \frac{\phi(W)}{W} \frac{1}{s-1}

for {s = 1+O(1/\sqrt{\log R})}, if {w} is sufficiently slowly growing (this can be seen by first working with a fixed large {W} and then using Lemma 7). Note that because of the above truncation, we do not need any deeper bounds on {\zeta} than what one can obtain from the simple pole at {s=1}; in particular no zero-free regions near the line {\{ 1+it: t \in {\bf R} \}} are needed here. (This is ultimately because of the smooth nature of {g}, which is sufficient for the applications in this post; if one wanted rougher cutoff functions here then the situation is closer to that of the prime number theorem, and non-trivial zero-free regions would be required.)

We conclude in the case {I = (w,+\infty)} that

\displaystyle  S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k \int_{|t| \leq \sqrt{\log R}} (1+o(1)) (1+it)^{-k} \hat g(t)\ dt

\displaystyle  + O( \log^{-A} R)

and

\displaystyle  S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} \int_{|t| \leq \sqrt{\log R}} (1+o(1)) (1+it)^k \hat g(t)\ dt

\displaystyle + O( \log^{-A} R)

and

\displaystyle  S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k} \int_{|t_1|, |t_2| \leq \sqrt{\log R}}

\displaystyle  (1+o(1)) (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k} \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2

\displaystyle  + O(\log^{-A} R);

using the rapid decrease of {\hat g, \hat g_1, \hat g_2}, we thus have

\displaystyle  S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k (\int_{\bf R} (1+it)^{-k} \hat g(t)\ dt + o(1))

and

\displaystyle  S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} (\int_{\bf R} (1+it)^k \hat g(t)\ dt + o(1))

and

\displaystyle  S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k} (\int_{\bf R} \int_{\bf R}

\displaystyle  (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k}

\displaystyle  \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2 + o(1)).

We can rewrite these expressions in terms of {g} instead of {\hat g}. Using the Gamma function identity

\displaystyle  (1+it)^{-k} = \int_0^\infty e^{-x(1+it)} \frac{x^{k-1}}{(k-1)!}\ dx

and (7) we see that

\displaystyle  \int_{\bf R} (1+it)^{-k} \hat g(t)\ dt = \int_0^\infty g(x) \frac{x^{k-1}}{(k-1)!}\ dx

whilst from differentiating (7) {k} times at the origin (after first dividing by {e^x}) we see that

\displaystyle  \int_{\bf R} (1+it)^k \hat g(t)\ dt = (-1)^k g^{(k)}(0).

Combining these two methods, we also see that

\displaystyle  \int_{\bf R} \int_{\bf R} (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k} \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2

\displaystyle  = \int_0^\infty g^{(k)}_1(x) g^{(k)}_2(x) \frac{x^{k-1}}{(k-1)!}\ dx.
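The step of combining the two methods can be spelled out as follows. This is a sketch that applies the Gamma function identity with {1+it} replaced by {1+it_1+1+it_2}, differentiates (7) {k} times at a general point {x \geq 0} rather than at the origin, and interchanges integrals using the rapid decrease of {\hat g_1, \hat g_2}:

```latex
g_j^{(k)}(x) = (-1)^k \int_{\bf R} (1+it)^k e^{-x(1+it)} \hat g_j(t)\ dt
  \qquad (j = 1,2),
\qquad
(1+it_1+1+it_2)^{-k}
  = \int_0^\infty e^{-x(1+it_1)} e^{-x(1+it_2)} \frac{x^{k-1}}{(k-1)!}\ dx,

\int_{\bf R} \int_{\bf R} (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k}
  \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2
  = \int_0^\infty \Big( (-1)^k g_1^{(k)}(x) \Big) \Big( (-1)^k g_2^{(k)}(x) \Big)
    \frac{x^{k-1}}{(k-1)!}\ dx
  = \int_0^\infty g_1^{(k)}(x) g_2^{(k)}(x) \frac{x^{k-1}}{(k-1)!}\ dx.
```

The two factors of {(-1)^k} cancel, which is why no sign appears in the final expression.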

We have thus obtained the following asymptotics:

Proposition 10 (Asymptotics without prime truncation) Suppose that {I = (w,+\infty)}, and that {f} has dimension {k} for some fixed natural number {k}. Then we have

\displaystyle  S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k (\int_0^\infty g(x) \frac{x^{k-1}}{(k-1)!}\ dx + o(1))

and

\displaystyle  S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} ((-1)^k g^{(k)}(0) + o(1))

and

\displaystyle  S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k}

\displaystyle  (\int_0^\infty g^{(k)}_1(x) g^{(k)}_2(x) \frac{x^{k-1}}{(k-1)!}\ dx + o(1)).

These asymptotics will suffice for the treatment of the Goldston-Pintz-Yildirim theorem (Theorem 4). For the Motohashi-Pintz-Zhang theorem (Theorem 5) we will also need to deal with truncated intervals {I}, such as {(w,x^{\varpi})}; we will discuss how to deal with these truncations later.

— 3. The Goldston-Pintz-Yildirim theorem —

We are now ready to state and prove the Goldston-Pintz-Yildirim theorem. We first need to state the Elliott-Halberstam conjecture properly.

Let {\Lambda: {\bf N} \rightarrow {\bf R}} be the von Mangoldt function, thus {\Lambda(n)} equals {\log p} when {n} is equal to a prime {p} or a power of that prime, and equal to zero otherwise. The prime number theorem in arithmetic progressions tells us that

\displaystyle  \sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) = (1 + o(1)) \frac{x}{\phi(q)}

for any fixed arithmetic progression {a \hbox{ mod } q} with {a} coprime to {q}. In particular,

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)| = o( \frac{x}{\phi(q)} )

where {({\bf Z}/q{\bf Z})^\times} are the residue classes mod {q} that are coprime to {q}. By invoking the Siegel-Walfisz theorem one can obtain the improvement

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)| = O( \frac{x}{\phi(q) \log^A x} )

for any fixed {A>0} (though, annoyingly, the implied constant here is only ineffectively bounded with current methods; see this previous post for further discussion).

The above error term is only useful when {q} is fixed (or is of logarithmic size in {x}). For larger values of {q}, it is very difficult to get good error terms for each {q} separately, unless one assumes powerful hypotheses such as the generalised Riemann hypothesis. However, it is possible to obtain good control on the error term if one averages in {q}. More precisely, for any {0 < \theta < 1}, let {EH[\theta]} denote the following assertion:

Conjecture 11 ({EH[\theta]}) One has

\displaystyle  \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)|

\displaystyle = O( \frac{x}{\log^A x} )

for all fixed {A>0}.

This should be compared with the asymptotic {\sum_{1 \leq q \leq x^\theta} \frac{x}{\phi(q)} = (C+o(1)) x \log x^\theta} for some absolute constant {C>0}, as can be deduced for instance from Proposition 10. The Elliott-Halberstam conjecture is the assertion that {EH[\theta]} holds for all {0 < \theta < 1}. This remains open, but the important Bombieri-Vinogradov theorem establishes {EH[\theta]} for all {0 < \theta < 1/2}. Remarkably, the threshold {1/2} is also the limit of what one can establish if one directly invokes the generalised Riemann hypothesis, so the Bombieri-Vinogradov theorem is often referred to as an assertion that the generalised Riemann hypothesis (or at least the Siegel-Walfisz theorem) holds “on the average”, which is often good enough for sieve-theoretic purposes.

We may replace the von Mangoldt function {\Lambda(n)} with the slight variant {\theta(n)}, defined to equal {\log p} when {n} is a prime {p} and zero otherwise. Using this replacement, as well as the prime number theorem (with {O(x / \log^A x)} error term), it is not difficult to show that {EH[\theta]} is equivalent to the estimate

\displaystyle  \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \theta(n) - \frac{x}{\phi(q)}| = O( \frac{x}{\log^A x} ). \ \ \ \ \ (8)

Now we establish Theorem 4. Suppose that {EH[\theta]} holds for some fixed {1/2 <\theta < 1}, let {k_0} be sufficiently large depending on {\theta} but otherwise fixed, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. We would like to show that there are infinitely many {n} such that {n + {\mathcal H}} contains at least two primes. We will begin with the {W}-trick, restricting {n} to a residue class {b \hbox{ mod } W} with {b \in C(W)} (note that {C(W)} is non-empty because {{\mathcal H}} is admissible).

The general strategy will be as follows. We will introduce a weight function {\nu: {\bf Z} \rightarrow {\bf R}^+} that obeys the upper bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \nu(n) \leq (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0} R} \ \ \ \ \ (9)

and lower bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n+h) \nu(n)

\displaystyle  \geq (\beta-o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R} \ \ \ \ \ (10)

for all {h \in {\mathcal H}} and some fixed {\alpha,\beta > 0}, where {R} is a fixed power of {x} (we will eventually take {R = x^{\theta/2}}). (The factors of {W, \phi(W)}, {x}, and {\log R} on the right-hand side are natural normalisations coming from sieve theory and the reader should not pay too much attention to them.) Informally, (9) says that {\nu} has some normalised density at most {\alpha}, and then (10) roughly speaking asserts that relative to the weight {\nu}, {n+h} has a probability of at least {\beta/\alpha -o(1)} of being prime. If we sum (10) for all {h \in {\mathcal H}} and then subtract off {\log 3x} copies of (9), we conclude that

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} (\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x) \nu(n)

\displaystyle  \geq (k_0\beta - \alpha \frac{\log x}{\log R}-o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R}.

In particular, if we have the crucial inequality

\displaystyle  k_0 \beta > \alpha \frac{\log x}{\log R} \ \ \ \ \ (11)

we conclude that

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} (\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x) \nu(n) \gg \frac{x}{W \log^{k_0-1} x}

and so {\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x} is positive for at least one value of {n} between {x} and {2x}. Since each {\theta(n+h)} is either zero or equal to {\log(n+h)}, which is less than {\log 3x} once {x} is large enough, this can only occur if {n+{\mathcal H}} contains two or more primes. Sending {x} off to infinity then gives {DHL[k_0,2]} as desired.

It thus suffices to find a weight function {\nu} obeying the required properties (9), (10) with parameters {\alpha,\beta,R} obeying the key inequality (11). It is thus of interest to make {R} as large a power of {x} as possible, and to minimise the ratio between {\alpha} and {\beta}. It is in the former task that the Elliott-Halberstam hypothesis will be crucial.

The key is to find a good choice of {\nu}, and the selection of this weight is arguably the main contribution of Goldston, Pintz, and Yildirim, who use a carefully modified version of the Selberg sieve. Following (a slight modification of) the Goldston-Pintz-Yildirim argument, we will take a weight of the form {\nu(n) = \lambda(n)^2}, where

\displaystyle  \lambda(n) := \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) g( \frac{\log d}{\log R} ) \ \ \ \ \ (12)

where {g: {\bf R} \rightarrow {\bf R}} is a smooth non-negative function supported on {[-1,1]} to be chosen later, {I := (w,+\infty)}, and {P(n)} is the polynomial

\displaystyle  P(n) := \prod_{h \in {\mathcal H}} (n+h).

The intuition here is that {\lambda} is a truncated approximation to a function of the form

\displaystyle  \lambda_a(n) := \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) \log^a \frac{n}{d}

for some natural number {a}, which one can check is only non-vanishing when {P(n)} has at most {a} distinct prime factors in {I}. So {\nu(n)} is concentrated on those numbers {n} for which {n+h} already has few prime factors for {h \in {\mathcal H}}, which will assist in making the ratio {\alpha/\beta} as small as possible.
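The definition (12) can be made concrete with a small brute-force sketch. Everything specific below is an illustrative assumption: the tuple {(0,2,6)}, the values {w = 3} and {R = 50}, the helper names, and the polynomial cutoff used for {g} (only its values on {[0,1]} matter here, since {\log d / \log R \geq 0}).

```python
import math

def prime_factors(m):
    """Distinct prime factors of a positive integer m, by trial division."""
    out, p = [], 2
    while p * p <= m:
        if m % p == 0:
            out.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        out.append(m)
    return out

def g(t, k0=3, l0=1):
    """Test cutoff (1-t)^(k0+l0)/(k0+l0)! on [0,1], zero for t > 1."""
    return (1 - t) ** (k0 + l0) / math.factorial(k0 + l0) if 0 <= t <= 1 else 0.0

def sieve_lambda(n, H, w, R):
    """Sum of mu(d) g(log d / log R) over squarefree d | P(n) with all primes of d > w."""
    primes = sorted({p for h in H for p in prime_factors(n + h) if p > w})
    total = 0.0

    def rec(i, d, mu):
        nonlocal total
        if d > R:          # g vanishes beyond R, and d only grows from here
            return
        if i == len(primes):
            total += mu * g(math.log(d) / math.log(R))
            return
        rec(i + 1, d, mu)                 # prime i not in d
        rec(i + 1, d * primes[i], -mu)    # prime i in d, which flips mu

    rec(0, 1, 1)
    return total

def nu(n, H, w, R):
    """The sieve weight nu(n) = lambda(n)^2."""
    return sieve_lambda(n, H, w, R) ** 2

H = (0, 2, 6)
for n in (11, 101, 120):
    print(n, nu(n, H, w=3, R=50))
```

When every prime factor of {P(n)} in {I} exceeds {R}, only {d=1} contributes and {\lambda(n) = g(0)}; additional small prime factors introduce further terms with alternating signs.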

Clearly {\nu} is non-negative. Now we consider the task of estimating the left-hand side of (9). Expanding out {\nu = \lambda^2} using (12) and interchanging summations, we can expand this expression as

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} 1.

The constraint {d_1,d_2 | P(n)} is equivalent to requiring that for each prime {p} dividing {[d_1,d_2]}, {n} lies in one of the residue classes {h_i \hbox{ mod } p} for {i=1,\ldots,k_0}. By choice of {I}, {p > w}, so all the {h_i} are distinct, and so we are constraining {n} to lie in one of {k_0} residue classes modulo {p} for each {p|[d_1,d_2]}; together with the constraint {n = b \hbox{ mod } W} and the Chinese remainder theorem, we are thus constraining {n} to {k_0^{\Omega([d_1,d_2])}} residue classes modulo {W [d_1,d_2]}, where {\Omega([d_1,d_2])} is the number of prime factors of {[d_1,d_2]}. We thus have

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} 1 = k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]} + O( k_0^{\Omega([d_1,d_2])} ).
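The residue class count can be checked by brute force for a small tuple. In the sketch below, the function name `count_classes` and the tuple {(0,2,6)} are illustrative assumptions; the optional argument `h` additionally demands that {n+h} be coprime to {d}, which is the refinement that appears in the treatment of (10).

```python
from math import gcd

def count_classes(H, d, h=None):
    """Count n mod d with d | P(n); if h is given, also require gcd(n+h, d) = 1.

    Assumes d is squarefree, so that d | P(n) is the same as P(n) = 0 mod d.
    """
    count = 0
    for n in range(d):
        P = 1
        for hh in H:
            P = P * (n + hh) % d
        if P == 0 and (h is None or gcd(n + h, d) == 1):
            count += 1
    return count

H = (0, 2, 6)          # k0 = 3, elements distinct modulo 5, 7 and 11
print(count_classes(H, 5 * 7))        # compare with 3**2
print(count_classes(H, 5 * 7, h=0))   # compare with (3-1)**2
```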

Note from the support of {g} that {d_1,d_2} may be constrained to be at most {R}, so that {d_1d_2} is at most {R^2}. We can thus express the left-hand side of (9) as the main term

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]}

plus an error

\displaystyle  O( R^2 \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) \frac{k_0^{\Omega(d_1)} k_0^{\Omega(d_2)} }{d_1 d_2} ).

By Proposition 10, the error term is {O( R^2 \log^{2k_0} R )}. So if we set

\displaystyle  R := x^{\theta/2}

then the error term will certainly give a negligible contribution to (9) with plenty of room to spare. (But when we come to the more difficult sum (10), we will have much less room – only a superlogarithmic amount of room, in fact.) To show (9), it thus suffices to show that

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}

\displaystyle \leq (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \log^{-k_0} R.

But by Proposition 10 (applied to the {k_0}-dimensional multiplicative function {k_0^{\Omega([d_1,d_2])}}) and the support of {g}, this bound holds with {\alpha} equal to the quantity

\displaystyle  \alpha = \int_0^1 g^{(k_0)}(x)^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx.

Now we turn to (10). Fix {h \in {\mathcal H}}. Repeating the arguments for (9), we may expand the left-hand side of (10) as

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R})

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} \theta(n+h).

Now we consider the inner sum

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} \theta(n+h).

As discussed earlier, the conditions {n=b \hbox{ mod } W} and {d_1,d_2 | P(n)} split into {k_0^{\Omega([d_1,d_2])}} residue classes {n = a \hbox{ mod } W [d_1,d_2]}. However, if {n = -h \hbox{ mod } p} for one of the primes {p} dividing {[d_1,d_2]}, then {\theta(n+h)} must vanish (since {p \leq R = x^{\theta/2}} is much less than {n+h}). So there are actually only {(k_0-1)^{\Omega([d_1,d_2])}} residue classes {a \hbox{ mod } W[d_1,d_2]} for which {a+h} is coprime to {W[d_1,d_2]}. We thus have

\displaystyle  \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} \theta(n+h) = (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

\displaystyle  + O( (k_0-1)^{\Omega([d_1,d_2])} E(x; W[d_1,d_2]) )

where {E(x;q)} denotes the quantity

\displaystyle  E(x;q) := \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{x \leq n \leq 2x: n = a \hbox{ mod } q} \theta(n) - \frac{x}{\phi(q)}|. \ \ \ \ \ (13)

Remark 1 There is an inefficiency here; the supremum in (13) is over all primitive residue classes {a \hbox{ mod } q}, but actually one only needs to take the supremum over the {(k_0-1)^{\Omega(q)}} residue classes {a \hbox{ mod } q} for which {P_h(a) = 0 \hbox{ mod } q}, where {P_h(a) := \prod_{h' \in {\mathcal H}:h' \neq h} (a+h'-h)}. This inefficiency is not exploitable if we insist on using the Elliott-Halberstam conjecture as the starting hypothesis, but will be used in the arguments of the next section in which a more lightweight hypothesis is utilised.

The left-hand side of (10) is thus equal to the main term

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

plus an error term

\displaystyle  O( \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} E(x; W[d_1,d_2]) ).

We first deal with the error term. Since {[d_1,d_2]} is in {{\mathcal S}_I} and is bounded by {R^2} on the support of this function, and each {d \in {\mathcal S}_I} has {3^{\Omega(d)}} representations of the form {d = [d_1,d_2]}, we can bound this expression by

\displaystyle  O( \sum_{d \in {\mathcal S}_I: d \leq R^2} 3^{\Omega(d)} (k_0-1)^{\Omega(d)} E(x;Wd) ).
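The count of {3^{\Omega(d)}} representations reflects the fact that each prime of a squarefree {d} may divide {d_1} only, {d_2} only, or both. A short brute-force check (the function name is an illustrative assumption):

```python
from math import gcd

def lcm_representations(d):
    """Number of ordered pairs (d1, d2) with lcm(d1, d2) = d."""
    divisors = [a for a in range(1, d + 1) if d % a == 0]
    return sum(1 for d1 in divisors for d2 in divisors
               if d1 * d2 // gcd(d1, d2) == d)

for d in (1, 7, 30, 210):   # squarefree with 0, 1, 3, 4 prime factors
    print(d, lcm_representations(d))
```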

On the other hand, from Proposition 10 and the trivial bound {E(x;Wd) = O( \frac{x \log x}{Wd} + \frac{x}{\phi(W) \phi(d)} )} we have

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq R^2} 3^{2\Omega(d)} (k_0-1)^{2\Omega(d)} E(x;Wd) = O( x \log^{O(1)} x )

while from (8) (and here we crucially use the choice {R = x^{\theta/2}} of {R}) one easily verifies that

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq R^2} E(x;Wd) = O( x \log^{-A} x )

for any fixed {A}. By the Cauchy-Schwarz inequality we see that the error term to (10) is negligible (assuming {w} sufficiently slowly growing of course). Meanwhile, the main term can be rewritten as

\displaystyle  \frac{x}{\phi(W)} \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{f([d_1,d_2])}{[d_1,d_2]}

where {f} is the {k_0-1}-dimensional multiplicative function

\displaystyle  f(d) := \prod_{p|d} (k_0-1) \frac{p}{p-1}.

Applying Proposition 10, we obtain (10) with

\displaystyle  \beta = \int_0^1 g^{(k_0-1)}(x)^2 \frac{x^{k_0-2}}{(k_0-2)!}\ dx.

To obtain the crucial inequality (11), we thus need to locate a smooth non-negative function supported on {[-1,1]} obeying the inequality

\displaystyle  k_0 \int_0^1 g^{(k_0-1)}(x)^2 \frac{x^{k_0-2}}{(k_0-2)!}\ dx > \frac{2}{\theta} \int_0^1 g^{(k_0)}(x)^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx . \ \ \ \ \ (14)

In principle one can use calculus of variations to optimise the choice of {g} here (it will be the ground state of a certain one-dimensional Schrödinger operator), but one can already get a fairly good result here by a specific choice of {g} that is amenable to computations, namely a polynomial of the form {g(x) := \frac{1}{(k_0+l_0)!} (1-x)^{k_0+l_0}} for {x \in [0,1]} and some integer {l_0>0}, with {g} vanishing for {x>1} and smoothly truncated to {[-1,1]} somehow at negative values of {x}. Strictly speaking, this {g} is not admissible here because it is not infinitely smooth at {1}, being only {k_0+l_0-1} times continuously differentiable instead, but one can regularise this function to be smooth without significantly affecting either side of (14), so we will go ahead and test (14) with this function and leave the regularisation details to the reader. The inequality then becomes (after cancelling some factors)

\displaystyle  k_0 \int_0^1 (1-x)^{2l_0+2} \frac{x^{k_0-2}}{(k_0-2)!}\ dx > \frac{2}{\theta} \int_0^1 (l_0+1)^2 (1-x)^{2l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx .

Using the Beta function identity

\displaystyle  \int_0^1 (1-x)^a x^b\ dx = \frac{a! b!}{(a+b+1)!}

we have

\displaystyle  \alpha = \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \ \ \ \ \ (15)

and

\displaystyle  \beta = \frac{(2l_0+2)!}{((l_0+1)!)^2 (k_0+2l_0+1)!} \ \ \ \ \ (16)

and the preceding equation now becomes

\displaystyle  k_0 \frac{(2l_0+2)!}{(2l_0+k_0+1)!} > \frac{2}{\theta} (l_0+1)^2 \frac{(2l_0)!}{(2l_0+k_0)!}

which simplifies to

\displaystyle  2\theta > (1 + \frac{1}{2l_0+1}) (1 + \frac{2l_0+1}{k_0}).

Actually, the same inequality is also applicable when {l_0} is real instead of integer, using Gamma functions in place of factorials; we leave the details to the interested reader. We can then optimise in {l_0} by setting {2l_0+1 = \sqrt{k_0}}, arriving at the inequality

\displaystyle  2\theta > (1 + \frac{1}{\sqrt{k_0}})^2.

But as long as {\theta > 1/2}, this inequality is satisfiable for any {k_0} larger than {(\sqrt{2\theta}-1)^{-2}}. This concludes the proof of Theorem 4.
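The passage from (15), (16) to the simplified inequality can be checked in exact rational arithmetic, and one can search for the smallest {k_0} that works with an integer {l_0}. This is only a sketch: the function names and the sample value {\theta = 19/20} are illustrative assumptions, and the search covers integer {l_0} only (the text allows real {l_0}, which can give slightly smaller thresholds).

```python
from fractions import Fraction
from math import factorial, isqrt

def alpha(k0, l0):
    """Right-hand side of (15)."""
    return Fraction(factorial(2 * l0),
                    factorial(l0) ** 2 * factorial(k0 + 2 * l0))

def beta(k0, l0):
    """Right-hand side of (16)."""
    return Fraction(factorial(2 * l0 + 2),
                    factorial(l0 + 1) ** 2 * factorial(k0 + 2 * l0 + 1))

def key_inequality(k0, l0, theta):
    """Inequality (11) with log x / log R = 2/theta: k0*beta > (2/theta)*alpha."""
    return theta * k0 * beta(k0, l0) > 2 * alpha(k0, l0)

def simplified_inequality(k0, l0, theta):
    """The form 2*theta > (1 + 1/(2 l0 + 1)) (1 + (2 l0 + 1)/k0)."""
    return 2 * theta > (1 + Fraction(1, 2 * l0 + 1)) * (1 + Fraction(2 * l0 + 1, k0))

def smallest_k0(theta, k_max=2000):
    """Smallest k0 <= k_max admitting an integer l0 >= 1, or None if there is none."""
    for k0 in range(2, k_max + 1):
        # the optimum is near 2*l0 + 1 = sqrt(k0), so l0 <= sqrt(k0) + 1 suffices
        for l0 in range(1, isqrt(k0) + 2):
            if simplified_inequality(k0, l0, theta):
                return k0, l0
    return None

theta = Fraction(19, 20)   # illustrative value; pass theta as a Fraction for exactness
print(smallest_k0(theta))
print(smallest_k0(Fraction(1, 2), k_max=200))   # nothing works at theta = 1/2
```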

Remark 2 One can obtain slightly better dependencies of {k_0} in terms of {\theta} by using more general functions for {g} than the monomials {\frac{1}{(k_0+l_0)!} (1-x)^{k_0+l_0}}, for instance one can take linear combinations of such functions. See the paper of Goldston, Pintz, and Yildirim for details. Unfortunately, as noted in this survey of Soundararajan, one has the general inequality

\displaystyle  k_0 \int_0^1 g^{(k_0-1)}(x)^2 \frac{x^{k_0-2}}{(k_0-2)!}\ dx \leq 4 \int_0^1 g^{(k_0)}(x)^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx \ \ \ \ \ (17)

which defeats any attempt to directly use this method using only the Bombieri-Vinogradov result that {EH[\theta]} holds for all {\theta < 1/2}. We show (17) in the case when {k_0} is large. Write {f(x) := g^{(k_0-1)}(x) x^{k_0/2-1}}, then (17) simplifies to

\displaystyle  \frac{k_0 (k_0-1)}{4} \int_0^1 f(x)^2\ dx \leq \int_0^1 (f'(x) - (k_0/2-1) x^{-1} f(x))^2 x\ dx.

The right-hand side simplifies after some integration by parts to

\displaystyle  \int_0^1 f'(x)^2 x + \frac{(k_0-2)^2}{4} f(x)^2 x^{-1}\ dx.

Subtracting off {\int_0^1 \frac{(k_0-2)^2}{4} f(x)^2\ dx} from both sides, one is left with

\displaystyle  \frac{3k_0-4}{4} \int_0^1 f(x)^2\ dx \leq \int_0^1 f'(x)^2 x + \frac{(k_0-2)^2}{4} f(x)^2 (x^{-1} - 1)\ dx.

From the fundamental theorem of calculus and Cauchy-Schwarz, one has the bound

\displaystyle  |f(y)|^2 \leq (\int_0^1 f'(t)^2 t\ dt) (\log(1/y)).

Using this bound for {y} close to {1} and dominating {\frac{3k_0-4}{4}} by {\frac{(k_0-2)^2}{4} (y^{-1} - 1)} for {y} far from {1}, we obtain the claim (at least if {k_0} is large enough). There is some slack in this argument; it would be of interest to calculate exactly what the best constants are in (17), so that one can obtain the optimal relationship between {\theta} and {k_0}.

To get around this obstruction (17) in the unconditional setting when one only has {EH[\theta]} for {\theta<1/2}, Goldston, Pintz, and Yildirim also considered sums of the form {\sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n+h) \nu(n)} in which {h} was now outside (but close to) {{\mathcal H}}. While the bounds here were significantly inferior to those in (10), they were still sufficient to prove a variant of the inequality (11) to get reasonably small gaps between primes.

— 4. The Motohashi-Pintz-Zhang theorem —

We now modify the above argument to give Theorem 5. Our treatment here is different from that of Zhang in that it employs the method of Buchstab iteration; a related argument also appears in the paper of Motohashi and Pintz. This arrangement of the argument leads to a more efficient dependence of {k_0} on {\varpi} than in the paper of Zhang. (The argument of Motohashi and Pintz is a bit more complicated and uses a slightly different formulation of the base conjecture {MPZ[\varpi]}, but the final bounds are similar to those given here, albeit with non-explicit constants in the {O()} notation.)

The main idea here is to truncate the interval {I} of relevant primes from {(w,\infty)} to {(w,x^\varpi)} for some small {\varpi}. Somewhat remarkably, it turns out that this apparently severe truncation does not affect the sums (9), (10) here as long as {k_0 \varpi} is large (which is going to be the case in practice, with {k_0} being comparable to {\varpi^{-2}}). The intuition is that {\nu} was already concentrated on those {n} for which {P(n)} had about {O(k_0)} factors, and it is too “expensive” for one of these factors to be as large as {x^\varpi} or more, as it forces many of the other factors to be smaller than they “want” to be. The advantage of truncating the set of primes this way is that the version of the Elliott-Halberstam conjecture needed also acquires the same truncation, which gives that version a certain “well-factored” form (in the spirit of the work of Bombieri, Fouvry, Friedlander, and Iwaniec) which is essential in being able to establish that conjecture unconditionally for some suitably small {\varpi}.

To make this more precise, we first formalise the conjecture {MPZ[\varpi]} for {0 < \varpi < 1/4} mentioned earlier.

Conjecture 12 ({MPZ[\varpi]}) Let {{\mathcal H}} be a fixed {k}-tuple (not necessarily admissible) for some fixed {k \geq 2}, and let {b \hbox{ mod } W} be a primitive residue class. Then

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in ({\bf Z}/q{\bf Z})^\times: P(a) = 0 \hbox{ mod } q} |\Delta_{b,W}(\theta; q,a)| = O( x \log^{-A} x)

for any fixed {A>0}, where {I = (w,x^{\varpi})} and {\Delta_{b,W}(\theta;q,a)} is the quantity

\displaystyle  \Delta_{b,W}(\theta;q,a) := | \sum_{x \leq n \leq 2x: n=b \hbox{ mod } W; n = a \hbox{ mod } q} \theta(n) \ \ \ \ \ (18)

\displaystyle  - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n)|.

and

\displaystyle  P(a) := \prod_{h \in {\mathcal H}} (a+h).

This is the {W}-tricked formulation of the conjecture as (implicitly) stated in Zhang’s paper, which did not have the restriction {n = b \hbox{ mod } W} present (and with the interval {I} enlarged from {(w,x^\varpi)} to {(1,x^\varpi)}, and {{\mathcal H} \cup \{0\}} was required to be admissible). However the two formulations are morally equivalent (and Zhang’s arguments establish Theorem 6 with {MPZ[\varpi]} as stated). From the prime number theorem in arithmetic progressions (with {O( x \log^{-A} x)} error term) together with Proposition 10 we observe that we may replace (18) here by the slight variant

\displaystyle  \Delta'_{b,W}(\theta;q,a) := | \sum_{x \leq n \leq 2x: n=b \hbox{ mod } W; n = a \hbox{ mod } q} \theta(n) \ \ \ \ \ (19)

\displaystyle  - \frac{1}{\phi(Wq)} x|

without affecting the truth of {MPZ[\varpi]}.

It is also not difficult to deduce {MPZ[\varpi]} from {EH[1/2 + 2 \varpi]} after using a Cauchy-Schwarz argument to dispose of the {k^{\Omega(d)}} residue classes {a} in the above sum (cf. the treatment of the error term in (10) in the previous section); we leave the details to the interested reader. Note however that whilst {EH[1/2+2\varpi]} demands control over all primitive residue classes {a} in a given modulus {q}, the conjecture {MPZ[\varpi]} only requires control of a much smaller number of residue classes (roughly polylogarithmic in number, on average). Thus {MPZ[\varpi]} is simpler than {EH[1/2+2\varpi]}, though it is still far from trivial.

We now begin the proof of Theorem 5. Let {0 < \varpi < 1/4} be such that {MPZ[\varpi]} holds, and let {k_0} be a sufficiently large quantity depending on {\varpi} but which is otherwise fixed. As before, it suffices to locate a non-negative sieve weight {\nu} that obeys the estimates (9), (10) for parameters {\alpha,\beta,R} that obey the key inequality (11), and with {g} smooth and supported on {[-1,1]}. The choice of weight {\nu} is almost the same as before; it is also given as a square {\nu(n) = \lambda(n)^2} with {\lambda} given by (12), but now the interval {I} is truncated to {(w,x^\varpi)} instead of {(w,\infty)}. Also, in this argument we take

\displaystyle  R = x^{1/4 + \varpi}.

We now establish (9). By repeating the previous arguments, the left-hand side of (9) is equal to a main term

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]} \ \ \ \ \ (20)

plus an error term which continues to be acceptable (indeed, the error term is slightly smaller than in the previous case due to the truncated nature of {I}). At this point in the previous section we applied Proposition 10, but that proposition was only available for the untruncated interval {[w,+\infty)} instead of the truncated interval {[w,x^\varpi)}. One could try to adapt the proof of that proposition to the truncated case, but then one is faced with the problem of controlling the truncated zeta function {\zeta_I}. While one can eventually get some reasonable asymptotics for this function, it seems to be more efficient to eschew Fourier analysis and work entirely in “physical space” by the following partial Möbius inversion argument. Write {J := [x^\varpi,\infty)}, thus {I \cup J = [w,+\infty)}. Observe that for any {d \in {\mathcal S}_{I \cup J}}, the quantity {\sum_{a \in {\mathcal S}_J: a|d} \mu(a)} equals {1} when {d} lies in {{\mathcal S}_I} and vanishes otherwise. Hence, for any function {F(d_1,d_2)} of {d_1} and {d_2} supported on squarefree numbers we have the partial Möbius inversion formula

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} F(d_1,d_2) = \sum_{a_1, a_2 \in {\mathcal S}_J} \mu(a_1) \mu(a_2) \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} F(a_1 d_1, a_2 d_2)

and so the main term (20) can be expressed as

\displaystyle  \sum_{a_1, a_2 \in {\mathcal S}_J} \mu(a_1) \mu(a_2) \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(a_1 d_1) g(\frac{\log a_1d_1}{\log R}) \mu(a_2 d_2) g(\frac{\log a_2 d_2}{\log R}) \ \ \ \ \ (21)

\displaystyle  k_0^{\Omega([a_1d_1,a_2d_2])} \frac{x}{W [a_1d_1,a_2d_2]}.
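The partial Möbius inversion formula can be tested exactly on a toy model in which {I} and {J} each contain only two primes. The specific primes, the test function `F`, and all helper names below are illustrative assumptions; the point is only that the two sides agree for a function supported on squarefree numbers.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

I_PRIMES = (5, 7)       # toy stand-in for the primes in I
J_PRIMES = (11, 13)     # toy stand-in for the primes in J
ALL_PRIMES = I_PRIMES + J_PRIMES

def squarefree_products(primes):
    """All squarefree products of the given primes (including 1)."""
    return [prod(c) for r in range(len(primes) + 1) for c in combinations(primes, r)]

def mu(m):
    """Mobius function on products of ALL_PRIMES (0 if some prime is repeated)."""
    if any(m % (p * p) == 0 for p in ALL_PRIMES):
        return 0
    return (-1) ** sum(1 for p in ALL_PRIMES if m % p == 0)

def F(d1, d2):
    """Arbitrary test function supported on squarefree arguments."""
    if mu(d1) == 0 or mu(d2) == 0:
        return Fraction(0)
    return Fraction(1, d1 * d2 + 1)

S_I = squarefree_products(I_PRIMES)
S_J = squarefree_products(J_PRIMES)
S_IJ = squarefree_products(ALL_PRIMES)

def inner_sum(d):
    """Sum of mu(a) over a in S_J dividing d; the indicator of d lying in S_I."""
    return sum(mu(a) for a in S_J if d % a == 0)

lhs = sum(F(d1, d2) for d1 in S_I for d2 in S_I)
rhs = sum(mu(a1) * mu(a2) * F(a1 * d1, a2 * d2)
          for a1 in S_J for a2 in S_J for d1 in S_IJ for d2 in S_IJ)
print(lhs == rhs, inner_sum(35), inner_sum(55))
```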

We first dispose of the contribution to (21) when {a_i,d_j} share a common prime factor {p_* \in J} for some {i,j=1,2}. For any fixed {i,j}, we can bound this contribution by

\displaystyle  \ll \frac{x}{W} \sum_{p_* \in J} \sum_{a_1,a_2 \in {\mathcal S}_J} \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} 1_{p^2_*|a_id_j} (a_1 d_1 a_2 d_2)^{-1/\log R} \frac{k_0^{\Omega([a_1d_1,a_2d_2])}}{[a_1d_1,a_2d_2]}.

Factorising the inner two sums as an Euler product, this becomes

\displaystyle  \ll \frac{x}{W} \sum_{p_* \in J} \frac{1}{p_*^2} ( \prod_{p \in I \cup J} 1 + O(\frac{1}{p^{1+1/\log R}}) ).

[UPDATE: The above argument is not quite correct; a corrected (and improved) version is given at this newer post.] The product is {O(\log^{O(1)} R)} by e.g. Mertens’ theorem, while {\sum_{p_* \in J} \frac{1}{p_*^2} \ll x^{-\varpi}}. So the contribution of this case is negligible.

If {a_i,d_j} do not share a common factor {p_* \in J} for any {i,j=1,2}, then we can factor {[a_1d_1,a_2d_2]} as {[a_1,a_2][d_1,d_2]}. Rearranging this portion of (21) and then reinserting the case when {a_i,d_j} have a common factor {p_* \in J} for some {i,j}, we may write (21) up to negligible errors as

\displaystyle  \frac{x}{W} \sum_{a_1, a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(d_1) g(\frac{\log d_1}{\log R} + \frac{\log a_1}{\log R})

\displaystyle  \mu(d_2) g(\frac{\log d_2}{\log R} + \frac{\log a_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}.

Note that we can restrict {a_1,a_2} to be at most {R} as otherwise the {g} factors vanish. The inner sum

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(d_1) g(\frac{\log d_1}{\log R} + \frac{\log a_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R} + \frac{\log a_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}

is now of the form that can be treated by Proposition 10, and takes the form

\displaystyle  (\frac{\phi(W)}{W} \log R)^{-k_0} (\int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx

\displaystyle  + o(1)).

Here we make the technical remark that the translates of {g} by shifts between {0} and {1} are uniformly controlled in smooth norms, which means that the {o(1)} error here is uniform in the choices of {a_1, a_2}.

Let us first deal with the contribution of the {o(1)} error term. This is bounded by

\displaystyle  o( \frac{x}{W} (\frac{\phi(W)}{W} \log R)^{-k_0} \sum_{a_1,a_2 \in {\mathcal S}_{(x^\varpi, R]}} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} ).

The inner sum factorises as

\displaystyle  \prod_{x^\varpi < p \leq R} (1 + \frac{3 k_0}{p})

which by Mertens’ theorem is {O(1)} (albeit with a rather large implied constant!), so this error is negligible for the purposes of (9). Indeed, (9) is now reduced to the inequality

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]}

\displaystyle  \int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx \ \ \ \ \ (22)

\displaystyle  \leq \alpha+o(1).

Note that the factor {\frac{x^{k_0-1}}{(k_0-1)!}} increases very rapidly with {x} when {k_0} is large, which basically means that any non-trivial shift of the {g^{(k_0)}} factors to the left by {\frac{\log a_1}{\log R}} or {\frac{\log a_2}{\log R}} will cause the integral in (22) to decrease dramatically. Since all the {a_1,a_2} in {{\mathcal S}_J} are either equal to {1} or bounded below by {x^\varpi}, this will cause the {a_1=a_2=1} term to dominate in the regime when {k_0 \varpi} is large (or more precisely {k_0 \varpi \gg \log k_0}), which is the case in applications.

At this point, in order to perform the computations cleanly, we will mimic the arguments from the previous section and take the explicit choice

\displaystyle  g(x) := \frac{1}{(k_0+l_0)!} (1-x)_+^{k_0+l_0}

for some integer {l_0>0} and {x>0} (and some smooth continuation to {[-1,1]} for negative {x}), and so

\displaystyle  g^{(k_0)}(x) = (-1)^{k_0} \frac{1}{l_0!} (1-x)^{l_0}_+

for positive {x}. (Again, this function is not quite smooth at {1}, but this issue can be dealt with by an infinitesimal regularisation argument which we omit here.) The left-hand side of (22) now becomes

\displaystyle  \frac{1}{(l_0!)^2} \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{l_0} (1-x-\frac{\log a_2}{\log R})_+^{l_0}

\displaystyle \frac{x^{k_0-1}}{(k_0-1)!}\ dx.

The integral here is a little bit more complicated than a beta integral. To estimate it, we use the beta function identity to observe that

\displaystyle  \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{2l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx = (1 - \frac{\log a_1}{\log R})_+^{k_0+2l_0} \frac{(2l_0)!}{(k_0+2l_0)!}

and

\displaystyle  \int_0^\infty (1-x-\frac{\log a_2}{\log R})_+^{2l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx = (1 - \frac{\log a_2}{\log R})_+^{k_0+2l_0} \frac{(2l_0)!}{(k_0+2l_0)!}

and hence by Cauchy-Schwarz

\displaystyle  \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{l_0} (1-x-\frac{\log a_2}{\log R})_+^{l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx

\displaystyle \leq (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0} \frac{(2l_0)!}{(k_0+2l_0)!}.
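The two beta-type evaluations and the resulting Cauchy-Schwarz bound can be sanity-checked numerically. The sketch below uses composite Simpson quadrature with illustrative small parameters {k_0 = 6}, {l_0 = 2}; the shifts `c1`, `c2` play the role of {\frac{\log a_1}{\log R}}, {\frac{\log a_2}{\log R}}, and the function names are assumptions made for this example.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def shifted_integral(c1, c2, k0, l0):
    """Integral over x >= 0 of (1-x-c1)_+^l0 (1-x-c2)_+^l0 x^(k0-1)/(k0-1)!."""
    top = max(0.0, 1 - max(c1, c2))   # the integrand vanishes beyond this point
    if top == 0.0:
        return 0.0
    def f(x):
        return (max(0.0, 1 - x - c1) ** l0 * max(0.0, 1 - x - c2) ** l0
                * x ** (k0 - 1) / math.factorial(k0 - 1))
    return simpson(f, 0.0, top)

def closed_form(c, k0, l0):
    """(1 - c)_+^(k0 + 2 l0) (2 l0)! / (k0 + 2 l0)!, the value when c1 = c2 = c."""
    return max(0.0, 1 - c) ** (k0 + 2 * l0) * math.factorial(2 * l0) / math.factorial(k0 + 2 * l0)

k0, l0 = 6, 2   # illustrative values only
c1, c2 = 0.1, 0.4
mixed = shifted_integral(c1, c2, k0, l0)
bound = math.sqrt(closed_form(c1, k0, l0) * closed_form(c2, k0, l0))
print(mixed, bound, mixed <= bound)
```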

This Cauchy-Schwarz step is a bit wasteful when {a_1,a_2} are far apart, but this does seem to only lead to a minor loss of efficiency in the estimates. We have thus bounded the left-hand side of (22) by

\displaystyle  \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]}

\displaystyle  (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0}.

It is now convenient to collapse the double summation to a single summation. We may bound

\displaystyle  (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0} \leq (1 - \frac{\log [a_1,a_2]}{\log R^2})_+^{k_0/2+l_0}

(since {\frac{\log [a_1,a_2]}{\log R^2}} is less than the greater of {\frac{\log a_1}{\log R}} and {\frac{\log a_2}{\log R}}) and observe that each {a \in {\mathcal S}_J} has {3^{\Omega(a)}} representations of the form {a = [a_1,a_2]}, so we may now bound the left-hand side of (22) by

\displaystyle  \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \sum_{a \in {\mathcal S}_J} \frac{(3k_0)^{\Omega(a)}}{a} (1 - \frac{\log a}{\log R^2})_+^{k_0/2+l_0}.

Note that an element {a} of {{\mathcal S}_J} is either equal to {1}, or lies in the interval {[x^{n\varpi}, x^{(n+1)\varpi})} for some natural number {n \geq 1}. In the latter case, we have

\displaystyle  (1 - \frac{\log a}{\log R^2})_+ \leq (1 - \frac{2n \varpi}{1 + 4\varpi})_+.

In particular, this expression vanishes if {n \geq 2 + \frac{1}{2\varpi}}. We can thus bound the left-hand side of (22) by

\displaystyle  \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \sum_{1 \leq n < 2 + \frac{1}{2\varpi}} (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0}

\displaystyle \sum_{a \in {\mathcal S}_J: a < x^{(n+1)\varpi}} \frac{(3k_0)^{\Omega(a)}}{a} ).

If we introduce the quantity

\displaystyle  \Phi_{3k_0}(z,y) := \sum_{j=0}^\infty (3k_0)^j \sum_{y \leq p_1 < \ldots < p_j: p_1 \ldots p_j < z} \frac{1}{p_1 \ldots p_j} \ \ \ \ \ (23)

then we have thus bounded the left-hand side of (22) by

\displaystyle  \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \sum_{j=1}^\infty (1 - \frac{2j \varpi}{1 + 4\varpi})_+^{k_0/2 + l_0} \Phi_{3k_0}( x^{(j+1)\varpi}, x^\varpi )).

We observe that

\displaystyle  \Phi_{3k_0}(z,y) = 1 \ \ \ \ \ (24)

when {y \geq z}, while in general we have the Buchstab-type inequality

\displaystyle  \Phi_{3k_0}(z,y) \leq 1 + 3k_0 \sum_{y \leq p < z} \frac{1}{p} \Phi_{3k_0}(\frac{z}{p}, p) \ \ \ \ \ (25)

as can be seen by isolating the smallest prime {p_1} in all the terms in (23) with {j \geq 1}. (This inequality is very close to being an identity, the only loss coming from the possibility of the prime factor {p} being repeated in a term associated to {\frac{1}{p} \Phi_{3k_0}(\frac{z}{p},p)}.) We can iterate this inequality to obtain the following conclusion:

Lemma 13 For any {n \geq 1}, we have

\displaystyle  \Phi_{3k_0}(z,y) \leq \prod_{j=1}^{n-1} (1 + 3k_0 \log(1+\frac{1}{j})) + o(1)

whenever {z \leq y^n} and {y \geq x^\varpi} for some fixed {\varpi > 0}, with the error term being uniform in the choice of {z,y}.

Proof: Write {A_n := \prod_{j=1}^{n-1} (1 + 3k_0 \log(1+\frac{1}{j}))}. We prove the bound {\Phi_{3k_0}(z,y) \leq A_n + o(1)} by strong induction on {n}. The case {n=1} follows from (24). Now suppose that {n>1} and that the claim has already been proven for smaller {n}. Let {z \leq y^n} and {y \geq x^\varpi}. Note that {\frac{z}{p} \leq p^j} whenever {p \geq z^{\frac{1}{j+1}}}. We thus have from (25) and the induction hypothesis that

\displaystyle  \Phi_{3k_0}(z,y) \leq 1 + 3k_0 \sum_{j=1}^{n-1} \sum_{z^{\frac{1}{j+1}} \leq p < z^{\frac{1}{j}}} \frac{1}{p} (A_j + o(1) );

applying Mertens’ theorem (or the prime number theorem) we have

\displaystyle  \sum_{z^{\frac{1}{j+1}} \leq p < z^{\frac{1}{j}}} \frac{1}{p} ( A_j + o(1) ) = A_j \log(1 + \frac{1}{j}) + o(1)

and the claim follows from the telescoping identity

\displaystyle  A_n = 1 + 3k_0 \sum_{j=1}^{n-1} A_j \log(1+\frac{1}{j}).

\Box
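The telescoping identity used at the end of this proof is easy to check numerically. The following is a minimal sketch in Python; the function names and the small value of `k0` are illustrative choices and not part of the argument above.

```python
import math

def a_sequence(k0, n_max):
    """Return [A_1, ..., A_{n_max}], where
    A_n = prod_{j=1}^{n-1} (1 + 3*k0*log(1 + 1/j))."""
    values = [1.0]  # A_1 is the empty product
    for j in range(1, n_max):
        values.append(values[-1] * (1.0 + 3 * k0 * math.log(1.0 + 1.0 / j)))
    return values

def telescoped(k0, values, n):
    """Right-hand side 1 + 3*k0 * sum_{j=1}^{n-1} A_j * log(1 + 1/j)."""
    return 1.0 + 3 * k0 * sum(
        values[j - 1] * math.log(1.0 + 1.0 / j) for j in range(1, n)
    )

k0 = 5  # small illustrative value; the identity does not depend on it
A = a_sequence(k0, 8)
for n in range(1, 9):
    # Both sides should agree up to floating-point rounding.
    assert math.isclose(A[n - 1], telescoped(k0, A, n), rel_tol=1e-12)
```

The check works because {A_{j+1} - A_j = 3k_0 A_j \log(1+\frac{1}{j})}, so the sum on the right-hand side collapses to {A_n - A_1}.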

Applying this inequality, we have established (22) with

\displaystyle  \alpha := \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \kappa) \ \ \ \ \ (26)

where

\displaystyle  \kappa := \sum_{1 \leq n < 2 + \frac{1}{2\varpi}} (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})). \ \ \ \ \ (27)

We remark that as a first approximation we have

\displaystyle  \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})) \approx \frac{(3k_0)^{n}}{n!}

and

\displaystyle  (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0} \approx \exp( - n k_0 \varpi )

so in the regime {k_0 \varpi \gg \log k_0}, {\kappa} is roughly {3k_0 \exp( - k_0 \varpi )}, which will be negligible for the parameter ranges of {k_0,\varpi} of interest. Thus the {\alpha} in this argument is quite close to the {\alpha} from (15) in practice.

Now we turn to (10). Fix {h \in {\mathcal H}}. As in the previous section, we can bound the left-hand side of (10) as the sum of the main term

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

plus an error term

\displaystyle  O( \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} E'(x; [d_1,d_2]) )

where {E'(x;q)} is the quantity

\displaystyle  E'(x;q) := \sum_{a \in ({\bf Z}/q{\bf Z})^\times: P_h(a) = 0 \hbox{ mod } q} |\Delta'_{b,W}(\theta; q,a)|,

{P_h} is the polynomial {P_h(a) := \prod_{h' \in {\mathcal H} \backslash \{h\}} (a+h'-h)}, and {\Delta'_{b,W}} was defined in (19). Using the hypothesis {MPZ[\varpi]} and Cauchy-Schwarz as in the previous section we see that the error term is negligible for the purposes of establishing (10). As for the main term, the same argument used to reduce (9) to (22) shows that (10) reduces to

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{(k_0-1)^{\Omega([a_1,a_2])}}{\phi([a_1,a_2])} \int_0^\infty g^{(k_0-1)}(x + \frac{\log a_1}{\log R}) g^{(k_0-1)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-2}}{(k_0-2)!}\ dx

\displaystyle \geq \beta-o(1).

Here, we can do something a bit crude; with our choice of {g}, the integrand is non-negative, so we can simply discard all but the {a_1=a_2=1} term and reduce to

\displaystyle  \int_0^\infty g^{(k_0-1)}(x) g^{(k_0-1)}(x) \frac{x^{k_0-2}}{(k_0-2)!}\ dx \geq \beta

(The intuition here is that by refusing to sieve using primes larger than {x^\varpi}, we have enlarged the sieve {\nu}, which makes the upper bound (9) more difficult to establish but makes the lower bound (10) easier.) So we can take the same choice (16) of {\beta} as in the previous section:

\displaystyle  \beta := \frac{(2l_0+2)!}{((l_0+1)!)^2 (k_0+2l_0+1)!}.

Inserting this and (26) into (11) and simplifying, we see that we can obtain {DHL[k_0,2]} once we can verify the inequality

\displaystyle  1+4\varpi > (1 + \frac{1}{2l_0+1}) (1 + \frac{2l_0+1}{k_0}) (1+\kappa).

As before, {l_0} can be taken to be non-integer if desired. Setting {k_0} to be slightly larger than {(\sqrt{1+4\varpi}-1)^{-2} \approx (2\varpi)^{-2}} we obtain Theorem 5.
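To get a feel for the size of {\kappa} in (27) and for the inequality above, one can evaluate both numerically. The sketch below is only an illustration: the function names are hypothetical, the value {\varpi = 1/1168} is an illustrative choice, and {l_0} is chosen so that {2l_0+1 = \sqrt{k_0}}, which is where the product of the first two factors on the right-hand side is smallest. The terms of (27) are handled through their logarithms because the individual factors can overflow or underflow a floating-point number.

```python
import math

def kappa_27(k0, l0, varpi):
    """Evaluate the sum (27), working with the logarithm of each term."""
    total = 0.0
    log_prod = 0.0  # running log of prod_{j<=n} (1 + 3*k0*log(1 + 1/j))
    n = 1
    while n < 2 + 1 / (2 * varpi):
        base = 1.0 - 2 * n * varpi / (1 + 4 * varpi)
        if base <= 0.0:  # guard against rounding at the end of the range
            break
        log_prod += math.log(1.0 + 3 * k0 * math.log(1.0 + 1.0 / n))
        log_term = (k0 / 2 + l0) * math.log(base) + log_prod
        if log_term > 700.0:  # term too large for a float
            return math.inf
        total += math.exp(log_term)  # underflows harmlessly to 0.0
        n += 1
    return total

def key_inequality_holds(k0, l0, varpi):
    """Check 1 + 4*varpi > (1 + 1/(2*l0+1)) * (1 + (2*l0+1)/k0) * (1 + kappa)."""
    rhs = (1 + 1 / (2 * l0 + 1)) * (1 + (2 * l0 + 1) / k0)
    rhs *= 1 + kappa_27(k0, l0, varpi)
    return 1 + 4 * varpi > rhs

varpi = 1 / 1168  # illustrative value
for k0 in (341_000, 342_000):
    l0 = (math.sqrt(k0) - 1) / 2  # non-integer l0 with 2*l0 + 1 = sqrt(k0)
    print(k0, kappa_27(k0, l0, varpi), key_inequality_holds(k0, l0, varpi))
```

For this {\varpi}, the quantity {(\sqrt{1+4\varpi}-1)^{-2}} is roughly {3.4 \times 10^5}, so the two sample values of {k_0} sit on either side of the threshold, while {\kappa} itself is far too small to affect the comparison.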

— 5. Using optimal values of {g} (NEW, June 5, 2013) —

We can improve on the above by using an optimal choice of {g}. The following result was obtained recently by Farkas, Pintz, and Revesz, and independently worked out by commenters on this blog:

Theorem 14 (Optimal GPY weight) Let {k_0 > 2} be an integer. Then the ratio

\displaystyle  \frac{\int_0^1 f'(x)^2 x^{k_0-1}\ dx}{\int_0^1 f(x)^2 x^{k_0-2}\ dx}

where {f: [0,1] \rightarrow {\bf R}} is a smooth function with {f(1)=0} that is not identically vanishing, has a minimal value of

\displaystyle  \lambda := \frac{j_{k_0-2}^2}{4}

where {j_{k_0-2}} is the first positive zero of the Bessel function {J_{k_0-2}}. Furthermore, this minimum is attained if (and only if) {f} is a scalar multiple of the function

\displaystyle  f_0(x) = x^{1-k_0/2} J_{k_0-2}(2\sqrt{x\lambda}).

Proof: The function {J_{k_0-2}}, by definition, obeys the Bessel differential equation

\displaystyle  x^2 \frac{d^2}{dx^2} J_{k_0-2} + x \frac{d}{dx} J_{k_0-2} + (x^2 - (k_0-2)^2) J_{k_0-2} = 0

and also vanishes to order {k_0-2} at the origin. From this and routine computations it is easy to see that {f_0} is smooth, strictly positive on {[0,1)}, and obeys the differential equation

\displaystyle  \frac{d}{dx} (x^{k_0-1} \frac{d}{dx} f_0(x)) + \lambda x^{k_0-2} f_0(x) = 0. \ \ \ \ \ (28)

If we write {g_0(x) := \frac{f'_0}{f_0}(x)}, which is well-defined away from {1} since {f_0} is non-vanishing on {[0,1)}, then {g_0} obeys the Riccati-type equation

\displaystyle  (k_0-1) g_0(x) + x g'_0(x) + x g_0(x)^2 + \lambda = 0. \ \ \ \ \ (29)

Now consider the quadratic form

\displaystyle  Q( f ) := \int_0^1 f'(x)^2 x^{k_0-1}\ dx - \lambda \int_0^1 f(x)^2 x^{k_0-2}\ dx

for smooth functions {f: [0,1] \rightarrow {\bf R}} with {f(1)=0}. A calculation using (29) and integration by parts shows that

\displaystyle  \int_0^1 (f'(x)-g_0(x)f(x))^2 x^{k_0-1}\ dx = Q(f)

and so {Q(f) \geq 0}, giving the first claim; the second claim follows by noting that {f'-g_0 f} vanishes if and only if {f} is a scalar multiple of {f_0}. (Note that the integration by parts is a little subtle, because {f_0} vanishes to first order at {x=1} and so {g_0} blows up to first order; but {g_0 f^2} still vanishes to first order at {x=1}, allowing one to justify the integration by parts by a standard limiting argument.) \Box
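For the reader's convenience, here is a sketch of the calculation behind the identity for {Q(f)} used in the proof. It assumes that the boundary terms {[g_0(x) x^{k_0-1} f(x)^2]_0^1} vanish: at {x=0} because of the factor {x^{k_0-1}}, and at {x=1} by the limiting argument mentioned in the parenthetical remark.

```latex
\begin{align*}
\int_0^1 (f' - g_0 f)^2 x^{k_0-1}\,dx
 &= \int_0^1 (f')^2 x^{k_0-1}\,dx
    - \int_0^1 g_0\, x^{k_0-1} (f^2)'\,dx
    + \int_0^1 g_0^2 f^2 x^{k_0-1}\,dx \\
 &= \int_0^1 (f')^2 x^{k_0-1}\,dx
    + \int_0^1 \bigl( x g_0' + (k_0-1) g_0 + x g_0^2 \bigr) f^2 x^{k_0-2}\,dx \\
 &= \int_0^1 (f')^2 x^{k_0-1}\,dx - \lambda \int_0^1 f^2 x^{k_0-2}\,dx
  = Q(f).
\end{align*}
```

The first line expands the square and uses {2 f f' = (f^2)'}; the second integrates the middle term by parts; the third applies the equation (29).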

If we now test (14) with a function {g: [0,1] \rightarrow {\bf R}} which is smooth, vanishes to order {k_0} at {x=1}, and has a {(k_0-1)^{th}} derivative equal to {f_0}, we see that we can deduce {DHL[k_0,2]} from {EH[\theta]} whenever

\displaystyle  2\theta > \frac{j_{k_0-2}^2}{k_0(k_0-1)}.

Using the known asymptotic

\displaystyle  j_n = n + c n^{1/3} + O( n^{-1/3} )

for {c := 1.8557571\ldots} and large {n} (see e.g. Abramowitz and Stegun), this is asymptotically of the form

\displaystyle  2 \theta > 1 + 2c k_0^{-2/3} + O( k_0^{-1} )

or

\displaystyle  k_0 > (\frac{2c}{2\theta-1})^{3/2},

thus giving a relationship of the form {k_0 \sim (\theta-1/2)^{-3/2}} that is superior to the previous relationship {k_0 \sim (\theta-1/2)^{-2}}.

A similar argument can be given for Theorem 5, using {g} of the form above rather than a monomial {\frac{(1-x)^{k_0+l_0}}{(k_0+l_0)!}} (and extended by zero to {[1,+\infty)}). For future optimisation we consider a generalisation {MPZ[\varpi,\delta]} of {MPZ[\varpi]} in which the interval {I} is of the form {[w,x^\delta)} rather than {[w,x^\varpi)}, so that {J} is now {[x^\delta,\infty)} rather than {[x^\varpi,\infty)}. As before, the key point is the estimation of {\alpha}. The arguments leading to (22) go through for any test function {g}, so we have to show

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq \alpha+o(1).

As {g} has {(k_0-1)^{th}} derivative equal to {f_0}, this is

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^{1-\frac{\log \max(a_1,a_2)}{\log R}} f'_0(x + \frac{\log a_1}{\log R}) f'_0(x + \frac{\log a_2}{\log R}) \ \ \ \ \ (30)

\displaystyle  \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq \alpha+o(1).

We need some sign information on {f'_0}:

Lemma 15 On {[0,1)}, {f_0} is positive, {f'_0} is negative and {f''_0} is positive.

Proof: From (28) we have

\displaystyle  x f''_0(x) + (k_0-1) f'_0(x) + \lambda f_0(x) = 0.

From construction we already know that {f_0} is positive on {[0,1)}. The above equation then shows that {f'_0} is negative at {x=0}, and that {f_0} cannot have any local minimum in {(0,1)}, so {f'_0} is negative throughout. To obtain the final claim {f''_0>0} we use an argument provided by Gergely Harcos in the comments: from the recursive relations for Bessel functions we can check that {f''_0} is a positive multiple of {x^{-k_0/2} J_{k_0}(2 \sqrt{x\lambda})}, and the claim then follows from the interlacing properties of the zeroes of Bessel functions. \Box

Write {A := \int_0^1 f_0(x)^2 \frac{x^{k_0-2}}{(k_0-2)!}}, so {A} is positive and by Theorem 14 we have

\displaystyle  \int_0^1 f'_0(x)^2 \frac{x^{k_0-1}}{(k_0-1)!} = \frac{j_{k_0-2}^2}{4(k_0-1)} A.

If {a_1 < R}, then as {f'_0} is negative and increasing we have

\displaystyle  -f'_0(x + \frac{\log a_1}{\log R}) \leq -f'_0( x / (1-\frac{\log a_1}{\log R}) )

for {0 \leq x \leq 1 - \frac{\log a_1}{\log R}}, and thus by change of variable

\displaystyle \int_0^{1-\frac{\log a_1}{\log R}} f'_0(x + \frac{\log a_1}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!} \leq (1-\frac{\log a_1}{\log R})^{k_0} \frac{j_{k_0-2}^2}{4(k_0-1)} A

for {a_1 < R}, and thus

\displaystyle \int_0^{1-\frac{\log a_1}{\log R}} f'_0(x + \frac{\log a_1}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!} \leq (1-\frac{\log a_1}{\log R})_+^{k_0} \frac{j_{k_0-2}^2}{4(k_0-1)} A

for all {a_1}. Similarly

\displaystyle \int_0^{1-\frac{\log a_2}{\log R}} f'_0(x + \frac{\log a_2}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!} \leq (1-\frac{\log a_2}{\log R})^{k_0}_+ \frac{j_{k_0-2}^2}{4(k_0-1)} A

for all {a_2}. By Cauchy-Schwarz we can thus bound the integral in (30) by

\displaystyle  (1-\frac{\log a_1}{\log R})_+^{k_0/2} (1-\frac{\log a_2}{\log R})_+^{k_0/2} \frac{j_{k_0-2}^2}{4(k_0-1)} A

and so (30) reduces to

\displaystyle  \frac{j_{k_0-2}^2}{4(k_0-1)} A \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} (1-\frac{\log a_1}{\log R})_+^{k_0/2} (1-\frac{\log a_2}{\log R})_+^{k_0/2} \leq \alpha + o(1).

Repeating the arguments of the previous section, we can reduce this to

\displaystyle  \frac{j_{k_0-2}^2}{4(k_0-1)} A \sum_{a \in {\mathcal S}_J} \frac{(3k_0)^{\Omega(a)}}{a} (1-\frac{\log a}{\log R})_+^{k_0/2} \leq \alpha + o(1)

and by further continuing the arguments of the previous section we end up being able to take

\displaystyle  \alpha = \frac{j_{k_0-2}^2}{4(k_0-1)} A (1 + \kappa)

where

\displaystyle  \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{2\delta}} (1 - \frac{2n \delta}{1 + 4\varpi})^{k_0/2} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})). \ \ \ \ \ (31)

Also, the previous arguments allow us to take

\displaystyle  \beta = A.

The key inequality (11) now becomes

\displaystyle  1 + 4\varpi > \frac{j_{k_0-2}^2}{k_0(k_0-1)} (1+\kappa), \ \ \ \ \ (32)

thus {MPZ[\varpi,\delta]} implies {DHL[k_0,2]} whenever (32) is obeyed with the value (31) of {\kappa}.


Filed under: expository, math.NT Tagged: Cem Yildirim, Dan Goldston, Elliott-Halberstam conjecture, Janos Pintz, polymath8, prime gaps, sieve theory, Yitang Zhang, Yoichi Motohashi

John PreskillWe are all Wilsonians now

Ken Wilson

Ken Wilson passed away on June 15 at age 77. He changed how we think about physics.

Renormalization theory, first formulated systematically by Freeman Dyson in 1949, cured the flaws of quantum electrodynamics and turned it into a precise computational tool. But the subject seemed magical and mysterious. Many physicists, Dirac prominently among them, questioned whether renormalization rests on a sound foundation.

Wilson changed that.

The renormalization group concept arose in an extraordinary paper by Gell-Mann and Low in 1954. It was embraced by Soviet physicists like Bogoliubov and Landau, and invoked by Landau to challenge the consistency of quantum electrodynamics. But it was an abstruse and inaccessible topic, as is well illustrated by the baffling discussion at the very end of the two-volume textbook by Bjorken and Drell.

Wilson changed that, too.

Ken Wilson turned renormalization upside down. Dyson and others had worried about the “ultraviolet divergences” occurring in Feynman diagrams. They introduced an artificial cutoff on integrations over the momenta of virtual particles, then tried to show that all the dependence on the cutoff can be eliminated by expressing the results of computations in terms of experimentally accessible quantities. It required great combinatoric agility to show this trick works in electrodynamics. In other theories, notably including general relativity, it doesn’t work.

Wilson adopted an alternative viewpoint. Take the short-distance cutoff seriously, he said, regarding it as part of the physical formulation of the field theory. Now ask what physics looks like at distances much larger than the cutoff. Wilson imagined letting the short-distance cutoff grow, while simultaneously adjusting the theory to preserve its low-energy predictions. This procedure sounds complicated, but Wilson discovered something wonderful — for the purpose of computing low-energy processes the theory becomes remarkably simple, completely characterized by just a few (renormalized) parameters. One recovers Dyson’s results plus much more, while also acquiring a rich and visually arresting physical picture of what is going on.

When I started graduate school in 1975, Wilson, not yet 40, was already a legend. Even Sidney Coleman, for me the paragon of razor sharp intellect, seemed to regard Wilson with awe. (They had been contemporaries at Caltech, both students of Murray Gell-Mann.) It enhanced the legend that Wilson had been notoriously slow to publish. He spent years pondering the foundations of quantum field theory before finally unleashing a torrent of revolutionary papers in the early 70s. Cornell had the wisdom to grant tenure despite Wilson’s unusually low productivity during the 60s.

As a student, I spent countless hours struggling through Wilson’s great papers, some of which were quite difficult. One introduced me to the operator product expansion, which became a workhorse of high-energy scattering theory and the foundation of conformal field theory. Another considered all the possible ways that renormalization group fixed points could control the high-energy behavior of the strong interactions. Conspicuously missing from the discussion was what turned out to be the correct idea — asymptotic freedom. Wilson had not overlooked this possibility; instead he “proved” it to be impossible. The proof contains a subtle error. Wilson analyzed charge renormalization invoking both Lorentz covariance and positivity of the Hilbert space metric, forgetting that gauge theories admit no gauge choice with both properties. Even Ken Wilson made mistakes.

Wilson also formulated the strong-coupling expansion of lattice gauge theory, and soon after pioneered the Euclidean Monte Carlo method for computing the quantitative non-perturbative predictions of quantum chromodynamics, which remains today an extremely active and successful program. But of the papers by Wilson I read while in graduate school, the most exciting by far was this one about the renormalization group. Toward the end of the paper Wilson discussed how to formulate the notion of the “continuum limit” of a field theory with a cutoff. Removing the short-distance cutoff is equivalent to taking the limit in which the correlation length (the inverse of the renormalized mass) is infinitely long compared to the cutoff — the continuum limit is a second-order phase transition. Wilson had finally found the right answer to the decades-old question, “What is quantum field theory?” And after reading his paper, I knew the answer, too! This Wilsonian viewpoint led to further deep insights mentioned in the paper, for example that an interacting self-coupled scalar field theory is unlikely to exist (i.e. have a continuum limit) in four spacetime dimensions.

Wilson’s mastery of quantum field theory led him to another crucial insight in the 1970s which has profoundly influenced physics in the decades since — he denigrated elementary scalar fields as unnatural. I learned about this powerful idea from an inspiring 1979 paper not by Wilson, but by Lenny Susskind. That paper includes a telltale acknowledgment: “I would like to thank K. Wilson for explaining the reasons why scalar fields require unnatural adjustments of bare constants.”

Susskind, channeling Wilson, clearly explains a glaring flaw in the standard model of particle physics — ensuring that the Higgs boson mass is much lighter than the Planck (i.e., cutoff) scale requires an exquisitely careful tuning of the theory’s bare parameters. Susskind proposed to banish the Higgs boson in favor of Technicolor, a new strong interaction responsible for breaking the electroweak gauge symmetry, an idea I found compelling at the time. Technicolor fell into disfavor because it turned out to be hard to build fully realistic models, but Wilson’s complaint about elementary scalars continued to drive the quest for new physics beyond the standard model, and in particular bolstered the hope that low-energy supersymmetry (which eases the fine tuning problem) will be discovered at the Large Hadron Collider. Both dark energy (another fine tuning problem) and the absence so far of new physics beyond the Higgs boson at the LHC are prompting some soul searching about whether naturalness is really a reliable criterion for evaluating success in physical theories. Could Wilson have steered us wrong?

Wilson’s great legacy is that we now regard nearly every quantum field theory as an effective field theory. We don’t demand or expect that the theory will continue working at arbitrarily short distances. At some stage it will break down and be replaced by a more fundamental description. This viewpoint is now so deeply ingrained in how we do physics that today’s students may be surprised to hear it was not always so. More than anyone else, we have Ken Wilson to thank for this indispensable wisdom. Few ideas have changed physics so much.


Terence TaoA truncated elementary Selberg sieve of Pintz

This post is a continuation of the previous post on sieve theory, which is an ongoing part of the Polymath8 project. As the previous post was getting somewhat full, we are rolling the thread over to the current post.

In this post we will record a new truncation of the elementary Selberg sieve discussed in this previous post (and also analysed in the context of bounded prime gaps by Graham-Goldston-Pintz-Yildirim and Motohashi-Pintz) that was recently worked out by Janos Pintz, who has kindly given permission to share this new idea with the Polymath8 project. This new sieve decouples the {\delta} parameter that was present in our previous analysis of Zhang’s argument into two parameters, a quantity {\delta} that used to measure smoothness in the modulus, but now measures a weaker notion of “dense divisibility” which is what is really needed in the Elliott-Halberstam type estimates, and a second quantity {\delta'} which still measures smoothness but is allowed to be substantially larger than {\delta}. Through this decoupling, it appears that the {\kappa} type losses in the sieve theoretic part of the argument can be almost completely eliminated (they basically decay exponentially in {\delta'} and have only mild dependence on {\delta}, whereas the Elliott-Halberstam analysis is sensitive only to {\delta}, allowing one to set {\delta} far smaller than previously by keeping {\delta'} large). This should lead to noticeable gains in the {k_0} quantity in our analysis.

To describe this new truncation we need to review some notation. As in all previous posts (in particular, the first post in this series), we have an asymptotic parameter {x} going off to infinity, and all quantities here are implicitly understood to be allowed to depend on {x} (or to range in a set that depends on {x}) unless they are explicitly declared to be fixed. We use the usual asymptotic notation {O(), o(), \ll} relative to this parameter {x}. To be able to ignore local factors (such as the singular series {{\mathfrak G}}), we also use the “{W}-trick” (as discussed in the first post in this series): we introduce a parameter {w} that grows very slowly with {x}, and set {W := \prod_{p<w} p}.

For any fixed natural number {k_0}, define an admissible {k_0}-tuple to be a fixed tuple {{\mathcal H}} of {k_0} distinct integers which avoids at least one residue class modulo {p} for each prime {p}. Our objective is to obtain the following conjecture {DHL[k_0,2]} for as small a value of the parameter {k_0} as possible:

Conjecture 1 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there exist infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} that contain at least two primes.
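The admissibility hypothesis in this conjecture is easy to test by machine. A {k_0}-tuple has only {k_0} elements, so it can cover all residue classes modulo {p} only when {p \leq k_0}, and it suffices to test those primes. The following Python sketch (with hypothetical function names) does exactly this.

```python
def primes_up_to(n):
    """Return the list of primes p <= n using a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

def is_admissible(H):
    """Return True if the tuple H avoids at least one residue class
    modulo p for every prime p (only p <= len(H) needs checking)."""
    for p in primes_up_to(len(H)):
        residues = {h % p for h in H}
        if len(residues) == p:  # H covers every class mod p
            return False
    return True

print(is_admissible((0, 2)))     # twin prime tuple
print(is_admissible((0, 2, 6)))
print(is_admissible((0, 2, 4)))  # covers all classes mod 3
```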

The twin prime conjecture asserts that {DHL[k_0,2]} holds for {k_0} as small as {2}, but currently we are only able to establish this result for {k_0 \geq 6329} (see this comment). However, with the new truncated sieve of Pintz described in this post, we expect to be able to lower this threshold {k_0 \geq 6329} somewhat.

In previous posts, we deduced {DHL[k_0,2]} from a technical variant {MPZ[\varpi,\delta]} of the Elliott-Halberstam conjecture for certain choices of parameters {0 < \varpi < 1/4}, {0 < \delta < 1/4+\varpi}. We will use the following formulation of {MPZ[\varpi,\delta]}:

Conjecture 2 ({MPZ[\varpi,\delta]}) Let {{\mathcal H}} be a fixed {k_0}-tuple (not necessarily admissible) for some fixed {k_0 \geq 2}, and let {b\ (W)} be a primitive residue class. Then

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in C(q)} |\Delta_{b,W}(\Lambda; q,a)| = O( x \log^{-A} x) \ \ \ \ \ (1)

for any fixed {A>0}, where {I = (w,x^{\delta})}, {{\mathcal S}_I} are the square-free integers whose prime factors lie in {I}, and {\Delta_{b,W}(\Lambda;q,a)} is the quantity

\displaystyle  \Delta_{b,W}(\Lambda;q,a) := | \sum_{x \leq n \leq 2x: n=b\ (W); n = a\ (q)} \Lambda(n) \ \ \ \ \ (2)

\displaystyle  - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x: n = b\ (W)} \Lambda(n)|.

and {C(q)} is the set of congruence classes

\displaystyle  C(q) := \{ a \in ({\bf Z}/q{\bf Z})^\times: P(a) = 0 \}

and {P} is the polynomial

\displaystyle  P(a) := \prod_{h \in {\mathcal H}} (a+h).

The conjecture {MPZ[\varpi,\delta]} is currently known to hold whenever {87 \varpi + 17 \delta < \frac{1}{4}} (see this comment and this confirmation). Actually, we can prove a stronger result than {MPZ[\varpi,\delta]} in this regime in a couple of ways. Firstly, the congruence classes {C(q)} can be replaced by a more general system of congruence classes obeying a certain controlled multiplicity axiom; see this post. Secondly, and more importantly for this post, the requirement that the modulus {q} lies in {{\mathcal S}_I} can be relaxed; see below.

To connect the two conjectures, the previously best known implication was the following (see Theorem 2 from this post):

Theorem 3 Let {0 < \varpi < 1/4}, {0 < \delta < 1/4 + \varpi} and {k_0 \geq 2} be such that we have the inequality

\displaystyle  (1 +4 \varpi) (1-\kappa') > \frac{j^2_{k_0-2}}{k_0(k_0-1)} (1+\kappa) \ \ \ \ \ (3)

where {j_{k_0-2} = j_{k_0-2,1}} is the first positive zero of the Bessel function {J_{k_0-2}}, and {\kappa,\kappa'>0} are the quantities

\displaystyle  \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n+1}{2} \frac{k_0^n}{n!} (\int_{4\delta/(1+4\varpi)}^1 (1-t)^{k_0/2}\ \frac{dt}{t})^n

and

\displaystyle  \kappa' := \sum_{2 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n-1}{2} \frac{(k_0-1)^n}{n!}

\displaystyle (\int_{4\delta/(1+4\varpi)}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t})^n.

Then {MPZ[\varpi,\delta]} implies {DHL[k_0,2]}.

Actually there have been some slight improvements to the quantities {\kappa,\kappa'}; see the comments to this previous post. However, the main error {\kappa} remains roughly of the order {\delta^{-1} \exp( - 2 k_0\delta )}, which limits one from taking {\delta} too small.

To improve beyond this, the first basic observation is that the smoothness condition {q \in {\mathcal S}_I}, which implies that all prime divisors of {q} are less than {x^\delta}, can be relaxed in the proof of {MPZ[\varpi,\delta]}. Indeed, if one inspects the proof of this proposition (described in these three previous posts), one sees that the key property of {q} needed is not so much the smoothness, but a weaker condition which we will call (for lack of a better term) dense divisibility:

Definition 4 Let {y > 1}. A positive integer {q} is said to be {y}-densely divisible if for every {1 \leq R \leq q}, one can find a factor of {q} in the interval {[y^{-1} R, R]}. We let {{\mathcal D}_y} denote the set of positive integers that are {y}-densely divisible.

Certainly every integer which is {y}-smooth (i.e. has all prime factors at most {y}) is also {y}-densely divisible, as can be seen from the greedy algorithm; but the property of being {y}-densely divisible is strictly weaker than {y}-smoothness, which is a fact we shall exploit shortly.
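Definition 4 can be tested directly for small integers. The sketch below assumes that {R} ranges over all real numbers in {[1,q]}; under that reading, {q} is {y}-densely divisible exactly when consecutive divisors of {q} differ by a factor of at most {y}. The function names are hypothetical, and the brute-force divisor search is only meant for small {q}.

```python
def divisors(q):
    """Return the divisors of q in increasing order (trial division)."""
    small, large = [], []
    d = 1
    while d * d <= q:
        if q % d == 0:
            small.append(d)
            if d != q // d:
                large.append(q // d)
        d += 1
    return small + large[::-1]

def is_densely_divisible(q, y):
    """True if every interval [R/y, R] with 1 <= R <= q contains a divisor
    of q, i.e. consecutive divisors have ratio at most y."""
    divs = divisors(q)
    return all(b <= y * a for a, b in zip(divs, divs[1:]))

def is_smooth(q, y):
    """True if every prime factor of q is at most y."""
    p = 2
    while p * p <= q:
        while q % p == 0:
            if p > y:
                return False
            q //= p
        p += 1
    return q == 1 or q <= y  # any leftover q > 1 is a single prime factor

# 12 has the prime factor 3, so it is not 2-smooth, yet its divisors
# 1, 2, 3, 4, 6, 12 never jump by more than a factor of 2.
print(is_smooth(12, 2), is_densely_divisible(12, 2))
# 15 has divisors 1, 3, 5, 15, so it fails for y = 2 but passes for y = 3.
print(is_densely_divisible(15, 2), is_densely_divisible(15, 3))
```

The example {q = 12}, {y = 2} illustrates the gap between the two notions that the truncated sieve is designed to exploit.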

We now define {MPZ'[\varpi,\delta]} to be the same statement as {MPZ[\varpi,\delta]}, but with the condition {q \in {\mathcal S}_I} replaced by the weaker condition {q \in {\mathcal S}_{[w,+\infty)} \cap {\mathcal D}_{x^\delta}}. The arguments in previous posts then also establish {MPZ'[\varpi,\delta]} whenever {87 \varpi + 17 \delta < \frac{1}{4}}.

The main result of this post is then the following implication, essentially due to Pintz:

Theorem 5 Let {0 < \varpi < 1/4}, {0 < \delta \leq \delta' < 1/4 + \varpi}, {A \geq 0}, and {k_0 \geq 2} be such that

\displaystyle  (1 +4 \varpi) (1-2\kappa_1 - 2\kappa_2 - 2\kappa_3) > \frac{j^2_{k_0-2}}{k_0(k_0-1)}

where

\displaystyle  \kappa_1 := \int_{\theta}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t}

\displaystyle  \kappa_2 := (k_0-1) \int_{\theta}^1 (1-t)^{k_0-1}\ \frac{dt}{t}

\displaystyle  \kappa_3 := e^A \frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)} \sum_{0 \leq J \leq 1/\tilde \delta} \frac{(k_0-1)^J}{J!} (\int_{\tilde \delta}^\theta e^{-At} \frac{dt}{t})^J

and

\displaystyle  \theta := \frac{\delta'}{1/4+\varpi}

\displaystyle  \tilde \theta := \frac{\delta' - \delta + \varpi}{1/4 + \varpi}

\displaystyle  \tilde \delta := \frac{\delta}{1/4+\varpi}

\displaystyle  G_{k_0-1}(0,0) := \int_0^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

\displaystyle  G_{k_0-1,\tilde \theta}(0,0) := \int_0^{\tilde \theta} f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

and

\displaystyle  f(t) := t^{1-k_0/2} J_{k_0-2}( \sqrt{t} j_{k_0-2} ).

Then {MPZ'[\varpi,\delta]} implies {DHL[k_0,2]}.

This theorem has rather messy constants, but we can isolate some special cases which are a bit easier to compute with. Setting {\delta' = \delta}, we see that {\kappa_3} vanishes (and the argument below will show that we only need {MPZ[\varpi,\delta]} rather than {MPZ'[\varpi,\delta]}), and we obtain the following slight improvement of Theorem 3:

Theorem 6 Let {0 < \varpi < 1/4}, {0 < \delta < 1/4 + \varpi} and {k_0 \geq 2} be such that we have the inequality

\displaystyle  (1 +4 \varpi) (1-2\kappa_1-2\kappa_2) > \frac{j^2_{k_0-2}}{k_0(k_0-1)} \ \ \ \ \ (4)

where

\displaystyle  \kappa_1 := \int_{4\delta/(1+4\varpi)}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t}

\displaystyle  \kappa_2 := (k_0-1) \int_{4\delta/(1+4\varpi)}^1 (1-t)^{k_0-1}\ \frac{dt}{t}.

Then {MPZ[\varpi,\delta]} implies {DHL[k_0,2]}.

This is a little better than Theorem 3, because the error {2\kappa_1+2\kappa_2} has size about {\frac{1}{2 k_0 \delta} \exp( - 2 k_0 \delta) + \frac{1}{2 \delta} \exp(-4 k_0 \delta)}, which compares favorably with the error in Theorem 3 which is about {\frac{1}{\delta} \exp(-2 k_0 \delta)}. This should already give a “cheap” improvement to our current threshold {k_0 \geq 6329}, though it will fall short of what one would get if one fully optimised over all parameters in the above theorem.
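The quantities {\kappa_1, \kappa_2} of Theorem 6 are one-dimensional integrals and can be estimated with elementary numerical quadrature. The following Python sketch uses composite Simpson's rule; the function names, the number of subintervals, and the sample parameters {\varpi = 0.002}, {\delta = 0.004} are illustrative assumptions (chosen only so that {87\varpi + 17\delta < \frac{1}{4}}), not values used by the project.

```python
import math

def simpson(func, a, b, n=50000):
    """Composite Simpson's rule for the integral of func over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def kappas(k0, varpi, delta, n=50000):
    """Approximate kappa_1 and kappa_2 from Theorem 6."""
    theta = 4 * delta / (1 + 4 * varpi)  # lower limit of integration
    # max(0.0, ...) protects against a tiny negative base from rounding
    kappa1 = simpson(
        lambda t: max(0.0, 1.0 - t) ** ((k0 - 1) / 2) / t, theta, 1.0, n
    )
    kappa2 = (k0 - 1) * simpson(
        lambda t: max(0.0, 1.0 - t) ** (k0 - 1) / t, theta, 1.0, n
    )
    return kappa1, kappa2

k1, k2 = kappas(6329, 0.002, 0.004)
print(k1, k2, 1 - 2 * k1 - 2 * k2)
```

Since the integrands decay on a scale of about {1/k_0}, the step size should be kept well below that scale. Checking the inequality (4) itself additionally requires the Bessel zero {j_{k_0-2}}, which is not computed here.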

Returning to the full strength of Theorem 5, let us obtain a crude upper bound for {\kappa_3} that is a little simpler to understand. Extending the {J} summation to infinity and using the Taylor series for the exponential, we have

\displaystyle  \kappa_3 \leq \frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)} \exp( A + (k_0-1) \int_{\tilde \delta}^\theta e^{-At} \frac{dt}{t} ).

We can crudely bound

\displaystyle  \int_{\tilde \delta}^\theta e^{-At} \frac{dt}{t} \leq \frac{1}{A \tilde \delta}

and then optimise in {A} to obtain

\displaystyle  \kappa_3 \leq \frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)} \exp( 2 (k_0-1)^{1/2} \tilde \delta^{-1/2} ).
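The optimisation in {A} is a one-line calculus exercise, sketched here for convenience; the auxiliary function {\phi} is notation introduced only for this remark. Inserting the crude bound {\frac{1}{A \tilde \delta}} into the exponent leaves the task of minimising {\phi(A)} over {A > 0}:

```latex
\begin{align*}
\phi(A) &:= A + \frac{k_0-1}{A \tilde\delta}, \qquad
\phi'(A) = 1 - \frac{k_0-1}{A^2 \tilde\delta} = 0
\iff A = \Bigl(\frac{k_0-1}{\tilde\delta}\Bigr)^{1/2}, \\
\phi\Bigl( \Bigl(\frac{k_0-1}{\tilde\delta}\Bigr)^{1/2} \Bigr)
 &= 2 (k_0-1)^{1/2} \tilde\delta^{-1/2}.
\end{align*}
```

Both terms of {\phi} are equal at the minimiser, which is why the exponent is exactly twice {(k_0-1)^{1/2} \tilde \delta^{-1/2}}.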

Because of the {t^{k_0-2}} factor in the integrand for {G_{k_0-1}} and {G_{k_0-1,\tilde \theta}}, we expect the ratio {\frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)}} to be of the order of {\tilde \theta^{k_0-1}}, although one will need some theoretical or numerical estimates on Bessel functions to make this heuristic more precise. Setting {\tilde \theta} to be something like {1/2}, we get a good bound here as long as {\tilde \delta \gg 1/k_0}, which at current values of {\delta, k_0} is a mild condition.

Pintz’s argument uses the elementary Selberg sieve, discussed in this previous post, but with a more efficient estimation of the quantity {\beta}, in particular avoiding the truncation to moduli {d} between {x^{-\delta} R} and {R} which was the main source of inefficiency in that previous post. The basic idea is to “linearise” the effect of the truncation of the sieve, so that this contribution can be dealt with by the union bound (basically, bounding the contribution of each large prime one at a time). This mostly avoids the more complicated combinatorial analysis that arose in the analytic Selberg sieve, as seen in this previous post.

— 1. Review of previous material —

In this section we collect some results from previous posts which we will need.

We first record an asymptotic for multiplicative functions. For any natural number {k}, define a {k}-dimensional multiplicative function to be a multiplicative function {f: {\bf N} \rightarrow {\bf R}} which obeys the asymptotic

\displaystyle  f(p) = k + O(\frac{1}{p})

for all {p>w}. The following result is Lemma 8 from this previous post:

Lemma 7 Let {I = (w,+\infty)}. Let {k} be a fixed positive integer, and let {f: {\bf N} \rightarrow {\bf R}} be a multiplicative function of dimension {k}. Then for any fixed compactly supported, Riemann-integrable function {g: {\bf R} \rightarrow {\bf R}}, and any {R>x^c} for some fixed {c>0}, one has

\displaystyle  \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d} g(\frac{\log d}{\log R}) = (\frac{\phi(W)}{W} \log R)^k ( \int_0^\infty g(t) \frac{t^{k-1}}{(k-1)!}\ dt + o(1) ).

Next, we record a criterion for {DHL[k_0,2]}, which is Lemma 7 from this previous post:

Lemma 8 (Criterion for DHL) Let {k_0 \geq 2}. Suppose that for each fixed admissible {k_0}-tuple {{\mathcal H}} and each congruence class {b\ (W)} such that {b+h} is coprime to {W} for all {h \in {\mathcal H}}, one can find a non-negative weight function {\nu: {\bf N} \rightarrow {\bf R}^+}, fixed quantities {\alpha,\beta > 0}, a quantity {B>0}, and a fixed positive power {R} of {x} such that one has the upper bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \leq (\alpha+o(1)) B\frac{x}{W}, \ \ \ \ \ (5)

the lower bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \theta(n+h_i) \geq (\beta-o(1)) B\frac{x}{W} \log R \ \ \ \ \ (6)

for all {h_i \in {\mathcal H}}, and the key inequality

\displaystyle  \frac{\log R}{\log x} > \frac{1}{k_0} \frac{\alpha}{\beta} \ \ \ \ \ (7)

holds. Then {DHL[k_0,2]} holds. Here {\theta(n)} is defined to equal {\log n} when {n} is prime and {0} otherwise.

— 2. Pintz’s argument —

We can now prove Theorem 5. Fix {\varpi,\delta,\delta',k_0} to obey the hypotheses of this theorem. Let {b\ (W)} be a congruence class with {b+h} coprime to {W} for all {h \in {\mathcal H}} (this class exists by the admissibility of {{\mathcal H}}). We apply Lemma 8 with

\displaystyle B := (\frac{\phi(W)}{W} \log R)^{k_0}

the elementary Selberg sieve {\nu = \nu_{\mathcal X}} defined by

\displaystyle  \nu(n) := (\sum_{d \in {\mathcal S}_{(w,+\infty)}: d|P(n)} \mu(d) a_d)^2

where

\displaystyle  a_d := \frac{1}{\Phi(d) \Delta(d)} \sum_{q \in {\mathcal X}: (q,d)=1} \frac{1}{\Phi(q)} f'( \frac{\log dq}{\log R} ), \ \ \ \ \ (8)

{\Phi, \Delta} are the multiplicative functions

\displaystyle  \Phi(d) := \prod_{p|d} \frac{p-k_0}{k_0}

and

\displaystyle  \Delta(d) :=\prod_{p|d} \frac{k_0}{p},

the sieve level {R} is given by the formula

\displaystyle  R := x^{1/4 + \varpi},

{f: {\bf R} \rightarrow {\bf R}} is a fixed smooth function supported on {[-1,1]}, and {{\mathcal X}} is a certain subset of {{\mathcal S}_{(w,+\infty)}} to be chosen shortly. We will assume that {f} is non-negative and non-increasing on {[0,1]}. In this previous post, we considered this sieve with {{\mathcal X}} equal to either all of {{\mathcal S}_{(w,+\infty)}}, or the subset {{\mathcal S}_{(w,x^\delta)}} consisting of smooth numbers. For now, we will discuss the estimates as far as we can without having to explicitly specify {{\mathcal X}}.

We first consider the asymptotic (5). By arguing exactly as in Section 2 (or Section 3) of this previous post, we can write the left-hand side of (5), up to errors of {o(B \frac{x}{W})}, as

\displaystyle  \sum_{d_0 \in {\mathcal X}} \frac{1}{\Phi(d_0)} f'(\frac{\log d_0}{\log R})^2.

The summand here is non-negative, so we may crudely replace {{\mathcal X}} by all of {{\mathcal S}_{(w,+\infty)}} and apply Lemma 7 to obtain (5) with

\displaystyle  \alpha := \int_0^1 f'(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt.

Now we turn to the more difficult asymptotic (6). The left-hand side expands as

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n = b\ (W)} \theta(n+h_i).

As observed in Section 2 of this previous post, we have

\displaystyle  \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n = b\ (W)} \theta(n+h) = \frac{1}{\phi(W)} x \Delta^*([d_1,d_2]) + O( E^*([d_1,d_2]) )

where

\displaystyle  \Delta^*(q) := \prod_{p|q} \frac{k_0-1}{p-1}

and

\displaystyle  E^*(q) = \sum_{a \in C_i(q)} | \sum_{x \leq n \leq 2x: n=b\ (W); n = a\ (q)} \theta(n) - \frac{x}{\phi(Wq)}|.

Now let {\epsilon>0} be a small fixed constant to be chosen later, and suppose the following claim holds:

Claim 1 (Dense divisibility of moduli) Whenever {q = [d_1,d_2]} and {a_{d_1},a_{d_2}} are non-zero, then either {q \leq x^{1/2-\epsilon}} or else {q \in {\mathcal D}_{x^\delta}}.

Then from the Bombieri-Vinogradov theorem (for the {q \leq x^{1/2-\epsilon}} moduli) or the hypothesis {MPZ'[\varpi,\delta]} (for the larger moduli, noting that {q \leq R^2 = x^{1/2+2\varpi}}) and standard arguments (cf. Proposition 5 of this post) we have

\displaystyle  \sum_q h(q) E^*(q) \ll x \log^{-A} x

for any fixed {A>0} and any multiplicative function {h} of a fixed dimension {k}, where {q} ranges only over those integers of the form {q=[d_1,d_2]} with {a_{d_1},a_{d_2}} non-zero. From this we easily see (arguing as in Section 2 of this previous post) that the contribution of the {E^*} error term is {o( B \frac{x}{W}\log R)}, and we are left with establishing the lower bound

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_{(w,\infty)}} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \Delta^*([d_1,d_2])

\displaystyle  \geq \beta (\frac{\phi(W)}{W} \log R)^{k_0+1}

up to errors of {o( (\frac{\phi(W)}{W} \log R)^{k_0+1} )} (henceforth referred to as negligible errors).

As in Section 2 of this previous post, we can write the left-hand side as

\displaystyle  \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} (\sum_{m \in {\mathcal S}_{(w,\infty)}: (m,d_0)=1; md_0 \in {\mathcal X}} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} )^2 \ \ \ \ \ (9)

where {h} is the {k_0-1}-dimensional multiplicative function

\displaystyle  h(d) := \prod_{p|d} (k_0-1) \frac{(p-1)^2}{p(p-k_0)}.

So we would like to select {{\mathcal X}} small enough that Claim 1 holds, but large enough that we can lower bound (9) by {\beta (\frac{\phi(W)}{W} \log R)^{k_0+1}} up to negligible errors.

Pintz’s idea is to choose {{\mathcal X}} to be the set of all elements {d} of {{\mathcal S}_{(w,x^{\delta'})}} with the property that

\displaystyle  \prod_{p|d:p < x^\delta} p \geq x^{\delta' - \delta + \varpi + \epsilon/2}. \ \ \ \ \ (10)

Let us first verify Claim 1 with this definition. Suppose that {a_d} is non-zero. Then from (8) and the support of {f} there is a multiple {dr} of {d} with {dr \in {\mathcal X}} (in particular {d \in {\mathcal S}_{(w,x^{\delta'})}}) and {dr \leq x^{1/4 + \varpi}}. The latter condition implies that either {d \leq x^{1/4-\epsilon/2}} or {r \leq x^{\varpi + \epsilon/2}}. In the latter case we see from (10) (applied to {dr}, and using the fact that the primes dividing {r} contribute at most {r \leq x^{\varpi+\epsilon/2}} to the product) that

\displaystyle  \prod_{p|d: p < x^\delta} p \geq x^{\delta'-\delta}. \ \ \ \ \ (11)

Thus, if {q = [d_1,d_2]} with {a_{d_1}, a_{d_2}} non-zero, this implies that {q \in {\mathcal S}_{(w,x^{\delta'})}}, and that either {q \leq x^{1/2-\epsilon}} or that

\displaystyle  \prod_{p|q: p < x^\delta} p \geq x^{\delta'-\delta}.

The latter conclusion implies that {q} is {x^\delta}-densely divisible. Indeed, for any {1 \leq R \leq q} we multiply together all the prime divisors of {q} between {x^\delta} and {x^{\delta'}} one at a time until just before one reaches or exceeds {R}. This must place one at least as large as {x^{-\delta'} R}. Next, one multiplies to this the prime divisors of {q} less than {x^\delta} until one reaches or exceeds {x^{-\delta} R}; this is possible thanks to (11), and gives a divisor between {x^{-\delta} R} and {R} as required.
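To make the notion of dense divisibility concrete, here is a minimal Python sketch (an illustration only, not part of the argument). It assumes the definition used above: {q} is {y}-densely divisible if every real {1 \leq R \leq q} admits a divisor of {q} in {[y^{-1} R, R]}. For real {R} this is equivalent to asking that consecutive divisors of {q} differ by a factor of at most {y}, which is what the hypothetical helper `is_densely_divisible` checks. It uses trial division, so it is only meant for small {q}.

```python
def divisors(q):
    """Return the sorted list of positive divisors of q (trial division)."""
    small, large = [], []
    d = 1
    while d * d <= q:
        if q % d == 0:
            small.append(d)
            if d != q // d:
                large.append(q // d)
        d += 1
    return small + large[::-1]


def is_densely_divisible(q, y):
    """Check that consecutive divisors of q differ by a factor of at most y.

    Assumption: this is the real-R form of y-dense divisibility, i.e. every
    real 1 <= R <= q has a divisor of q in [R / y, R].
    """
    divs = divisors(q)
    return all(b <= y * a for a, b in zip(divs, divs[1:]))
```

For instance, {210 = 2 \cdot 3 \cdot 5 \cdot 7} is {2}-densely divisible, whereas {303 = 3 \cdot 101} is not even {7}-densely divisible, because of the gap between the divisors {3} and {101}.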

Now we need to obtain a lower bound for (9). If we write

\displaystyle  F(d_0) := \sum_{m \in {\mathcal S}_{(w,\infty)}: (m,d_0)=1} \frac{-f'(\frac{\log d_0 m}{\log R})}{\phi(m)}

and

\displaystyle  \tilde F(d_0) := \sum_{m \in {\mathcal S}_{(w,\infty)}: (m,d_0)=1; md_0 \in {\mathcal X}} \frac{-f'(\frac{\log d_0 m}{\log R})}{\phi(m)}

(the minus sign being to compensate for the non-positive nature of {f'}) then we have

\displaystyle  0\leq \tilde F(d_0) \leq F(d_0)

and thus

\displaystyle  \tilde F(d_0)^2 \geq F(d_0)^2 - 2 F(d_0) (F(d_0)-\tilde F(d_0)).
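To spell out this step: writing {\tilde F(d_0) = F(d_0) - (F(d_0) - \tilde F(d_0))}, expanding the square, and discarding the non-negative term {(F(d_0)-\tilde F(d_0))^2}, we have

\displaystyle  \tilde F(d_0)^2 = F(d_0)^2 - 2 F(d_0) (F(d_0)-\tilde F(d_0)) + (F(d_0)-\tilde F(d_0))^2 \geq F(d_0)^2 - 2 F(d_0) (F(d_0)-\tilde F(d_0)).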

Note that this inequality replaces the quadratic expression {\tilde F(d_0)^2} with a linear expression in the truncation error {F(d_0)-\tilde F(d_0)}, which will be more tractable for computing the effect of that error. We may thus lower bound (9) by the difference of

\displaystyle  \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} F(d_0)^2 \ \ \ \ \ (12)

and

\displaystyle  2 \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} F(d_0) (F(d_0)-\tilde F(d_0)). \ \ \ \ \ (13)

In Section 2 of this post it is shown that

\displaystyle  F(d_0) = (\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R}) + O( \frac{d_0}{\phi(d_0)}-1 ) + o(1) ),

which implies that (12) is equal to

\displaystyle  (\frac{\phi(W)}{W} \log R)^{k_0+1} \int_0^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

up to negligible errors. Similar considerations show that (13) is equal to

\displaystyle  2 (\frac{\phi(W)}{W} \log R) \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} f(\frac{\log d_0}{\log R}) (F(d_0)-\tilde F(d_0)) \ \ \ \ \ (14)

up to negligible errors. To upper bound (14), we need to upper bound

\displaystyle F(d_0)-\tilde F(d_0) = \sum_{m \in {\mathcal S}_{(w,\infty)}: (m,d_0)=1; md_0 \not \in {\mathcal X}} \frac{-f'(\frac{\log d_0 m}{\log R})}{\phi(m)}.

For this we need to catalog the ways in which {md_0} can fail to be in {{\mathcal X}}. In order for this to occur, at least one of the following three statements must hold:

  • (i) {m} could be divisible by a prime {p} with {x^{\delta'} \leq p \leq R}.
  • (ii) {d_0} could be divisible by a prime {p} with {x^{\delta'} \leq p \leq R}.
  • (iii) {d_0} lies in {{\mathcal S}_{(w,x^{\delta'})}}, but {\prod_{p|d_0: p < x^\delta} p < x^{\delta' - \delta + \varpi + \epsilon/2}}.

We consider the contributions of (i), (ii), (iii) to (14). We begin with the contribution of (i). This is bounded above by

\displaystyle  2 (\frac{\phi(W)}{W} \log R) \sum_{x^{\delta'} \leq p \leq R} \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0}

\displaystyle  \sum_{m \in {\mathcal S}_{(w,\infty)}} \frac{-f'(\frac{\log d_0 p m}{\log R})}{\phi(m)\phi(p)}.

Applying Lemma 7, one can simplify this modulo negligible errors as

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^2 \sum_{x^{\delta'} \leq p \leq R} \frac{1}{\phi(p)} \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} f(\frac{\log d_0}{\log R}) f( \frac{\log d_0}{\log R} + \frac{\log p}{\log R} )

which by another application of Lemma 7 is equal to

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^{k_0+1} \sum_{x^{\delta'} \leq p \leq R} \frac{1}{\phi(p)} G_{k_0-1}( 0, \frac{\log p}{\log R})

where we adopt the notation

\displaystyle  G_{k_0-1}(t_1,t_2) := \int_0^1 f(t+t_1) f(t+t_2) \frac{t^{k_0-2}}{(k_0-2)!}\ dt.

Applying Mertens’ theorem and summation by parts, this expression is equal up to negligible errors to

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^{k_0+1} \int_{\theta}^1 G_{k_0-1}(0,t)\ \frac{dt}{t}

where

\displaystyle  \theta := \frac{\log x^{\delta'}}{\log R} = \frac{\delta'}{1/4 + \varpi}.

Now we turn to the contribution of (ii) to (14). This is bounded above by

\displaystyle 2 (\frac{\phi(W)}{W} \log R) \sum_{x^{\delta'} \leq p \le R} \frac{h(p)}{p} \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} f(\frac{\log pd_0}{\log R}) F(pd_0).

By Lemma 7 we have

\displaystyle  F(pd_0) \leq (\frac{\phi(W)}{W} \log R) (f(\frac{\log pd_0}{\log R}) + o(1) )

and so we may bound this contribution up to negligible errors by

\displaystyle 2 (\frac{\phi(W)}{W} \log R)^2 \sum_{x^{\delta'} \leq p \leq R} \frac{h(p)}{p} \sum_{d_0 \in {\mathcal S}_{(w,\infty)}} \frac{h(d_0)}{d_0} f(\frac{\log pd_0}{\log R})^2

which by Lemma 7 again is equal (up to negligible errors) to

\displaystyle 2 (\frac{\phi(W)}{W} \log R)^{k_0+1} \sum_{x^{\delta'} \leq p \leq R} \frac{h(p)}{p} G_{k_0-1}( \frac{\log p}{\log R}, \frac{\log p}{\log R} ).

By definition, {h(p) = k_0-1 + o(1)}. By Mertens’ theorem, we can thus write the above expression up to negligible errors as

\displaystyle  2(k_0-1) (\frac{\phi(W)}{W} \log R)^{k_0+1} \int_\theta^1 G_{k_0-1}(t,t)\ \frac{dt}{t}.

Finally, we turn to the contribution of case (iii) to (14). By Lemma 7 we have

\displaystyle  F(d_0) \leq (\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R}) + o(1) )

so we may bound this contribution up to negligible errors by

\displaystyle 2 (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0} \frac{h(d_0)}{d_0} f(\frac{\log d_0}{\log R})^2

where {d_0} is as in case (iii).

We introduce the quantities

\displaystyle  \tilde \theta := \frac{\log x^{\delta' - \delta + \varpi + \epsilon/2}}{\log R} = \frac{\delta' - \delta + \varpi + \epsilon/2}{1/4+\varpi}

and

\displaystyle  \tilde \delta := \frac{\log x^\delta}{\log R} = \frac{\delta}{1/4+\varpi}

so that case (iii) consists of those {d_0} in {{\mathcal S}_{(w, R^\theta)}} such that

\displaystyle  \prod_{p|d_0: p < R^{\tilde \delta}} p < R^{\tilde \theta}.

From the support of {F} we may also take {d_0 \leq R}. This implies that we may factor

\displaystyle  d_0 = p_1 \ldots p_J d

for some primes

\displaystyle  R^{\tilde \delta} \leq p_1 < \ldots < p_J \leq R^{\theta}

and some {d \leq R^{\tilde \theta}} coprime to {p_1,\ldots,p_J}, with

\displaystyle  0 \leq J \leq \frac{1}{\tilde \delta}.

The contribution of this case may thus be bounded by

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^2 \sum_{0 \leq J \leq \frac{1}{\tilde \delta}} \sum_{R^{\tilde \delta} \leq p_1 < \ldots < p_J \leq R^\theta} \frac{h(p_1 \ldots p_J)}{p_1 \ldots p_J}

\displaystyle  \sum_{d \leq R^{\tilde \theta}} \frac{h(d)}{d} f(\frac{\log d p_1 \ldots p_J}{\log R})^2.

Evaluating the inner sum using Lemma 7, we obtain (up to negligible errors)

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^{k_0+1} \sum_{0 \leq J \leq \frac{1}{\tilde \delta}} \sum_{R^{\tilde \delta} \leq p_1 < \ldots < p_J \leq R^\theta} \frac{h(p_1 \ldots p_J)}{p_1 \ldots p_J}

\displaystyle  G_{k_0-1,\tilde \theta}( \frac{\log p_1 \ldots p_J}{\log R}, \frac{\log p_1 \ldots p_J}{\log R})

where {G_{k_0-1,\tilde \theta}} is a truncation of {G_{k_0-1}}:

\displaystyle  G_{k_0-1, \tilde \theta}(t_1,t_2) := \int_0^{\tilde \theta} f(t+t_1) f(t+t_2) \frac{t^{k_0-2}}{(k_0-2)!}\ dt.

Again we have {h(p_1 \ldots p_J) = (k_0-1)^J + o(1)}. By Mertens’ theorem we may write this (up to negligible errors) as

\displaystyle  2 (\frac{\phi(W)}{W} \log R)^{k_0+1} \sum_{0 \leq J \leq \frac{1}{\tilde \delta}} (k_0-1)^J \int_{\tilde \delta \leq t_1 < \ldots < t_J \leq \theta}

\displaystyle  G_{k_0-1,\tilde \theta}( t_1 + \ldots + t_J, t_1 + \ldots + t_J)\ \frac{dt_1 \ldots dt_J}{t_1 \ldots t_J}.

Putting all this together, we have obtained the lower bound (6) with

\displaystyle  \beta = G_{k_0-1}(0,0) (1 - 2\kappa_1 - 2\kappa_2 - 2\kappa_3)

where

\displaystyle  \kappa_1 := G_{k_0-1}(0,0)^{-1} \int_{\theta}^1 G_{k_0-1}(0,t)\ \frac{dt}{t}

\displaystyle  \kappa_2 := (k_0-1) G_{k_0-1}(0,0)^{-1} \int_{\theta}^1 G_{k_0-1}(t,t)\ \frac{dt}{t}

and

\displaystyle  \kappa_3 := G_{k_0-1}(0,0)^{-1} \sum_{0 \leq J \leq \frac{1}{\tilde \delta}} (k_0-1)^J \int_{\tilde \delta \leq t_1 < \ldots < t_J \leq \theta}

\displaystyle  G_{k_0-1,\tilde \theta}( t_1 + \ldots + t_J, t_1 + \ldots + t_J)\ \frac{dt_1 \ldots dt_J}{t_1 \ldots t_J}.

We now place upper bounds on {\kappa_1,\kappa_2,\kappa_3}. In this previous post, the bounds

\displaystyle  G_{k_0-1}(0,t) \leq (1-t)^{(k_0-1)/2} G_{k_0-1}(0,0)

and

\displaystyle  G_{k_0-1}(t,t) \leq (1-t)^{k_0-1} G_{k_0-1}(0,0)

for {0 < t < 1} are proven. Thus we have

\displaystyle  \kappa_1 \leq \int_{\theta}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t}

\displaystyle  \kappa_2 \leq (k_0-1) \int_{\theta}^1 (1-t)^{k_0-1}\ \frac{dt}{t}.

These are already quite small for {\theta \approx 1/2}, say, which would correspond to {\delta' \approx 1/8}.
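To get a feel for how small these bounds are, one can evaluate the two integrals numerically. The following is a minimal sketch using a composite Simpson rule; the helper names `simpson`, `kappa1_bound` and `kappa2_bound` are hypothetical, and any particular value of {k_0} fed into them is an illustrative choice rather than one taken from the argument above.

```python
def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def kappa1_bound(k0, theta):
    """Integral of (1-t)^((k0-1)/2) dt/t over [theta, 1], the bound on kappa_1."""
    return simpson(lambda t: (1 - t) ** ((k0 - 1) / 2) / t, theta, 1.0)


def kappa2_bound(k0, theta):
    """(k0-1) times the integral of (1-t)^(k0-1) dt/t over [theta, 1], the bound on kappa_2."""
    return (k0 - 1) * simpson(lambda t: (1 - t) ** (k0 - 1) / t, theta, 1.0)
```

Calling `kappa1_bound(k0, 0.5)` and `kappa2_bound(k0, 0.5)` for increasing {k_0} shows the exponential decay in {k_0} that makes these two terms harmless.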

For {\kappa_3} we will use the crude estimate

\displaystyle  G_{k_0-1,\tilde \theta}( t_1 + \ldots + t_J, t_1 + \ldots + t_J) \leq G_{k_0-1,\tilde \theta}(0,0);

this may surely be improved, but we will not do so here to simplify the exposition. Then we may bound

\displaystyle  \kappa_3 \leq \frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)} \sum_{0 \leq J \leq 1/\tilde \delta} \frac{(k_0-1)^J}{J!} (\log \frac{\theta}{\tilde \delta})^J.

The point here is that the first term {\frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)}} is exponentially decaying in {k_0}, which can compensate for the second term if {1/\tilde \delta \ll k_0}, which is currently the case in the regime of interest.

One can do a bit better than this. For any parameter {A\geq 0}, one has

\displaystyle  G_{k_0-1,\tilde \theta}( t_1 + \ldots + t_J, t_1 + \ldots + t_J) \leq e^A e^{-A(t_1 + \ldots + t_J)} G_{k_0-1,\tilde \theta}(0,0)

since the left-hand side vanishes for {t_1+\ldots+t_J \geq 1}. This gives the bound

\displaystyle  \kappa_3 \leq e^A \frac{G_{k_0-1,\tilde \theta}(0,0)}{G_{k_0-1}(0,0)} \sum_{0 \leq J \leq 1/\tilde \delta} \frac{(k_0-1)^J}{J!} (\int_{\tilde \delta}^\theta e^{-At} \frac{dt}{t})^J.
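The combinatorial factor in these two bounds on {\kappa_3} is easy to tabulate. Here is a minimal sketch, with hypothetical helper names, that computes {e^A \sum_{0 \leq J \leq 1/\tilde \delta} \frac{(k_0-1)^J}{J!} (\int_{\tilde \delta}^\theta e^{-At} \frac{dt}{t})^J}; taking {A=0} recovers the cruder sum involving {\log \frac{\theta}{\tilde \delta}}. It deliberately omits the ratio {G_{k_0-1,\tilde \theta}(0,0)/G_{k_0-1}(0,0)}, which depends on the choice of {f}, and it computes {\lfloor 1/\tilde \delta \rfloor} in floating point, which is an assumption that may need care at exact reciprocals.

```python
import math


def exp_integral(A, lo, hi, n=2000):
    """Simpson approximation to the integral of exp(-A t) / t over [lo, hi]."""
    h = (hi - lo) / n

    def g(t):
        return math.exp(-A * t) / t

    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def kappa3_factor(k0, theta, delta_tilde, A=0.0):
    """e^A * sum over 0 <= J <= 1/delta_tilde of ((k0-1) * I)^J / J!.

    Here I is the integral of exp(-A t) dt/t over [delta_tilde, theta].
    The f-dependent ratio of G-integrals is NOT included.
    """
    x = (k0 - 1) * exp_integral(A, delta_tilde, theta)
    j_max = math.floor(1 / delta_tilde)
    term, total = 1.0, 1.0
    for j in range(1, j_max + 1):
        term *= x / j  # builds x^j / j! incrementally to avoid huge factorials
        total += term
    return math.exp(A) * total
```

One can then scan over {A} for given {k_0, \theta, \tilde \delta} to see which choice gives the smallest factor.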

If we insert these bounds into (7), send {\epsilon} to zero, and optimise in {f} using Theorem 14 from this previous post, we obtain Theorem 5.



n-Category Café A Characterization of Relative Entropy (Part 1)

I’m trying to finish off a paper that Tobias Fritz and I have been working on, which gives a category-theoretic (and Bayesian!) characterization of relative entropy. It’s a kind of sequel to our paper with Tom Leinster, in which we characterized entropy.

That earlier paper was developed in conversations here on the n-Category Café. It was a lot of fun; I sort of miss that style of working. Also, to get warmed up, I need to think through some things I’ve thought about before. So, I might as well write them down here.

There are many categories related to probability theory, and they’re related in many ways. Last summer—on the 24th of August 2012, according to my notes here—Jamie Vicary, Brendan Fong and I worked through a bunch of these relationships. I need to write them down now, even if they’re not all vitally important to my paper with Tobias. They’re sort of buzzing around my brain like flies.

(Tobias knows this stuff too, and this is how we think about probability theory, but we weren’t planning to stick it in our paper. Maybe we should.)

Let’s restrict attention to probability measures on finite sets, and related structures. We could study these questions more generally, and we should, but not today. What we’ll do is give a unified purely algebraic description of:

  • finite sets
  • measures on finite sets
  • probability measures on finite sets
  • functions
  • bijections
  • measure-preserving functions
  • stochastic maps

Finitely generated free [0,∞)-modules

We start with the rig of nonnegative real numbers with their usual addition and multiplication; let’s call this [0,∞). The idea is that measure theory, and probability theory, are closely related to linear algebra over this rig.

So, we start with the category of finitely generated free [0,∞)-modules, and module homomorphisms. Every such module is isomorphic to [0,∞)^S for some finite set S.

Puzzle. Do we need to say ‘free’ here? Are there finitely generated modules over [0,∞) that aren’t free?

In other words, every finitely generated free [0,∞)-module is isomorphic to [0,∞)^n for some n = 0, 1, 2, …. So, the category of these is equivalent to Mat([0,∞)), the category where objects are natural numbers, a morphism from m to n is an m×n matrix of numbers in [0,∞), and composition is done by matrix multiplication. So let’s just call this category Mat([0,∞)), instead of something complicated like FinGenFMod_[0,∞). I’ll call the morphisms in here maps.

We can take tensor products of finitely generated free modules, and this makes Mat([0,∞)) into a symmetric monoidal †-category. This means we can draw maps using string diagrams in the usual way. However, I’m feeling lazy so I’ll often write equations when I could be drawing diagrams.

One of the rules of the game is that all these equations will make sense in any symmetric monoidal †-category. So we could, if we wanted, generalize ideas from probability theory this way. If you want to do this, you’ll need to know that [0,∞) is the unit for the tensor product in the category of finitely generated free [0,∞)-modules. We’ll be seeing this guy a lot. So if you want to generalize, just call it ‘the tensor unit’.

Finite sets

There’s a way to see the category of finite sets lurking in Mat([0,∞)), which we can borrow from this paper:

For any finite set S, we get a free finitely generated [0,∞)-module, namely [0,∞)^S. This comes with some structure:

  • a multiplication m: [0,∞)^S ⊗ [0,∞)^S → [0,∞)^S, coming from pointwise multiplication of [0,∞)-valued functions on S;
  • the unit for this multiplication, an element of [0,∞)^S, which we can write as a morphism i: [0,∞) → [0,∞)^S;
  • a comultiplication, obtained by taking the diagonal map Δ: S → S×S and promoting it to a linear map Δ: [0,∞)^S → [0,∞)^S ⊗ [0,∞)^S;
  • a counit for this comultiplication, obtained by taking the map to the terminal set !: S → 1 and promoting it to a linear map e: [0,∞)^S → [0,∞).

These morphisms m, i, Δ, e make

x = [0,∞)^S

into a commutative Frobenius algebra in Mat([0,∞)). That’s a thing where the unit, counit, multiplication and comultiplication obey these laws:

(I drew these back when I was feeling less lazy.) This Frobenius algebra is also ‘special’, meaning it obeys this:

And it’s also a †-Frobenius algebra, meaning that the counit and comultiplication are obtained from the unit and multiplication by ‘flipping’ them using the †-category structure. (If we think of a morphism in Mat([0,∞)) as a matrix, its dagger is its transpose.)
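If you like to see matrices, here is a minimal Python sketch of these four maps for a set S with n elements. It rests on some conventions of my own choosing: matrices act on column vectors, the tensor square [0,∞)^S ⊗ [0,∞)^S uses the basis e_j ⊗ e_k ordered by the index j·n + k, and the helper names are hypothetical. It lets you check the ‘special’ law m ∘ Δ = 1 and the † relations (Δ is the transpose of m, and e is the transpose of i) by direct computation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows (the dagger in Mat)."""
    return [list(col) for col in zip(*a)]


def frobenius_structure(n):
    """Return (m, i, comult, e) for the Frobenius algebra on an n-element set."""
    # Comultiplication: e_i |-> e_i (x) e_i, an n^2 x n matrix.
    comult = [[1 if j == k == i else 0 for i in range(n)]
              for j in range(n) for k in range(n)]
    # Multiplication: e_j (x) e_k |-> e_j if j == k, else 0, an n x n^2 matrix.
    m = [[1 if j == k == i else 0 for j in range(n) for k in range(n)]
         for i in range(n)]
    i_unit = [[1] for _ in range(n)]  # the constant function 1, an n x 1 matrix
    e = [[1] * n]                     # sums the entries, a 1 x n matrix
    return m, i_unit, comult, e
```

For n = 3, `matmul(m, comult)` comes out as the 3×3 identity matrix, and `transpose(m) == comult`.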

Conversely, suppose we have any special commutative †-Frobenius algebra x. Then using the ideas in the paper by Coecke, Pavlovic and Vicary we can recover a basis for x, consisting of the vectors e_i ∈ x with

Δ(e_i) = e_i ⊗ e_i

This basis forms a set S such that

x ≅ [0,∞)^S

for some specified isomorphism in Mat([0,∞)). Furthermore, this is an isomorphism of special commutative †-Frobenius algebras!

In short, a special commutative †-Frobenius algebra in Mat([0,∞)) is just a fancy way of talking about a finite set.

Functions and bijections

Now suppose we have two special commutative †-Frobenius algebras in Mat([0,∞)): x and y.

Suppose f: x → y is a Frobenius algebra homomorphism: that is, a map preserving all the structure (the unit, counit, multiplication and comultiplication). Then it comes from an isomorphism of finite sets. This lets us find FinSet_0, the groupoid of finite sets and bijections, inside Mat([0,∞)).

Alternatively, suppose f: x → y is just a coalgebra homomorphism: that is, a map preserving just the counit and comultiplication. Then it comes from an arbitrary function between finite sets. This lets us find FinSet, the category of finite sets and functions, inside Mat([0,∞)).

If f preserves the comultiplication, it automatically preserves the counit. But what if it preserves just the counit? This sounds like a dry, formal question. But it’s not: the answer is something useful, a ‘stochastic map’.

Stochastic maps

A stochastic map from a finite set S to a finite set T is a map sending each point of S to a probability measure on T.

We can think of this as a T×S-shaped matrix of numbers in [0,∞), where a given column gives the probability that a given point in S goes to any point in T. The sum of the numbers in each column will be 1. And conversely, any T×S-shaped matrix of numbers in [0,∞), where each column sums to 1, gives a stochastic map from S to T.

But now let’s describe this idea using the category Mat([0,∞)). We’ve seen a finite set is the same as a special commutative †-Frobenius algebra. So, say we have two of these, x and y. Our matrix of numbers in [0,∞) is just a map

f: x → y

So, we just need a way to state the condition that each column in the matrix sums to 1. And this condition simply says that f preserves the counit:

ϵ_y ∘ f = ϵ_x

where ϵ_x: x → [0,∞) is the counit for x, and similarly for ϵ_y.

To understand this, note that if we use the canonical isomorphism

x ≅ [0,∞)^S

the counit ϵ_x can be seen as the map

[0,∞)^S → [0,∞)

that takes any S-tuple of numbers and sums them up. In other words, it’s integration with respect to counting measure. So, the equation

ϵ_y ∘ f = ϵ_x

says that if we take any S-tuple of numbers, multiply it by the matrix f, and then sum up the entries of the resulting T-tuple, it’s the same as if we summed up the original S-tuple. But this says precisely that each column of the matrix f sums to 1.
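Here is a minimal Python sketch of this check, just to make it concrete. The helper names are hypothetical, and a map from S to T is represented as a T×S-shaped matrix stored as a list of rows. Composing with the counit (a row of ones) just computes the column sums, so preserving the counit means every column sum equals 1.

```python
from fractions import Fraction


def apply_counit(matrix):
    """Compose the counit (a row of ones) with a matrix: returns the column sums."""
    return [sum(col) for col in zip(*matrix)]


def is_stochastic(matrix):
    """True if all entries are nonnegative and the matrix preserves the counit."""
    nonnegative = all(entry >= 0 for row in matrix for entry in row)
    return nonnegative and apply_counit(matrix) == [1] * len(matrix[0])


# A stochastic map from a 2-element set S to a 3-element set T.
f = [
    [Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(1, 3)],
    [Fraction(0), Fraction(2, 3)],
]
```

Exact fractions are used so the comparison with 1 is not spoiled by floating point rounding.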

Finite measure spaces

Now let’s use our formalism to describe finite measure spaces (by which, beware, I mean finite sets equipped with measures!). To do this, we’ll use a special commutative †-Frobenius algebra x in Mat([0,∞)) together with any map

μ: [0,∞) → x

Starting from these, we get a specified isomorphism

x ≅ [0,∞)^S

and μ sends the number 1 to a vector in [0,∞)^S: that is, a function on S taking values in [0,∞). Multiplying this function by counting measure, we get a measure on S.

Puzzle. How can we describe this measure without the annoying use of counting measure?

Conversely, any measure on a finite set gives a special commutative †-Frobenius algebra x in Mat([0,∞)) equipped with a map from [0,∞).

So, we can say a finite measure space is a special commutative †-Frobenius algebra in Mat([0,∞)) equipped with a map

μ: [0,∞) → x

And given two of these,

μ: [0,∞) → x,   ν: [0,∞) → y

and a coalgebra morphism

f: x → y

such that the obvious triangle commutes:

f ∘ μ = ν

then we get a measure-preserving function between finite measure spaces! Conversely, any measure-preserving function between finite measure spaces gives us such a commutative triangle.

So, we get a way of describing the category FinMeas, with finite measure spaces as objects and measure-preserving maps as morphisms.

Finite probability measure spaces

I’m mainly interested in probability measures. So suppose x is a special commutative †-Frobenius algebra in Mat([0,∞)) equipped with a map

μ: [0,∞) → x

We’ve seen this gives a finite measure space. But this is a probability measure space if and only if

e ∘ μ = 1

where

e: x → [0,∞)

is the counit for x. The equation simply says the total integral of our measure μ is 1.

So, we get a way of describing the category FinProb, with finite probability measure spaces as objects and measure-preserving maps as morphisms. Given finite probability measure spaces described this way:

μ: [0,∞) → x,   ν: [0,∞) → y

a measure-preserving function is a coalgebra morphism

f: x → y

such that the obvious triangle commutes:

f ∘ μ = ν

Measure-preserving stochastic maps

Say we have two finite measure spaces. Then we can ask whether a stochastic map from one to the other is measure-preserving. And we can answer this question in the language of Mat([0,∞)).

Remember, a finite measure space is a special commutative †-Frobenius algebra x in Mat([0,∞)) together with a map

μ: [0,∞) → x

Say we have another one:

ν: [0,∞) → y

A stochastic map is just a map

f: x → y

that preserves the counit:

ϵ_y ∘ f = ϵ_x

But it’s a measure-preserving stochastic map if also

f ∘ μ = ν
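In matrix terms the last equation is just a matrix-vector product. Here is a minimal Python sketch under my own conventions (hypothetical helper names, a measure on S stored as a list of nonnegative weights, a map from S to T stored as a T×S-shaped list of rows with nonnegative entries):

```python
from fractions import Fraction


def push_forward(f, mu):
    """Apply the T x S matrix f to the measure mu on S, giving a measure on T."""
    return [sum(entry * weight for entry, weight in zip(row, mu)) for row in f]


def is_measure_preserving(f, mu, nu):
    """True if every column of f sums to 1 and f applied to mu equals nu."""
    columns_ok = all(sum(col) == 1 for col in zip(*f))
    return columns_ok and push_forward(f, mu) == nu


# Example data: a stochastic map from a 2-element set to a 3-element set.
f = [
    [Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(1, 3)],
    [Fraction(0), Fraction(2, 3)],
]
mu = [Fraction(1, 2), Fraction(1, 2)]
nu = push_forward(f, mu)
```

Since f preserves the counit, the total mass of `nu` automatically agrees with the total mass of `mu`.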

Next…

There’s a lot more to say; I haven’t gotten anywhere near what Tobias and I are doing! But it’s pleasant to have this basic stuff written down.

June 19, 2013

Mark Chu-Carroll

Sorry for the slowness of the blog lately. I finally got myself back onto a semi-regular schedule when I posted about the Adria Richards affair, and that really blew up. The amount of vicious, hateful bile that showed up, both in comments (which I moderated) and in my email was truly astonishing. I've written things which pissed people off before, and I've gotten at least my fair share of hatemail. But nothing I've written before came close to preparing me for the kind of unbounded hatred that came in response to that post.

I really needed some time away from the blog after that.

Anyway, I'm back, and it's time to get on with some discrete probability theory!

I've already written a bit about interpretations of probability. But I haven't said anything about what probability means formally. When I say that the probability of rolling a 3 with a pair of fair six-sided dice is 1/18, how do I know that? Where did that 1/18 figure come from?

The answer lies in something called a probability space. (I'm going to explain the probability space in frequentist terms, because I think that that's easiest, but there is (of course) an equivalent Bayesian description.) Suppose I'm looking at a particular experiment. In classic mathematical form, a probability space consists of three components (Ω, E, P), where:

  1. Ω, called the sample space, is a set containing all possible outcomes of the experiment. For a pair of dice, Ω would be the set of all possible rolls: {(1,1), (1,2), (1,3), (1,4), (1,5), (1, 6), (2,1), ..., (6, 5), (6,6)}.
  2. E is an equivalence relation over Ω, which partitions Ω into a set of events. Each event is a set of outcomes that are equivalent. For rolling a pair of dice, an event is a total - each event is the set of outcomes that have the same total. For the event "3" (meaning a roll that totalled three), the set would be {(1, 2), (2, 1)}.
  3. P is a probability assignment. For each event e in E, P(e) is a value between 0 and 1, where:

     \Sigma_{e\in E} P(e) = 1

    (That is, the sum of the probabilities of all of the possible events in the space is exactly 1.)

The probability of an event e being the outcome of a trial is P(e).
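To make the three components concrete, here's a minimal Python sketch of the dice example. The variable names are mine, and the probability assignment assumes fair dice, so every outcome in Ω is equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered rolls of two six-sided dice.
omega = list(product(range(1, 7), repeat=2))

# Events: the equivalence classes of outcomes that have the same total.
events = {}
for roll in omega:
    events.setdefault(sum(roll), []).append(roll)

# Probability assignment, assuming every outcome is equally likely.
P = {total: Fraction(len(rolls), len(omega)) for total, rolls in events.items()}
```

Here `events[3]` is the set {(1, 2), (2, 1)}, and `P[3]` works out to 2/36 = 1/18.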

So the probability of any particular event as the result of a trial is a number between 0 and 1. What's it mean? If the probability of event e is p, then if we repeat the trial N times, we expect N*p of those trials to have e as their result. If the probability of e is 1/4, and we repeat the trial 100 times, we'd expect e to be the result 25 times.

But in an important sense, that's a cop-out. We've defined probability in terms of this abstract model, where the third component is the probability. Isn't that circular?

Not really. For a given trial, we create the probability assignment by observation and/or analysis. The important point is that this is really just a bare minimum starting point. What we really care about in probability isn't the chance associated with a single, simple, atomic event. What we want to do is take the probability associated with a group of single events, and use our understanding of that to allow us to explore a complex event.

If I give you a well-shuffled deck of cards, it's easy to show that the odds of drawing the 3 of diamonds is 1/52. What we want to do with probability is things like ask: What are the odds of being dealt a flush in a poker hand?

The construction of a probability space gives us a well-defined platform to use for building probabilistic models of more interesting things. Given the probability spaces of two single dice, we can combine them together to create the probability space of the two dice rolled together. Given the probability space of a pair of dice, we can construct the probability space of a game of craps. And so on.
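As a rough sketch of that first step, here's one way to combine the spaces of two single dice in Python. The helper `product_space` is a hypothetical name of mine, and it assumes the two dice are independent, so the probability of a pair of outcomes is the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product


def product_space(p1, p2):
    """Combine two independent finite distributions into one on pairs of outcomes."""
    return {(a, b): pa * pb for (a, pa), (b, pb) in product(p1.items(), p2.items())}


# A single fair die, then the pair of dice built from two copies of it.
die = {face: Fraction(1, 6) for face in range(1, 7)}
pair = product_space(die, die)

# Probability that the two dice total 3, read off the combined space.
p_total_3 = sum(p for (a, b), p in pair.items() if a + b == 3)
```

Summing over the outcomes that total 3 gives 1/18 again, this time derived from the single-die spaces instead of by counting.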


Terence Tao Estimation of the Type III sums

This is the final continuation of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and this previous post, that covers the Type I and Type II sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of the final and most innovative of the key estimates in Zhang’s paper, namely the Type III estimate.

The main estimate was already stated as Theorem 17 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take {x} to be a parameter going off to infinity, with the usual asymptotic notation {O(), o(), \ll} associated to this parameter.

Definition 1 (Coefficient sequences) A coefficient sequence is a finitely supported sequence {\alpha: {\bf N} \rightarrow {\bf R}} that obeys the bounds

\displaystyle  |\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \ \ \ \ \ (1)

for all {n}, where {\tau} is the divisor function.

For any {I \subset {\bf R}}, let {{\mathcal S}_I} denote the square-free numbers whose prime factors lie in {I}. The main result of this post is then the following result of Zhang:

Theorem 2 (Type III estimate) Let {\varpi, \delta > 0} be fixed quantities, and let {M, N_1, N_2, N_3 \gg 1} be quantities such that

\displaystyle  x \ll M N_1 N_2 N_3 \ll x

and

\displaystyle  N_1 \gg N_2, N_3

and

\displaystyle  N_1^4 N_2^4 N_3^5 \gg x^{4+16\varpi+\delta+c}

for some fixed {c>0}. Let {\alpha, \psi_1, \psi_2, \psi_3} be coefficient sequences at scale {M,N_1,N_2,N_3} respectively with {\psi_1,\psi_2,\psi_3} smooth. Then for any {I \subset [1,x^\delta]} we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a)| \ll x \log^{-A} x.

In fact we have the stronger “pointwise” estimate

\displaystyle  |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a)| \ll x^{-\epsilon} \frac{x}{q} \ \ \ \ \ (4)

for all {q \in {\mathcal S}_I} with {q < x^{1/2+2\varpi}} and all {a \in ({\bf Z}/q{\bf Z})^\times}, and some fixed {\epsilon>0}.

(This is very slightly stronger than previously claimed, in that the condition {N_2 \gg N_3} has been dropped.)

It turns out that Zhang does not exploit any averaging of the {\alpha} factor, and matters reduce to the following:

Theorem 3 (Type III estimate without {\alpha}) Let {\delta > 0} be fixed, and let {1 \ll N_1, N_2, N_3, d \ll x^{O(1)}} be quantities such that

\displaystyle  N_1 \gg N_2, N_3

and

\displaystyle d \in {\mathcal S}_{[1,x^\delta]}

and

\displaystyle  N_1^4 N_2^4 N_3^5 \gg d^8 x^{\delta+c}

for some fixed {c>0}. Let {\psi_1,\psi_2,\psi_3} be smooth coefficient sequences at scales {N_1,N_2,N_3} respectively. Then we have

\displaystyle  |\Delta(\psi_1 \ast \psi_2 \ast \psi_3; a)| \ll x^{-\epsilon} \frac{N_1 N_2 N_3}{d}

for all {a \in ({\bf Z}/d{\bf Z})^\times} and some fixed {\epsilon>0}.

Let us quickly see how Theorem 3 implies Theorem 2. To show (4), it suffices to establish the bound

\displaystyle  \sum_{n = a\ (q)} \alpha \ast \psi_1 \ast \psi_2 \ast \psi_3(n) = X + O( x^{-\epsilon} \frac{x}{q} )

for all {a \in ({\bf Z}/q{\bf Z})^\times}, where {X} denotes a quantity that is independent of {a} (but can depend on other quantities such as {\alpha,\psi_1,\psi_2,\psi_3,q}). The left-hand side can be rewritten as

\displaystyle  \sum_{b \in ({\bf Z}/q{\bf Z})^\times} \sum_{m = b\ (q)} \alpha(m) \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n).

From Theorem 3 we have

\displaystyle  \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n) = Y + O( x^{-\epsilon} \frac{N_1 N_2 N_3}{q} )

where the quantity {Y} does not depend on {a} or {b}. Inserting this asymptotic and using crude bounds on {\alpha} (see Lemma 8 of this previous post) we conclude (4) as required (after modifying {\epsilon} slightly).

It remains to establish Theorem 3. This is done by a set of tools similar to that used to control the Type I and Type II sums:

  • (i) completion of sums;
  • (ii) the Weil conjectures and bounds on Ramanujan sums;
  • (iii) factorisation of smooth moduli {q \in {\mathcal S}_I};
  • (iv) the Cauchy-Schwarz and triangle inequalities (Weyl differencing).

The specifics are slightly different though. For the Type I and Type II sums, it was the classical Weil bound on Kloosterman sums that was the key source of power saving; Ramanujan sums only played a minor role, controlling a secondary error term. For the Type III sums, one needs a significantly deeper consequence of the Weil conjectures, namely the estimate of Bombieri and Birch on a three-dimensional variant of a Kloosterman sum. Furthermore, the Ramanujan sums – which are a rare example of sums that actually exhibit better than square root cancellation, thus going beyond even what the Weil conjectures can offer – make a crucial appearance, when combined with the factorisation of the smooth modulus {q} (this new argument is arguably the most original and interesting contribution of Zhang).

— 1. A three-dimensional exponential sum —

The power savings in Zhang’s Type III argument come from good estimates on the three-dimensional exponential sum

\displaystyle  T(k; m,m'; q) := \sum_{l \in {\bf Z}/q{\bf Z}: (l,q)=(l+k,q)=1} \sum_{t \in ({\bf Z}/q{\bf Z})^\times} \sum_{t' \in ({\bf Z}/q{\bf Z})^\times} \ \ \ \ \ (5)

\displaystyle  e_q( \frac{t}{l} - \frac{t'}{l+k} + \frac{m}{t} - \frac{m'}{t'} )

defined for positive integer {q} and {k,m,m' \in {\bf Z}/q{\bf Z}} (or {k,m,m' \in {\bf Z}}). The key estimate is

Theorem 4 (Bombieri-Birch bound) Let {q} be square-free. Then for any {k,m,m' \in {\bf Z}/q{\bf Z}} we have

\displaystyle  |T(k; m,m';q)| \ll \frac{(m-m',k,q)}{(k,q)^{1/2}} q^{3/2+o(1)}

where {(m-m',k,q)} is the greatest common divisor of {m-m', k, q} (and we adopt the convention that {(0,q)=q}). (Here, the {o(1)} denotes a quantity that goes to zero as {q \rightarrow \infty}, rather than as {x \rightarrow \infty}.)

Note that the square root cancellation heuristic predicts {q^{3/2}} as the size for {T(k;m,m';q)}, thus we can achieve better than square root cancellation if {k} has a common factor with {q} that is not shared with {m-m'}. This improvement over the square root heuristic, which is ultimately due to the presence of a Ramanujan sum inside this three-dimensional exponential sum in certain degenerate cases, is crucial to Zhang’s argument.

Proof: Suppose that {q} factors as {q=q_1q_2}, thus {q_1,q_2} are coprime. Then we have

\displaystyle  e_q(a) = e_{q_1}( \frac{a}{q_2} ) e_{q_2} (\frac{a}{q_1})

(see Lemma 7 of this previous post). From this and the Chinese remainder theorem we see that {T(k;m,m';q)} factorises as

\displaystyle  \prod_{i=1}^2 \sum_{l \in {\bf Z}/q_i{\bf Z}: (l,q_i)=(l+k,q_i)=1} \sum_{t,t' \in ({\bf Z}/q_i{\bf Z})^\times} e_{q_i}( \frac{t}{q_jl} - \frac{t'}{q_j(l+k)} + \frac{m}{q_jt} - \frac{m'}{q_jt'} )

where {j := 3-i}. Dilating {t,t'} by {q_j}, we conclude the multiplicative law

\displaystyle  T(k;m,m';q_1q_2) = T(k;\frac{m}{q_2^2},\frac{m'}{q_2^2};q_1) T(k;\frac{m}{q_1^2},\frac{m'}{q_1^2};q_2).
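The multiplicative law is easy to sanity-check numerically for small moduli. The following is a minimal brute-force sketch, not part of Zhang's argument; the helper names `e`, `inv`, `T` and `check_multiplicative` are illustrative choices, and the fractions in (5) are interpreted as modular inverses.

```python
import cmath
from math import gcd

def e(q, a):
    """e_q(a) = exp(2*pi*i*a/q) for an integer a."""
    return cmath.exp(2j * cmath.pi * (a % q) / q)

def inv(a, q):
    """Inverse of a modulo q (a must be coprime to q)."""
    return pow(a, -1, q)

def T(k, m, mp, q):
    """Brute-force evaluation of the three-dimensional sum (5)."""
    units = [t for t in range(q) if gcd(t, q) == 1]
    total = 0
    for l in range(q):
        # Keep only l with (l, q) = (l + k, q) = 1.
        if gcd(l, q) != 1 or gcd(l + k, q) != 1:
            continue
        il, ilk = inv(l, q), inv((l + k) % q, q)
        for t in units:
            for tp in units:
                phase = t * il - tp * ilk + m * inv(t, q) - mp * inv(tp, q)
                total += e(q, phase)
    return total

def check_multiplicative(k, m, mp, q1, q2):
    """Compare T(k; m, m'; q1 q2) with the product predicted by the law."""
    w1 = inv(q2 * q2, q1)  # 1 / q2^2 modulo q1
    w2 = inv(q1 * q1, q2)  # 1 / q1^2 modulo q2
    lhs = T(k, m, mp, q1 * q2)
    rhs = (T(k, m * w1 % q1, mp * w1 % q1, q1)
           * T(k, m * w2 % q2, mp * w2 % q2, q2))
    return abs(lhs - rhs) < 1e-6
```

For instance, `check_multiplicative(1, 2, 3, 3, 5)` compares the two sides for {q = 15}. The cost grows like {q^3}, so this is only practical for very small {q}.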

Iterating this law, we see that to prove Theorem 4 it suffices to do so in the case when {q} is prime, or more precisely that

\displaystyle  |T(k; m,m';p)| \ll \frac{(m-m',k,p)}{(k,p)^{1/2}} p^{3/2}.

We first consider the case when {k = 0\ (p)}, so our objective is now to show that

\displaystyle  |T(0;m,m';p)| \ll (m-m',p) p. \ \ \ \ \ (6)

In this case we can write {T(0;m,m';p)} as

\displaystyle  \sum_{l,t,t' \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} - \frac{t'}{l} + \frac{m}{t} - \frac{m'}{t'} ).

Making the change of variables {s := \frac{tt'}{l}\ (p)}, {u := \frac{1}{t}\ (p)}, {u' := \frac{1}{t'}\ (p)} this becomes

\displaystyle  \sum_{s,u,u' \in ({\bf Z}/p{\bf Z})^\times} e_p( su' - su + mu - m' u' ).

Performing the {u,u'} sums this becomes

\displaystyle  \sum_{s \in ({\bf Z}/p{\bf Z})^\times} C_p(m-s) C_p(s-m')

where {C_q(a)} is the Ramanujan sum

\displaystyle  C_q(a) := \sum_{b \in ({\bf Z}/q{\bf Z})^\times} e_q(ab).

Basic Fourier analysis tells us that {C_p(a)} equals {-1} when {a \neq 0\ (p)} and {p-1} when {a = 0\ (p)}. The bound (6) then follows from direct computation.
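The "direct computation" can be mirrored numerically. The sketch below (with illustrative function names, and with fractions read as modular inverses) evaluates the Ramanujan sum {C_p(a)} directly, which gives {p-1} at {a = 0\ (p)} and {-1} otherwise, and compares a brute-force evaluation of {T(0;m,m';p)} with the Ramanujan-sum expression above. A short case check suggests {|T(0;m,m';p)| \leq 2 (m-m',p) p}; the constant {2} is an observation from that case check, not a claim from the text.

```python
import cmath

def e(q, a):
    """e_q(a) = exp(2*pi*i*a/q) for an integer a."""
    return cmath.exp(2j * cmath.pi * (a % q) / q)

def ramanujan(p, a):
    """C_p(a) for a prime p, summed directly over the nonzero residues b."""
    return sum(e(p, a * b) for b in range(1, p))

def T0_bruteforce(m, mp, p):
    """T(0; m, m'; p) as a triple sum over nonzero l, t, t' modulo p."""
    total = 0
    for l in range(1, p):
        il = pow(l, -1, p)
        for t in range(1, p):
            for tp in range(1, p):
                phase = (t * il - tp * il
                         + m * pow(t, -1, p) - mp * pow(tp, -1, p))
                total += e(p, phase)
    return total

def T0_via_ramanujan(m, mp, p):
    """The same quantity after performing the u and u' summations."""
    return sum(ramanujan(p, m - s) * ramanujan(p, s - mp)
               for s in range(1, p))
```

Running both functions over all {m,m'} modulo a small prime such as {p=5} gives agreement up to floating-point error.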

Next, suppose that {k \neq 0\ (p)} and {m' = 0\ (p)}. Making the change of variables {s := -\frac{t'}{l+k}}, {T(k;m,0;p)} becomes

\displaystyle  \sum_{l \in {\bf Z}/p{\bf Z}: (l,p)=(l+k,p)=1} \sum_{t \in ({\bf Z}/p{\bf Z})^\times} \sum_{s \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} + s + \frac{m}{t} ).

Performing the {s} summation, this becomes

\displaystyle  - \sum_{l \in {\bf Z}/p{\bf Z}: (l,p)=(l+k,p)=1} \sum_{t \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} + \frac{m}{t} ).

For each {l}, the {t} summation is a Kloosterman sum and is thus {O(p^{1/2})} by the classical Weil bound (Theorem 8 from previous notes). This gives a net estimate of {O(p^{3/2})} as desired. Similarly if {m = 0\ (p)}.
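As a quick illustration of the Weil bound used here, the following sketch evaluates the Kloosterman sum {\sum_t e_p(at + b/t)} by brute force and checks the classical inequality {|S(a,b;p)| \leq 2\sqrt{p}} for {a,b \neq 0\ (p)}. The function names are illustrative; in the text the relevant sum corresponds to {a = 1/l} and {b = m}.

```python
import cmath
from math import sqrt

def kloosterman(a, b, p):
    """S(a, b; p): sum over nonzero t mod p of e_p(a*t + b/t), p prime."""
    total = 0
    for t in range(1, p):
        phase = (a * t + b * pow(t, -1, p)) % p
        total += cmath.exp(2j * cmath.pi * phase / p)
    return total

def weil_holds(p):
    """Check |S(a, b; p)| <= 2*sqrt(p) for all nonzero a, b modulo p."""
    return all(abs(kloosterman(a, b, p)) <= 2 * sqrt(p) + 1e-9
               for a in range(1, p) for b in range(1, p))
```

Summing the {O(p^{1/2})} bound trivially over the at most {p} values of {l} is what produces the {O(p^{3/2})} estimate.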

The only remaining case is when {k,m,m' \neq 0\ (p)}. Here one cannot proceed purely through Ramanujan and Weil bounds, and we need to invoke the deep result of Bombieri and Birch, proven in Theorem 1 of the appendix to this paper of Friedlander and Iwaniec. This bound can be proven by applying Deligne’s proof of the Weil conjectures to a certain {L}-function attached to the surface {\{ (x_1,x_2,x_3,x_4): \frac{1}{x_1x_2} + \frac{1}{x_3x_4} = 1 \}}; an elementary but somewhat lengthy second proof is also given in the above appendix. \Box

To deal with factors such as {(k,q)}, the following simple lemma will be useful.

Lemma 5 For any {q} and any {K \geq 1} we have

\displaystyle  \sum_{1 \leq k \leq K} (k,q) \ll q^{o(1)} K.

In particular,

\displaystyle  \sum_{t \in {\bf Z}/q{\bf Z}} (t,q) \ll q^{1+o(1)}.

As in the previous theorem, {o(1)} here denotes a quantity that goes to zero as {q \rightarrow \infty}, rather than as {x \rightarrow \infty}.

Note that it is important that the {k=0} term is excluded from the first sum, otherwise one acquires an additional {q} term. In particular,

\displaystyle  \sum_{|k| \leq K} (k,q) \ll q + q^{o(1)} K.

Proof: Estimating

\displaystyle  (k,q) \leq \sum_{d|q; d|k} d

we can bound

\displaystyle  \sum_{1 \leq k \leq K}(k,q) \leq \sum_{d|q} \sum_{1 \leq k \leq K: d|k} d

\displaystyle \leq \sum_{d|q} \frac{K}{d} d

\displaystyle  = K \tau(q)

\displaystyle  \ll q^{o(1)} K.

\Box
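The proof in fact gives the explicit inequality {\sum_{1 \leq k \leq K} (k,q) \leq K \tau(q)}, which can be checked directly for small parameters. A minimal sketch, with illustrative helper names:

```python
from math import gcd

def tau(q):
    """Number of divisors of q (naive count, fine for small q)."""
    return sum(1 for d in range(1, q + 1) if q % d == 0)

def gcd_sum(K, q):
    """Sum of (k, q) over 1 <= k <= K."""
    return sum(gcd(k, q) for k in range(1, K + 1))
```

Including {k=0} adds the term {(0,q)=q}, which is the source of the extra {q} in the bound for {\sum_{|k| \leq K} (k,q)}.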

— 2. Cauchy-Schwarz —

We now prove Theorem 3. The reader may wish to track the exponents involved in the model regime

\displaystyle  \delta \approx 0; \quad N_1=N_2=N_3 = N; \quad N \ll d \ll N^{13/8} \ \ \ \ \ (7)

where {N} is any fixed power of {x} (e.g. {N = x^{5/16}}, in which case {d} can be slightly larger than {x^{1/2}}).

Let {\delta,N_1,N_2,N_3,d,\psi_1,\psi_2,\psi_3,a} be as in Theorem 3, and let {\epsilon>0} be a sufficiently small fixed quantity. It will suffice to show that

\displaystyle  \sum_{n = a\ (d)} \psi_1 \ast \psi_2 \ast \psi_3(n) = X + O( x^{-\epsilon} \frac{N_1 N_2 N_3}{d} )

where {X} does not depend on {a}. We rewrite the left-hand side as

\displaystyle  \sum_{n_1} \psi_1(n_1) \sum_{n: (n,d)=1; n_1 = \frac{a}{n}\ (d)} \psi_2 \ast \psi_3(n)

and then apply completion of sums (Lemma 6 from this previous post) to rewrite this expression as the sum of the main term

\displaystyle  \frac{1}{d} (\sum_{n_1} \psi_1(n_1)) (\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n))

plus the error terms

\displaystyle  O( (\log^{O(1)} x) \frac{N_1}{d} \sum_{1 \leq h \le H} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} )| )

and

\displaystyle  O( x^{-A} \sum_n |\psi_2 \ast \psi_3(n)| ).

where {A > 0} is any fixed quantity and

\displaystyle  H := x^\epsilon \frac{d}{N_1}.

The first term does not depend on {a}, and the third term is clearly acceptable, so it suffices to show that

\displaystyle  \sum_{1 \leq h \le H} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} ) | \ll x^{-\epsilon} N_2 N_3. \ \ \ \ \ (8)

It will be convenient to reduce to the case when {h} and {d} are coprime. More precisely, it will suffice to prove the following claim:

Proposition 6 Let {\delta>0} be fixed, and let

\displaystyle  H, N_2, N_3, d, B \gg 1 \ \ \ \ \ (9)

be such that

\displaystyle  d \in {\mathcal S}_{[1,x^\delta]}

and

\displaystyle  H \ll x^{\epsilon} \frac{d}{N_2} \ \ \ \ \ (10)

and

\displaystyle  N_2^4 N_3^5 \gg B^{-6} d^4 H^4 x^{\delta+c} \ \ \ \ \ (11)

for some fixed {c>0}, and let {\psi_2,\psi_3} be smooth coefficient sequences at scale {N_2,N_3} respectively. Then

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} ) | \ll x^{-\epsilon} B N_2 N_3

for some fixed {\epsilon>0}.

Let us now see why the above proposition implies (8). To prove (8), we may of course assume {H \geq 1} as the claim is trivial otherwise. We can split

\displaystyle  \sum_{1 \leq h \leq H} F(h) = \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} F( d_2 h' )

for any function {F(h)} of {h}, so that the left-hand side of (8) can be written as

\displaystyle  \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} |\sum_{n: (n,d_1 d_2)=1} \psi_2 \ast \psi_3(n) e_{d_1}( \frac{ah'}{n} )|

which we expand as

\displaystyle  \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} |\sum_{n_2: (n_2,d_1 d_2)=1} \sum_{n_3: (n_3,d_1d_2)=1} \psi_2(n_2) \psi_3(n_3) e_{d_1}( \frac{ah'}{n_2 n_3} )|.

In order to apply Proposition 6 we need to modify the {(n_2,d_1d_2)=1}, {(n_3,d_1d_2)=1} constraints. By Möbius inversion one has

\displaystyle  \sum_{n_2: (n_2,d_1d_2)=1} F(n_2) = \sum_{b_2|d_2} \mu(b_2) \sum_{n_2: (n_2,d_1)=1} F(b_2 n_2)

for any function {F}, and similarly for {n_3}, so by the triangle inequality we may bound the previous expression by

\displaystyle  \sum_{d = d_1 d_2} \sum_{b_2|d_2} \sum_{b_3|d_2} F( d_1, d_2, b_2, b_3 ) \ \ \ \ \ (12)

where

\displaystyle  F(d_1,d_2,b_2,b_3) := \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1}

\displaystyle |\sum_{n_2: (n_2,d_1)=1} \sum_{n_3: (n_3,d_1)=1} \psi_2(b_2n_2) \psi_3(b_3n_3)

\displaystyle  e_{d_1}( \frac{ah'}{b_2b_3 n_2 n_3} )|

We may discard those values of {d_2} for which {H' := H/d_2} is less than one, as the summation is vacuous in that case. We then apply Proposition 6 with {d,N_2,N_3,H} replaced by {d_1,N_2/b_2,N_3/b_3,H'} respectively and {B} set equal to {b_2 b_3}, and {\psi_2,\psi_3} replaced by {\psi_2(b_2\cdot)} and {\psi_3(b_3\cdot)}. One can check that all the hypotheses of Proposition 6 are obeyed, so we may bound (12) by

\displaystyle  \ll x^{-\epsilon} N_2 N_3 \sum_{d = d_1 d_2} \sum_{b_2|d_2} \sum_{b_3|d_2} 1

which by the divisor bound is {\ll x^{-\epsilon+o(1)} N_2 N_3}, which is acceptable (after shrinking {\epsilon} slightly).
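The two combinatorial identities used in this reduction (splitting the {h} sum according to {d_2 = (h,d)}, and removing the constraint {(n_2,d_2)=1} by Möbius inversion) can be tested on small examples. The sketch below uses illustrative names, takes {F} to be an arbitrary test function, and truncates the {n} sum at a hypothetical cutoff `N` standing in for finite support.

```python
from math import gcd

def divisors(d):
    """All positive divisors of d (naive, fine for small d)."""
    return [b for b in range(1, d + 1) if d % b == 0]

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def split_sum(F, H, d):
    """Sum of F(h) over 1 <= h <= H, organised by d2 = (h, d) and h = d2*h'."""
    return sum(F(d2 * hp)
               for d2 in divisors(d)
               for hp in range(1, H // d2 + 1)
               if gcd(hp, d // d2) == 1)

def coprime_sum_via_mobius(F, N, d1, d2):
    """Sum of F(n) over n <= N coprime to d1*d2, via divisors b of d2."""
    return sum(mobius(b) * sum(F(b * n)
                               for n in range(1, N // b + 1)
                               if gcd(n, d1) == 1)
               for b in divisors(d2))
```

The second identity relies on {d_1} and {d_2} being coprime, which is automatic here because {d = d_1 d_2} is square-free.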

It remains to prove Proposition 6. Continuing (7), the reader may wish to keep in mind the model case

\displaystyle  \delta \approx 0; N_2 = N_3 = N; \quad N \ll d \ll N^{13/8}; \quad H \approx d/N; \quad B \approx 1.

Note from (9), (10) one has

\displaystyle  d \gg x^{-\epsilon} N_2. \ \ \ \ \ (13)

Expanding out the {\psi_2 \ast \psi_3} convolution, our task is to show that

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\sum_{n_2: (n_2,d)=1} \sum_{n_3: (n_3,d)=1} \psi_2(n_2) \psi_3(n_3) e_d( \frac{ah}{n_2n_3} )| \ll x^{-\epsilon} B N_2 N_3. \ \ \ \ \ (14)

As before, our aim is to obtain a power savings better than {H} over the trivial bound of {H N_2 N_3}.

The next step is Weyl differencing. We will need a step size {r \geq 1} which we will optimise later. We set

\displaystyle  K := \lfloor x^{-\epsilon} N_2 r^{-1} H^{-1}\rfloor; \ \ \ \ \ (15)

we will make the hypothesis that

\displaystyle  K \geq 1 \ \ \ \ \ (16)

and save this condition to be verified later.

By shifting {n_2} by {khr} for {1 \leq k \leq K} and then averaging, we may write the left-hand side of (14) as

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\frac{1}{K} \sum_{1 \leq k \leq K} \sum_{n_2: (n_2,d)=1} \sum_{n_3: (n_3,d)=1}

\displaystyle  \psi_2(n_2+hkr) \psi_3(n_3) e_d( \frac{ah}{(n_2+hkr)n_3} )|.

By the triangle inequality, it thus suffices to show that

\displaystyle  \sum_{1 \leq h \leq H: (h,d)=1} \sum_{n_2: (n_2,d)=1} |\sum_{1 \leq k \leq K} \psi_2(n_2+hkr) \ \ \ \ \ (17)

\displaystyle  \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{ah}{(n_2+hkr)n_3} )| \ll x^{-\epsilon} B K N_2 N_3.

Next, we combine the {h} and {n_2} summations into a single summation over {{\bf Z}/d{\bf Z}}. We first use a Taylor expansion and (15) to write

\displaystyle  \psi_2(n_2+hkr) = \sum_{j=0}^J \frac{1}{j!} (h/H)^j N_2^{j} \psi_2^{(j)}(n_2) (Hkr/N_2)^j + O( x^{-J\epsilon+o(1)})

for any fixed {J}. If {J} is large enough, then the error term will be acceptable, so it suffices to establish (17) with {\psi_2(n_2+hkr)} replaced by {(h/H)^j N_2^j \psi_2^{(j)}(n_2) (Hkr/N_2)^j} for any fixed {j \geq 0}. We can rewrite

\displaystyle  e_d( \frac{ah}{(n_2+hkr)n_3} ) = e_d( \frac{a}{(l+kr) n_3} )

where {l \in {\bf Z}/d{\bf Z}} is such that {(l+kr,d)=1} and

\displaystyle  l = \frac{n_2}{h}\ (d).

Thus we can estimate the left-hand side of (17) by

\displaystyle  \sum_{l \in {\bf Z}/d{\bf Z}} \nu(l) |\sum_{1 \leq k \leq K: (l+kr,d)=1} (Hkr/N_2)^j \ \ \ \ \ (18)

\displaystyle \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{a}{(l+kr) n_3})|

where

\displaystyle  \nu(l) := \sum_{1 \leq h \leq H: (h,d)=1} \sum_{n_2} 1_{l = \frac{n_2}{h}\ (d)} N_2^j |\psi_2^{(j)}(n_2)|.

Here we have bounded {(h/H)^j} by {O(1)}.

We will eliminate the {\nu} expression via Cauchy-Schwarz. Observe from the smoothness of {\psi_2} that

\displaystyle  \nu(l) \ll x^{o(1)} |\{ (h,n_2): 1 \leq h \leq H; 1 \ll n_2 \ll N_2; (h,d)=1; l = \frac{n_2}{h}\ (d) \}|

and thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} |\{ (h,h',n_2,n'_2): 1 \leq h,h' \leq H; 1\ll n_2,n'_2 \ll N_2;

\displaystyle  (h,d)=(h',d) = 1; \frac{n_2}{h} = \frac{n'_2}{h'}\ (d) \}|.

Note that {\frac{n_2}{h} = \frac{n'_2}{h'}\ (d)} implies {n_2 h' = n'_2 h\ (d)}. But from (10) we have {1 \leq n_2 h', n'_2 h \leq d}, so in fact we have {n_2 h' = n'_2 h}. Thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} |\{ (h,h',n_2,n'_2): 1 \leq h' \leq H; 1\ll n_2 \ll N_2; n_2 h' = n'_2 h \}|.

From the divisor bound, we see that for each fixed {n_2, h'} there are {O(x^{o(1)})} choices for {n'_2,h}, thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} N_2 H.

From this, (18), and Cauchy-Schwarz, we see that to prove (17) it will suffice to show that

\displaystyle  \sum_{l \in {\bf Z}/d{\bf Z}} |\sum_{1 \leq k \leq K: (l+kr,d)=1} (Hkr/N_2)^j \ \ \ \ \ (19)

\displaystyle  \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{a}{(l+kr) n_3})|^2

\displaystyle  \ll x^{-2\epsilon} B^{2} K^2 N_2 N_3^2 H^{-1}.

Comparing with the trivial bound of {O( d N_3^2 K^2 )}, our task is now to gain a factor of more than {\frac{Hd}{B^2 N_2}} over the trivial bound.

We square out (19) as

\displaystyle  \sum_{1 \leq k,k' \leq K}\sum_{l \in {\bf Z}/d{\bf Z}: (l+kr,d)=(l+k'r,d)=1} (Hkr/N_2)^j (Hk'r/N_2)^j

\displaystyle  \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1} \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{(l+kr)n_3} - \frac{a}{(l+k'r)n'_3} ).

If we shift {l} by {kr}, then relabel {k'-k} by {k}, and use the fact that {Hkr/N_2, Hk'r/N_2 = O(1)}, we can reduce this to

\displaystyle  \sum_{|k| \leq K}

\displaystyle  |\sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1}

\displaystyle  \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} )|

\displaystyle  \ll x^{-2\epsilon} B^{2} K N_2 N_3^2 H^{-1}.

Next we perform another completion of sums, this time in the {n_3,n'_3} variables, to bound

\displaystyle  |\sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1}

\displaystyle  \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} )|

by

\displaystyle  \ll x^{o(1)} \sum_{|m|, |m'| \leq M'} (\frac{N_3}{d})^2 | U(k; m,m'; d)|+ x^{-A}

for any fixed {A>0}, where

\displaystyle  M' := x^{\epsilon} \frac{d}{N_3} \ \ \ \ \ (20)

(the prime is there to distinguish this quantity from {M} in the introduction) and

\displaystyle  U(k;m,m';d) := \sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3 \in ({\bf Z}/d{\bf Z})^\times}

\displaystyle  e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} + mn_3 - m' n'_3).

Making the change of variables {t := \frac{a}{n_3}\ (d)} and {t' := \frac{a}{n'_3}\ (d)} and comparing with (5), we see that

\displaystyle  U(k;m,m';d) = T( kr; am, am'; d).

Applying Theorem 4 (and recalling that {a \in ({\bf Z}/d{\bf Z})^\times}) we reduce to showing that

\displaystyle  \sum_{|k| \leq K} \sum_{|m|, |m'| \leq M'} \frac{(kr,m-m',d)}{(kr,d)^{1/2}} (\frac{N_3}{d})^2 d^{3/2} \ll x^{-3\epsilon} B^{2} K N_2 N_3^2 H^{-1}.

We now choose {r} to be a factor of {d}, thus

\displaystyle  d = qr

for some {q} coprime to {r}. We compute the sum on the left-hand side:

Lemma 7 We have

\displaystyle  \sum_{|k| \leq K} \sum_{|m|, |m'| \leq M'} \frac{(kr,m-m',d)}{(kr,d)^{1/2}}

\displaystyle  \ll x^{o(1)} ( M' r^{1/2} K + M' d^{1/2} + (M')^2 K r^{-1/2} ).

Proof: We first consider the contribution of the diagonal case {m=m'}. This term may be estimated by

\displaystyle  \ll M' \sum_{|k| \leq K} (kr,d)^{1/2} = M' r^{1/2} \sum_{|k| \leq K} (k,q)^{1/2}.

The {k=0} term gives {M'd^{1/2}}, while the contribution of the non-zero {k} is acceptable by Lemma 5.

For the non-diagonal case {m \neq m'}, we see from Lemma 5 that

\displaystyle  \sum_{|m|,|m'| \leq M': m \neq m'} (kr,m-m',d) \ll x^{o(1)} (M')^2;

since {(kr,d) \geq r}, we obtain a bound of {O( x^{o(1)} (M')^2 K r^{-1/2} )} from this case as required. \Box
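To see the three terms of Lemma 7 emerge in practice, one can evaluate the left-hand side by brute force for small parameters. The sketch below is only an illustration under stated assumptions: `M` plays the role of {M'} and is taken to be a positive integer, {d = qr} is assumed square-free, and the constant {36\tau(d)} used in the comparison is a crude explicit stand-in for the {x^{o(1)}} factor obtained by following the proof with the explicit form {\sum_{1 \leq k \leq K}(k,q) \leq K\tau(q)} of Lemma 5; it is not a constant from the text.

```python
from math import gcd, sqrt

def tau(n):
    """Number of divisors of n (naive count)."""
    return sum(1 for b in range(1, n + 1) if n % b == 0)

def lemma7_lhs(K, M, q, r):
    """Brute-force sum of (kr, m-m', d) / (kr, d)^{1/2} with d = q*r."""
    d = q * r
    total = 0.0
    for k in range(-K, K + 1):
        g = gcd(k * r, d)  # equals r * (k, q) when d is square-free
        for m in range(-M, M + 1):
            for mp in range(-M, M + 1):
                total += gcd(gcd(k * r, m - mp), d) / sqrt(g)
    return total

def lemma7_terms(K, M, q, r):
    """The three terms M' r^{1/2} K, M' d^{1/2}, (M')^2 K r^{-1/2}."""
    d = q * r
    return (M * sqrt(r) * K, M * sqrt(d), M * M * K / sqrt(r))
```

Varying `r` with `q * r` held fixed shows the trade-off between the first and third terms that the later choice of {r} balances.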

From this lemma, we see that we are done if we can find {r} obeying

\displaystyle  (M' r^{1/2} K + M' d^{1/2} + (M')^2 K r^{-1/2} ) (\frac{N_3}{d})^2 d^{3/2} \ll x^{-4\epsilon} B^{2} K N_2 N_3^2 H^{-1}. \ \ \ \ \ (21)

as well as the previously recorded condition (16). We can split the condition (21) into three subconditions:

\displaystyle  M' r^{1/2} d^{-1/2} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}

\displaystyle  M' K^{-1} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}

\displaystyle  (M')^2 r^{-1/2} d^{-1/2} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}.

Substituting the definitions (15), (20) of {K, M'}, we can rewrite all of these conditions as lower and upper bounds on {r}. Indeed, (16) follows from (say)

\displaystyle  r \ll x^{-2\epsilon} N_2 H^{-1} \ \ \ \ \ (22)

while the other three conditions rearrange to

\displaystyle  r \ll x^{-10\epsilon} B^{4} N_2^2 N_3^2 H^{-2} d^{-1} \ \ \ \ \ (23)

\displaystyle  r \ll x^{-6\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1} \ \ \ \ \ (24)

and

\displaystyle  r \gg x^{12\epsilon} B^{-4} N_2^{-2} N_3^{-4} H^2 d^{3}.

We can combine (23), (24) into a single condition

\displaystyle  r \ll x^{-10\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1}.

Also, from (9), (13) we see that this new condition also implies (22). Thus we are done as soon as we find a factor {r} of {d} such that

\displaystyle  R_1 \ll r \ll R_2

where

\displaystyle  R_1 := x^{12\epsilon} B^{-4} N_2^{-2} N_3^{-4} H^2 d^{3}

and

\displaystyle  R_2 := x^{-10\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1}.

From (11) one has

\displaystyle  R_2/R_1 \gg x^\delta

if {\epsilon} is sufficiently small. Also, from (11), (9) one also sees that

\displaystyle  R_1 \ll d

and {R_2 \gg 1}. As {d} is {x^\delta}-smooth, we can thus find {r} with the desired properties by the greedy algorithm. (In view of Corollary 12 from this previous post, one could also have ensured that {q} has no tiny factors, although this does not seem to be of much actual use in the Type III analysis.)
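The greedy step can be sketched as follows: multiply the prime factors of {d} together one at a time until the product first reaches {R_1}. Since each prime is at most {x^\delta} and {R_2/R_1} is at least {x^\delta}, the product cannot overshoot {R_2}, and since {R_1 \leq d} the product does reach {R_1}. In this illustration the asymptotic conditions are replaced by exact inequalities, and the function name and argument conventions are assumptions made for the example.

```python
def greedy_factor(primes, R1, R2):
    """Find a factor r of the square-free number d = product(primes)
    with R1 <= r <= R2, multiplying primes in until R1 is reached.
    Returns None if the interval is missed."""
    r = 1
    for p in primes:
        if r >= R1:
            break
        r *= p
    return r if R1 <= r <= R2 else None
```

For example, with `primes = [2, 3, 5, 7, 11, 13]` (so {d = 30030} is 13-smooth), `R1 = 100` and `R2 = 1300`, the running products are 2, 6, 30, 210, and the procedure stops at {r = 210}.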


Filed under: math.NT, polymath Tagged: Cauchy-Schwarz, completion of sums, polymath8, Ramanujan sum, Weil conjectures, Yitang Zhang

Matt Strassler: Science Past and Future, on Diverse Continents

Today, two articles that I found especially interesting and that I recommend to you:

China’s Tianhe-2 retakes fastest supercomputer crown: A China-based supercomputer has leapfrogged rivals to be named the world’s most powerful system.

This article caught my eye because I think it highlights the degree to which China is rapidly catching up with Europe, the United States and Japan on certain technologies that matter a great deal.  China, unlike the US, which has been generally cutting its scientific spending since around 2000, is putting a tremendous amount of its money into science and engineering, aiming to surpass the world’s current technology leaders. Though they’re still making their way forward, their efforts are starting to pay off.  Since supercomputers are widely used in developing new technology (e.g., simulating novel aircraft), leadership in supercomputers, should they attain it, will have many benefits for the Chinese economy and military.  Lest you think they are merely copying what others have already done, you should make sure to read the last half of the article. Will it take another Sputnik moment to make anti-scientific politicians properly nervous about the cost of falling behind?

The second article of interest was this one (though the headline is a bit overstated…)

Roman Seawater Concrete Holds the Secret to Cutting Carbon Emissions:  Berkeley Lab scientists and their colleagues have discovered the properties that made ancient Roman concrete sustainable and durable

This great story evokes the tragic romance of knowledge lost for centuries — along the lines of the Stradivarius violins that no violin maker today can match. And it weaves several interesting strands.  First is the fact that modern concrete begins to fall apart in seawater in half a century, while the Romans managed to make a concrete that can survive seawater for two millennia.  How did they do it?

Well, that’s the second interesting part: researchers claim to have figured it out, using one of the most modern of scientific techniques — flashes of ultraviolet or X-ray light, emitted by high-energy electrons traveling at nearly light-speed, in a particle accelerator (the Advanced Light Source). The Advanced Light Source is located at Lawrence Berkeley Laboratory, in the hills above the university we call “Berkeley” (officially the University of California at Berkeley).

The third interesting thing: the researchers learned that the Romans’ concrete, made mainly from lime (from limestone) and volcanic ash (pulverized rock created in abundance during any energetic volcanic eruption), used less lime and was formed at much lower temperatures than modern concrete. If modern concrete were replaced (when appropriate and possible) with a similar material, its production would use much less energy. And since concrete production is a notable contributor to overall energy use, this is not a minor effect.  In short, it’s just possible that this could be one of those rare situations where everyone wins: either the Roman concrete, or, more likely, a modern/ancient hybrid, may turn out to be more durable, more fuel-efficient to produce, and perhaps cheaper than the forms of concrete we use today.  

Thank goodness! The US government is still funding some important research!  Oh.  Right.  I guess it should be mentioned that initial funding for this work came from King Abdullah University of Science and Technology in Saudi Arabia.  Apparently they have a lot of volcanic ash lying about…


Filed under: Particle Physics, Science and Modern Society Tagged: China, computers, SynchrotronLight

June 18, 2013

Clifford Johnson: Effective

I just learned* that Ken Wilson died a few days ago (June 15th). Wilson is another of the giants that you don't hear much about in the popular media coverage of the great ideas in Physics that form the bedrock of so much of what we do. You still get people saying utter nonsense about "hiding infinities" in physics and so forth (often in discussions on blogs and various similar forums (fora?)) because what he taught us all about effective field theory and the renormalization group still is only taught in some advanced classes on quantum field theory (and still not as well or frequently as it should be in such classes ... it has only relatively recently begun to be put at the forefront in textbooks on the subject, such as Tony Zee's). In the cut and thrust of the mainstream of research though, I'm happy to see that so much of Wilson's legacy is in the most basic fabric of the language we use to discuss results and ideas in particle physics, condensed matter physics, quantum gravity, string theory, and so forth. I had the distinct privilege of having Joe Polchinski as a mentor for some of my postdoc years, who is known as being one of the current giants on the scene who [...]

Victor Rivelles: The non-linear evolution of a bus fare rise

Yesterday was an atypical day in Brazil. It all started on June 1st with a rise in the bus fare in almost all major cities. The fare rose about 7%, the inflation since the last fare rise.  On June 6 some people in São Paulo organized the first rally against the fare rise and called it PASSE LIVRE, free fare. Not many people showed up. They walked along some main streets, causing trouble for local traffic. The police tried to repress the demonstration and they came into conflict, resulting in violence from both sides. The next day some more people showed up and the same violence happened again. Other cities in Brazil started doing the same. On June 13 the fourth demonstration in São Paulo was too violent. The riot police were blamed and accused of violence, vandalism and intolerance. It was then agreed that the police would not interfere with the fifth demonstration, which happened yesterday.

So on a plain Monday, a regular working day, it happened. More than 65,000 people were protesting not only against the rise in the bus fare, but also against the services provided by the state: a poor education system, bad health services, lack of security. Others were protesting against corruption, mainly governmental corruption, which seems endless. And there were even people complaining about the expenses for the next FIFA World Cup and Olympics, which will happen in a few years' time in Brazil. The demonstration started at 5 pm and went on during the night, luckily without any violence. But that is not all. Demonstrations happened in many other cities all over Brazil, also asking for the same. Unfortunately some were much more violent than in São Paulo. In Rio de Janeiro the city assembly was attacked, while in Brasília people got to the roof of the congress building. Altogether about A QUARTER OF A MILLION people were in the streets yesterday! Brazilians are awakening at last and asking for their rights!

If you google PASSE LIVRE or go to YouTube you can see what happened in several cities. But there is a video which explains it all. It seems it was recorded before yesterday's demonstration.


Scott Aaronson: The Ghost in the Quantum Turing Machine

I’ve been traveling this past week (in Israel and the French Riviera), heavily distracted by real life from my blogging career.  But by popular request, let me now provide a link to my very first post-tenure publication: The Ghost in the Quantum Turing Machine.

Here’s the abstract:

In honor of Alan Turing’s hundredth birthday, I unwisely set out some thoughts about one of Turing’s obsessions throughout his life, the question of physics and free will. I focus relatively narrowly on a notion that I call “Knightian freedom”: a certain kind of in-principle physical unpredictability that goes beyond probabilistic unpredictability. Other, more metaphysical aspects of free will I regard as possibly outside the scope of science. I examine a viewpoint, suggested independently by Carl Hoefer, Cristi Stoica, and even Turing himself, that tries to find scope for “freedom” in the universe’s boundary conditions rather than in the dynamical laws. Taking this viewpoint seriously leads to many interesting conceptual problems. I investigate how far one can go toward solving those problems, and along the way, encounter (among other things) the No-Cloning Theorem, the measurement problem, decoherence, chaos, the arrow of time, the holographic principle, Newcomb’s paradox, Boltzmann brains, algorithmic information theory, and the Common Prior Assumption. I also compare the viewpoint explored here to the more radical speculations of Roger Penrose. The result of all this is an unusual perspective on time, quantum mechanics, and causation, of which I myself remain skeptical, but which has several appealing features. Among other things, it suggests interesting empirical questions in neuroscience, physics, and cosmology; and takes a millennia-old philosophical debate into some underexplored territory.

See here (and also here) for interesting discussions over on Less Wrong.  I welcome further discussion in the comments section of this post, and will jump in myself after a few days to address questions (update: eh, already have).  There are three reasons for the self-imposed delay: first, general busyness.  Second, inspired by the McGeoch affair, I’m trying out a new experiment, in which I strive not to be on such an emotional hair-trigger about the comments people leave on my blog.  And third, based on past experience, I anticipate comments like the following:

“Hey Scott, I didn’t have time to read this 85-page essay that you labored over for two years.  So, can you please just summarize your argument in the space of a blog comment?  Also, based on the other comments here, I have an objection that I’m sure never occurred to you.  Oh, wait, just now scanning the table of contents…”

So, I decided to leave some time for people to RTFM (Read The Free-Will Manuscript) before I entered the fray.

For now, just one remark: some people might wonder whether this essay marks a new “research direction” for me.  While it’s difficult to predict the future (even probabilistically :-) ), I can say that my own motivations were exactly the opposite: I wanted to set out my thoughts about various mammoth philosophical issues once and for all, so that then I could get back to complexity, quantum computing, and just general complaining about the state of the world.

Geraint F. Lewis: The Arbitrarily Large Monty Hall Problem

I've just fallen off a plane from the UK, and am a little tired, but here's some interesting mathematics that kept me busy `on the road'.

I've loved the Monty Hall problem since I first heard about it. It is not a difficult problem, but the outcome can seem quite counter intuitive. Before I look at the problem in more detail (more detail than I should?) you should have a look at this little video as a refresher.


So, let's try to represent everything we've seen in the movie as a picture.


At the top is the initial situation, with the car, C, and two goats, G, behind the doors. The next level down is what we end up with, either with you choosing the car and one goat door open (with a probability of 1/3), or choosing a goat and a goat door revealed (with a probability of 2/3).

Looking at this, it's clear that if you choose to stick with your original choice of doors, you will win the car with a probability of 1/3 and will win the goat with a probability of 2/3.

However, if you swap doors, the probability of getting a goat is now 1/3 and winning the car is 2/3; the chance of winning a car has increased by a factor of two simply by switching doors.

How cool is that!

But I started thinking, if we change the problem, change the number of doors, cars and goats, but always reveal one door with a goat, will it always boost your chances to swap?

OK, here we go. Now we have 4 doors to start with, but one car and now three goats. So, similar to the picture above, we get to the final state, with you either choosing the car with the first guess, with the probability of 1/4, or a goat with the probability 3/4.



So, looking at this, if you stick with the original choice, the chance of winning the car is 1/4 and winning a goat is 3/4. But what if you choose to swap?

Clearly, if you chose the car to start with, then when you swap you will get a goat. 

If, however, you chose a goat to start with, then when you swap you could end up with the car or with a goat. At this stage, if you swap you have a 50-50 chance of getting the goat or car. What's the chance of winning the car now?


So, the chance of winning the car if you don't swap is 1/4 (or 2/8), but if you swap it is 3/8, so swapping doors makes you 1.5 times as likely to win the car as keeping your original choice.

OK, let's spice things up a little. Let's now have two cars and two goats. Same picture as before, but now the initial chance of choosing a car or a goat is 1/2. 


So, if you stick with your original choice, you have a 50-50 chance of winning a car. But what if you swap?

If you originally picked a car, then there is a 50-50 chance that you will swap to a car or a goat, so you can win or lose. If you originally chose a goat, then when swapping you have a 100% chance of getting a car.

So, what's the chance of winning when swapping now?


So, again, the chances of you winning are boosted by a factor of 1.5 times by swapping rather than keeping your original choice.

So, what if you originally have n cars and m goats, hidden behind (n+m) doors? What's the ratio of the chance of winning when swapping to the chance of winning when keeping your original choice? I will leave the algebra to the reader, but you can show this ratio is (ndoor - 1)/(ndoor - 2), where ndoor = n + m is the number of doors.


Woooooh! This result does not depend upon the actual number of cars and goats, only the number of doors (I should point out, there has to be at least one car and two goats for this to work).
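For anyone who wants to see the algebra, here is one way it can go. This is a sketch under stated assumptions: N = n + m is shorthand introduced here for the number of doors, the host always opens exactly one goat door other than your pick, and a swap goes to one of the remaining N - 2 closed doors chosen at random.

```latex
\begin{align*}
P(\text{car} \mid \text{stick}) &= \frac{n}{N}, \\
P(\text{car} \mid \text{swap})
  &= \frac{n}{N}\cdot\frac{n-1}{N-2} + \frac{m}{N}\cdot\frac{n}{N-2}
   = \frac{n\,(n+m-1)}{N(N-2)}
   = \frac{n\,(N-1)}{N(N-2)}, \\
\frac{P(\text{car} \mid \text{swap})}{P(\text{car} \mid \text{stick})}
  &= \frac{N-1}{N-2}.
\end{align*}
```

The n in the numerator cancels against the sticking probability, which is why only the number of doors survives.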

Let's just check this works. In the original problem, there are three doors, so ndoor=3, and this ratio is 2. Excellent. What about four doors, with ndoor=4? The ratio becomes 1.5, just as we saw before.
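If you would rather let a computer do the checking, the little sketch below works out the exact sticking and swapping chances with fractions. The function name `win_probabilities` is made up for this post, and it assumes the same rules as above: the host opens exactly one goat door you did not pick, and a swap goes to a random remaining closed door.

```python
from fractions import Fraction


def win_probabilities(n_cars, n_goats):
    """Exact chances of winning a car when sticking and when swapping."""
    if n_cars < 1 or n_goats < 2:
        raise ValueError("need at least one car and two goats")
    doors = n_cars + n_goats
    others = doors - 2  # closed doors you could swap to
    stick = Fraction(n_cars, doors)
    # First pick was a car: n_cars - 1 cars remain behind the other doors.
    swap = Fraction(n_cars, doors) * Fraction(n_cars - 1, others)
    # First pick was a goat: all n_cars cars remain behind the other doors.
    swap += Fraction(n_goats, doors) * Fraction(n_cars, others)
    return stick, swap


for cars, goats in [(1, 2), (1, 3), (2, 2), (3, 7)]:
    stick, swap = win_probabilities(cars, goats)
    print(cars, goats, stick, swap, swap / stick)
```

The first three rows are the cases drawn above; the last one is just an arbitrary larger game to see that the ratio still depends only on the number of doors.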

I don't know if this has been derived before, but I think it is a cool result. 

It tells you a few things. Firstly, as we increase the number of doors, the ratio between swapping and sticking approaches unity, irrespective of the number of cars and goats. But more importantly, the limit is approached from above; the numerator of the fraction is always bigger than the denominator, and so the ratio is always greater than one. It might be only a little bit bigger than one, but it always is.


The moral of the story is to always swap, no matter how many doors are presented to you. Good luck!

(note - badly formed maths fixed since original post - kids, don't blog while jet-lagged!)

David HoggexoSAMSI day six, likelihood function

We had the big battle about the form for a likelihood for exoplanet search and characterization late in the day, with Baines (Davis) leading the discussion. There was a huge disagreement about the realism of the model—as there always is—with some saying we should split stellar and spacecraft contributions to the lightcurve variability, and some wanting to mush them together. The former is the Right Thing To Do (tm) when you are going hierarchical, because it permits you to pool data from multiple stars on the same part of the detector (for the instrument model) and multiple stars of the same type scattered around in time and space (for the stellar model). That said, the mush-together option might be the right thing to do for an effective model where you want or need to treat every lightcurve separately. We chose the mush, and Baines and the gathered worked out some ideas about what kinds of models for stellar variability and spacecraft artifacts might work.

In the rest of the day, the team worked on selection of target stars, injection of false signals, running standard filter-based and fitting-based de-trenders and co-trenders, investigation of wavelet transforms, and statistical properties of stellar variations.

David HoggexoSAMSI day seven, injection and recovery

While Baines (Davis) and Dawson (CfA) worked on the statistics of the amplitudes of wavelet-transform elements (to build intuition for our likelihood function), the rest of the team, including especially Montet (Caltech), Barclay (Ames), and Foreman-Mackey worked on doing approximate one-dimensional (lightcurve-level) injections of Earth-like planets on year-like orbital periods into a set of Solar-type stars chosen for us by Matijevic (Villanova). At time of writing (day isn't over), the team is trying to close the loop of running an approximate search ("box least squares" or equivalent) on the injected lightcurves for exoplanet recovery. If we can close this loop, we feel like we might be able to achieve our goals, which include (but are not limited to) finding Earth-like planets around Sun-like stars on year-like orbits. Sound familiar? I worked on missing-data imputation and wavelet transforms.

Terence TaoThe elementary Selberg sieve and bounded prime gaps

This post is a continuation of the previous post on sieve theory, which is an ongoing part of the Polymath8 project to improve the various parameters in Zhang’s proof that bounded gaps between primes occur infinitely often. Given that the comments on that page are getting quite lengthy, this is also a good opportunity to “roll over” that thread.

We will continue the notation from the previous post, including the concept of an admissible tuple, the use of an asymptotic parameter {x} going to infinity, and a quantity {w} depending on {x} that goes to infinity sufficiently slowly with {x}, and {W := \prod_{p<w} p} (the {W}-trick).

The objective of this portion of the Polymath8 project is to make as efficient as possible the connection between two types of results, which we call {DHL[k_0,2]} and {MPZ[\varpi,\delta]}. Let us first state {DHL[k_0,2]}, which has an integer parameter {k_0 \geq 2}:

Conjecture 1 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there are infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} which contain at least two primes.

Zhang was the first to prove a result of this type with {k_0 = 3,500,000}. Since then the value of {k_0} has been lowered substantially; at this time of writing, the current record is {k_0 = 26,024}.

There are two basic ways known currently to attain this conjecture. The first is to use the Elliott-Halberstam conjecture {EH[\theta]} for some {\theta>1/2}:

Conjecture 2 ({EH[\theta]}) One has

\displaystyle  \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a\ (q)} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)|

\displaystyle = O( \frac{x}{\log^A x} )

for all fixed {A>0}. Here we use the abbreviation {n=a\ (q)} for {n=a \hbox{ mod } q}.

Here of course {\Lambda} is the von Mangoldt function and {\phi} the Euler totient function. It is conjectured that {EH[\theta]} holds for all {0 < \theta < 1}, but this is currently only known for {0 < \theta < 1/2}, an important result known as the Bombieri-Vinogradov theorem.
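To get a feel for the quantity in Conjecture 2, here is a minimal brute-force sketch that evaluates the left-hand side at a small finite {x}. The function names `von_mangoldt` and `eh_sum` are made up for this illustration, and a computation at this scale says nothing about the conjecture itself, which concerns the limit {x \rightarrow \infty}.

```python
import math


def von_mangoldt(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n < limit."""
    lam = [0.0] * limit
    is_prime = [True] * limit
    for p in range(2, limit):
        if is_prime[p]:
            for multiple in range(2 * p, limit, p):
                is_prime[multiple] = False
            power = p
            while power < limit:  # Lambda(p^j) = log p for every j >= 1
                lam[power] = math.log(p)
                power *= p
    return lam


def eh_sum(x, theta):
    """Brute-force value of the sum over q <= x^theta in Conjecture 2."""
    lam = von_mangoldt(x)  # covers all n < x
    total_mass = sum(lam)
    result = 0.0
    for q in range(1, int(x ** theta) + 1):
        by_class = [0.0] * q
        for n in range(1, x):
            by_class[n % q] += lam[n]
        units = [a for a in range(q) if math.gcd(a, q) == 1]
        expected = total_mass / len(units)  # len(units) equals phi(q)
        result += max(abs(by_class[a] - expected) for a in units)
    return result


print(eh_sum(2000, 0.5), 2000 / math.log(2000))
```

The second printed number is {x/\log x}, included only as a reference scale for the {A=1} case of the bound.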

In a breakthrough paper, Goldston, Yildirim, and Pintz established an implication of the form

\displaystyle  EH[\theta] \implies DHL[k_0,2] \ \ \ \ \ (1)

for any {1/2 < \theta < 1}, where {k_0 = k_0(\theta)} depends on {\theta}. This deduction was very recently optimised by Farkas, Pintz, and Revesz and also independently in the comments to the previous blog post, leading to the following implication:

Theorem 3 (EH implies DHL) Let {1/2 < \theta < 1} be a real number, and let {k_0 \geq 2} be an integer obeying the inequality

\displaystyle  2\theta > \frac{j_{k_0-2}^2}{k_0(k_0-1)}, \ \ \ \ \ (2)

where {j_n} is the first positive zero of the Bessel function {J_n(x)}. Then {EH[\theta]} implies {DHL[k_0,2]}.

Note that the right-hand side of (2) is larger than {1}, but tends asymptotically to {1} as {k_0 \rightarrow \infty}. We give an alternate proof of Theorem 3 below the fold.
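Condition (2) is easy to test numerically for small {k_0}. The sketch below is only an illustration: it evaluates {J_n} from its power series and locates the first positive zero by a scan followed by bisection, which is adequate for small orders but loses accuracy as {n} grows, so a proper special-function library would be needed for {k_0} in the thousands. The names `bessel_j`, `first_bessel_zero` and `smallest_k0`, and the search cap `k_max`, are assumptions of this sketch.

```python
import math


def bessel_j(order, x, terms=80):
    """J_order(x) from its power series; adequate for small order and x."""
    half = x / 2.0
    term = half ** order / math.factorial(order)
    total = term
    for m in range(terms - 1):
        # Ratio of consecutive series terms.
        term *= -half * half / ((m + 1) * (m + 1 + order))
        total += term
    return total


def first_bessel_zero(order, step=0.1, tol=1e-10):
    """First positive zero j_order of J_order, by scanning then bisecting."""
    lo = order + 0.5  # J_order is still positive here
    hi = lo + step
    while bessel_j(order, hi) > 0:
        lo, hi = hi, hi + step
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bessel_j(order, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def smallest_k0(theta, k_max=20):
    """Smallest k0 <= k_max obeying inequality (2), or None if there is none."""
    for k0 in range(2, k_max + 1):
        j = first_bessel_zero(k0 - 2)
        if 2 * theta > j * j / (k0 * (k0 - 1)):
            return k0
    return None


print(smallest_k0(0.99))
```

Since the right-hand side of (2) exceeds {1}, the search necessarily returns `None` for any {\theta \leq 1/2}.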

Implications of the form of Theorem 3 were modified by Motohashi and Pintz; in our notation, their modification replaces {EH[\theta]} by an easier conjecture {MPZ[\varpi,\delta]} for some {0 < \varpi < 1/4} and {0 < \delta < 1/4+\varpi}, at the cost of degrading the sufficient condition (2) slightly. This conjecture takes the following form for each choice of parameters {\varpi,\delta}:

Conjecture 4 ({MPZ[\varpi,\delta]}) Let {{\mathcal H}} be a fixed {k_0}-tuple (not necessarily admissible) for some fixed {k_0 \geq 2}, and let {b\ (W)} be a primitive residue class. Then

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in C(q)} |\Delta_{b,W}(\Lambda; q,a)| = O( x \log^{-A} x) \ \ \ \ \ (3)

for any fixed {A>0}, where {I = (w,x^{\delta})}, {{\mathcal S}_I} are the square-free integers whose prime factors lie in {I}, and {\Delta_{b,W}(\Lambda;q,a)} is the quantity

\displaystyle  \Delta_{b,W}(\Lambda;q,a) := | \sum_{x \leq n \leq 2x: n=b\ (W); n = a\ (q)} \Lambda(n) \ \ \ \ \ (4)

\displaystyle  - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x: n = b\ (W)} \Lambda(n)|.

and {C(q)} is the set of congruence classes

\displaystyle  C(q) := \{ a \in ({\bf Z}/q{\bf Z})^\times: P(a) = 0 \}

and {P} is the polynomial

\displaystyle  P(a) := \prod_{h \in {\mathcal H}} (a+h).

This is a weakened version of the Elliott-Halberstam conjecture:

Proposition 5 (EH implies MPZ) Let {0 < \varpi < 1/4} and {0 < \delta < 1/4+\varpi}. Then {EH[1/2+2\varpi+\epsilon]} implies {MPZ[\varpi,\delta]} for any {\epsilon>0}. (In abbreviated form: {EH[1/2+2\varpi+]} implies {MPZ[\varpi,\delta]}.)

In particular, since {EH[\theta]} is conjecturally true for all {0 < \theta < 1}, we conjecture {MPZ[\varpi,\delta]} to be true for all {0 < \varpi < 1/4} and {0<\delta<1/4+\varpi}.

Proof: Define

\displaystyle  E(q) := \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{x \leq n \leq 2x: n = a\ (q)} \Lambda(n) - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x} \Lambda(n)|

then the hypothesis {EH[1/2+2\varpi+\epsilon]} (applied to {x} and {2x} and then subtracting) tells us that

\displaystyle  \sum_{1 \leq q \leq Wx^{1/2+2\varpi}} E(q) \ll x \log^{-A} x

for any fixed {A>0}. From the Chinese remainder theorem and the Siegel-Walfisz theorem we have

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} \Delta_{b,W}(\Lambda;q,a) \ll E(qW) + \frac{1}{\phi(q)} x \log^{-A} x

for any {q} coprime to {W} (and in particular for {q \in {\mathcal S}_I}). Since {|C(q)| \leq k_0^{\Omega(q)}}, where {\Omega(q)} is the number of prime divisors of {q}, we can thus bound the left-hand side of (3) by

\displaystyle  \ll \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} k_0^{\Omega(q)} E(qW) + k_0^{\Omega(q)} \frac{1}{\phi(q)} x \log^{-A} x.

The contribution of the second term is {O(x \log^{-A+O(1)} x)} by standard estimates (see Lemma 8 below). Using the very crude bound

\displaystyle  E(q) \ll \frac{1}{\phi(q)} x \log x

and standard estimates we also have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} k_0^{2\Omega(q)} E(qW) \ll x \log^{O(1)} x

and the claim now follows from the Cauchy-Schwarz inequality. \Box

In practice, the conjecture {MPZ[\varpi,\delta]} is easier to prove than {EH[1/2+2\varpi+]} due to the restriction of the residue classes {a} to {C(q)}, and also the restriction of the modulus {q} to {x^\delta}-smooth numbers. Zhang proved {MPZ[\varpi,\varpi]} for any {0 < \varpi < 1/1168}. More recently, our Polymath8 group has analysed Zhang’s argument (using in part a corrected version of the analysis of a recent preprint of Pintz) to obtain {MPZ[\varpi,\delta]} whenever {\delta, \varpi > 0} are such that

\displaystyle  207\varpi + 43\delta < \frac{1}{4}.

The work of Motohashi and Pintz, and later Zhang, implicitly describe arguments that allow one to deduce {DHL[k_0,2]} from {MPZ[\varpi,\delta]} provided that {k_0} is sufficiently large depending on {\varpi,\delta}. The best implication of this sort that we have been able to verify thus far is the following result, established in the previous post:

Theorem 6 (MPZ implies DHL) Let {0 < \varpi < 1/4}, {0 < \delta < 1/4+\varpi}, and let {k_0 \geq 2} be an integer obeying the constraint

\displaystyle  1+4\varpi > \frac{j_{k_0-2}^2}{k_0(k_0-1)} (1+\kappa) \ \ \ \ \ (5)

where {\kappa} is the quantity

\displaystyle \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{2\delta}} (1 - \frac{2n \delta}{1 + 4\varpi})^{k_0/2} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})).

Then {MPZ[\varpi,\delta]} implies {DHL[k_0,2]}.

This complicated version of {\kappa} is roughly of size {3 \log(2) k_0 \exp( - k_0 \delta)}. It is unlikely to be optimal; the work of Motohashi-Pintz and Pintz suggests that it can essentially be improved to {\frac{1}{\delta} \exp(-k_0 \delta)}, but currently we are unable to verify this claim. One of the aims of this post is to encourage further discussion as to how to improve the {\kappa} term in results such as Theorem 6.
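For experimentation with Theorem 6, the quantity {\kappa} can be evaluated directly from its definition. The following is a minimal sketch, not the code used in the Polymath8 project: the function name `kappa` is an assumption, and the sum is accumulated in logarithms because for {k_0} in the tens of thousands the power is extremely small while the product is large.

```python
import math


def kappa(varpi, delta, k0):
    """Evaluate the sum defining kappa in Theorem 6, working in logarithms."""
    bound = (1 + 4 * varpi) / (2 * delta)
    total = 0.0
    log_product = 0.0  # log of prod_{j=1}^{n} (1 + 3 k0 log(1 + 1/j))
    n = 1
    while n < bound:
        log_product += math.log1p(3 * k0 * math.log1p(1.0 / n))
        base = 1 - 2 * n * delta / (1 + 4 * varpi)
        if base > 0:  # guards against rounding when n is close to the bound
            total += math.exp(0.5 * k0 * math.log(base) + log_product)
        n += 1
    return total


# Hypothetical sample parameters obeying 207*varpi + 43*delta < 1/4.
print(kappa(1 / 1000, 1 / 1000, 26024))
```

The sample values of {\varpi}, {\delta} and {k_0} above are chosen only to exercise the function; checking (5) itself additionally requires the Bessel zero {j_{k_0-2}}, which this sketch does not compute.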

We remark that as (5) is an open condition, it is unaffected by infinitesimal modifications to {\varpi,\delta}, and so we do not ascribe much importance to such modifications (e.g. replacing {\varpi} by {\varpi-\epsilon} for some arbitrarily small {\epsilon>0}).

The known deductions of {DHL[k_0,2]} from claims such as {EH[\theta]} or {MPZ[\varpi,\delta]} rely on the following elementary observation of Goldston, Pintz, and Yildirim (essentially a weighted pigeonhole principle), which we have placed in “{W}-tricked form”:

Lemma 7 (Criterion for DHL) Let {k_0 \geq 2}. Suppose that for each fixed admissible {k_0}-tuple {{\mathcal H}} and each congruence class {b\ (W)} such that {b+h} is coprime to {W} for all {h \in {\mathcal H}}, one can find a non-negative weight function {\nu: {\bf N} \rightarrow {\bf R}^+}, fixed quantities {\alpha,\beta > 0}, a quantity {A>0}, and a fixed positive power {R} of {x} such that one has the upper bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \leq (\alpha+o(1)) A\frac{x}{W}, \ \ \ \ \ (6)

the lower bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \theta(n+h_i) \geq (\beta-o(1)) A\frac{x}{W} \log R \ \ \ \ \ (7)

for all {h_i \in {\mathcal H}}, and the key inequality

\displaystyle  \frac{\log R}{\log x} > \frac{1}{k_0} \frac{\alpha}{\beta} \ \ \ \ \ (8)

holds. Then {DHL[k_0,2]} holds. Here {\theta(n)} is defined to equal {\log n} when {n} is prime and {0} otherwise.

Proof: Consider the quantity

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) (\sum_{h \in {\mathcal H}} \theta(n+h) - \log(3x)). \ \ \ \ \ (9)

By (6), (7), this quantity is at least

\displaystyle  k_0 \beta A\frac{x}{W} \log R - \alpha \log(3x) A\frac{x}{W} - o(A\frac{x}{W} \log x).

By (8), this expression is positive for all sufficiently large {x}. On the other hand, (9) can only be positive if at least one summand is positive, which only can happen when {n+{\mathcal H}} contains at least two primes for some {x \leq n \leq 2x} with {n=b\ (W)}. Letting {x \rightarrow \infty} we obtain {DHL[k_0,2]} as claimed. \Box

In practice, the quantity {R} (referred to as the sieve level) is a power of {x} such as {x^{\theta/2}} or {x^{1/4+\varpi}}, and reflects the strength of the distribution hypothesis {EH[\theta]} or {MPZ[\varpi,\delta]} that is available; the quantity {R} will also be a key parameter in the definition of the sieve weight {\nu}. The factor {A} reflects the order of magnitude of the expected density of {\nu} in the residue class {b\ (W)}; it could be absorbed into the sieve weight {\nu} by dividing that weight by {A}, but it is convenient to not enforce such a normalisation so as not to clutter up the formulae. In practice, {A} will be some combination of {\frac{\phi(W)}{W}} and {\log R}.

Once one has decided to rely on Lemma 7, the next main task is to select a good weight {\nu} for which the ratio {\alpha/\beta} is as small as possible (and for which the sieve level {R} is as large as possible). To ensure non-negativity, we use the Selberg sieve

\displaystyle  \nu = \lambda^2, \ \ \ \ \ (10)

where {\lambda(n)} takes the form

\displaystyle  \lambda(n) = \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) a_d

for some weights {a_d \in {\bf R}} vanishing for {d>R} that are to be chosen, where {I \subset (w,+\infty)} is an interval and {P} is the polynomial {P(n) := \prod_{h \in {\mathcal H}} (n+h)}. If the distribution hypothesis is {EH[\theta]}, one takes {R := x^{\theta/2}} and {I := (w,+\infty)}; if the distribution hypothesis is instead {MPZ[\varpi,\delta]}, one takes {R := x^{1/4+\varpi}} and {I := (w,x^\delta)}.

One has a useful amount of flexibility in selecting the weights {a_d} for the Selberg sieve. In the original work of Goldston, Pintz, and Yildirim, as well as the subsequent paper of Zhang, the choice

\displaystyle  a_d := \log(\frac{R}{d})_+^{k_0+\ell_0}

is used for some additional parameter {\ell_0 > 0} to be optimised over. More generally, one can take

\displaystyle  a_d := g( \frac{\log d}{\log R} )

for some suitable (in particular, sufficiently smooth) cutoff function {g: {\bf R} \rightarrow {\bf R}}. We will refer to this choice of sieve weights as the “analytic Selberg sieve”; this is the choice used in the analysis in the previous post.

However, there is a slight variant choice of sieve weights that one can use, which I will call the “elementary Selberg sieve”, and it takes the form

\displaystyle  a_d := \frac{1}{\Phi(d) \Delta(d)} \sum_{q \in {\mathcal S}_I: (q,d)=1} \frac{1}{\Phi(q)} f'( \frac{\log dq}{\log R}) \ \ \ \ \ (11)

for a sufficiently smooth function {f: {\bf R} \rightarrow {\bf R}}, where

\displaystyle  \Phi(d) := \prod_{p|d} \frac{p-k_0}{k_0}

for {d \in {\mathcal S}_I} is a {k_0}-variant of the Euler totient function, and

\displaystyle  \Delta(d) := \prod_{p|d} \frac{k_0}{p} = \frac{k_0^{\Omega(d)}}{d}

for {d \in {\mathcal S}_I} is a {k_0}-variant of the function {1/d}. (The derivative on the {f} cutoff is convenient for computations, as will be made clearer later in this post.) This choice of weights {a_d} may seem somewhat arbitrary, but it arises naturally when considering how to optimise the quadratic form

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} a_{d_1} a_{d_2} \Delta([d_1,d_2])

(which arises naturally in the estimation of {\alpha} in (6)) subject to a fixed value of {a_1} (which morally is associated to the estimation of {\beta} in (7)); this is discussed in any sieve theory text as part of the general theory of the Selberg sieve, e.g. Friedlander-Iwaniec.

The use of the elementary Selberg sieve for the bounded prime gaps problem was studied by Motohashi and Pintz. Their arguments give an alternate derivation of {DHL[k_0,2]} from {MPZ[\varpi,\delta]} for {k_0} sufficiently large, although unfortunately we were not able to confirm some of their calculations regarding the precise dependence of {k_0} on {\varpi,\delta}, and in particular we have not yet been able to improve upon the specific criterion in Theorem 6 using the elementary sieve. However it is quite plausible that such improvements could become available with additional arguments.

Below the fold we describe how the elementary Selberg sieve can be used to reprove Theorem 3, and discuss how it could potentially be used to improve upon Theorem 6. (But the elementary Selberg sieve and the analytic Selberg sieve are in any event closely related; see the appendix of this paper of mine with Ben Green for some further discussion.) For the purposes of polymath8, either developing the elementary Selberg sieve or continuing the analysis of the analytic Selberg sieve from the previous post would be a relevant topic of conversation in the comments to this post.

— 1. Sums of multiplicative functions —

In this section we review a standard estimate on a sum of multiplicative functions. We fix an interval {I \subset (w,+\infty)}. For any positive integer {k}, we say that a multiplicative function {f: {\bf N} \rightarrow {\bf R}} has dimension {k} if one has the asymptotic

\displaystyle  f(p) = k + O(\frac{1}{p})

for all {p \in I}; in particular (since {w \rightarrow \infty} as {x \rightarrow \infty}) we see that {f} is non-negative on {{\mathcal S}_I} for {x} large enough. Thus for instance

\displaystyle  n \mapsto \frac{\phi(n)}{n}

has dimension one, the divisor function

\displaystyle  n \mapsto \tau(n)

has dimension two, and the functions

\displaystyle  n \mapsto k_0^{\Omega(n)},

\displaystyle  n \mapsto \frac{n}{\Phi(n)},

and

\displaystyle  n \mapsto n \Delta(n)

defined in the introduction have dimension {k_0}. Dimension interacts well with multiplication; the product of a {k}-dimensional multiplicative function and a {k'}-dimensional multiplicative function is clearly a {kk'}-dimensional multiplicative function.

We have the following basic asymptotic in the untruncated case {I = (w,+\infty)}:

Lemma 8 (Untruncated asymptotic) Let {I = (w,+\infty)}. Let {k} be a fixed positive integer, and let {f: {\bf N} \rightarrow {\bf R}} be a multiplicative function of dimension {k}. Then for any fixed compactly supported, Riemann-integrable function {g: {\bf R} \rightarrow {\bf R}}, and any {R>1} that goes to infinity as {x \rightarrow \infty}, one has

\displaystyle  \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d} g(\frac{\log d}{\log R}) = (\frac{\phi(W)}{W} \log R)^k ( \int_0^\infty g(t) \frac{t^{k-1}}{(k-1)!}\ dt + o(1) ).

Proof: By approximating {g} from above and below by smooth compactly supported functions we see that we may assume without loss of generality that {g} is smooth and compactly supported. But then the claim follows from Proposition 10 of the previous post. \Box

We remark that Proposition 10 of the previous post also gives asymptotics for a number of other sums of multiplicative functions, but one (small) advantage of the elementary Selberg sieve is that these (slightly) more complicated asymptotics are not needed. The generalisation in Lemma 8 from smooth {g} to Riemann integrable {g} implies in particular that

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq R} \frac{f(d)}{d} = (\frac{1}{k!} + o(1)) (\frac{\phi(W)}{W} \log R)^k \ \ \ \ \ (12)

and conversely Lemma 8 can be easily deduced from (12) by another approximation argument (using piecewise constant functions instead of smooth functions). We also make the trivial remark that if {g} is non-negative and {J} is any subset of {I}, then we have the upper bound

\displaystyle  \sum_{d \in {\mathcal S}_J} \frac{f(d)}{d} g(\frac{\log d}{\log R}) \leq (\frac{\phi(W)}{W} \log R)^k ( \int_0^\infty g(t) \frac{t^{k-1}}{(k-1)!}\ dt + o(1) ) \ \ \ \ \ (13)

for any non-negative Riemann integrable {g}.

Actually, (12) can be derived by purely elementary means (without the need to explicitly work with asymptotics of zeta functions as was done in the previous post) by an induction on the dimension {k} as follows. In the dimension zero case we have the Euler product

\displaystyle  \sum_{d \in {\mathcal S}_I} \frac{|f(d)|}{d} = \prod_{p \in I} (1 + \frac{|f(p)|}{p}) = 1+o(1)

and hence

\displaystyle  \sum_{d \in {\mathcal S}_I: d\neq 1} \frac{|f(d)|}{d} = o(1)

which gives (12) in the {k=0} case.

Now suppose that {f} has dimension {1}. In this case we write

\displaystyle  f(d) 1_{{\mathcal S}_I}(d) = \sum_{a|d; d/a \in {\mathcal S}_I} h(a) \ \ \ \ \ (14)

where {h} is a multiplicative function with

\displaystyle  h(p^j) := (-1)^{j-1} (f(p)-1) = O(\frac{1}{p}),

for all {p > w} and {j \geq 1}, and {h(p^j)=0} for {p \leq w} and {j \geq 1}. Then the left-hand side of (12) can be rearranged as

\displaystyle \sum_{a \leq R} \frac{h(a)}{a} \sum_{d \in {\mathcal S}_I: d \leq R/a} \frac{1}{d}.

Elementary sieving gives

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq y} 1 = (\frac{\phi(W)}{W} + o(1)) y + O( W )

and hence by summation by parts

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq y} \frac{1}{d} = (\frac{\phi(W)}{W} + o(1)) \log y + O( W ).

Meanwhile we have

\displaystyle  \sum_a \frac{|h(a)|}{a} = \prod_{p > w} (1 + \sum_{j=1}^\infty \frac{|f(p)-1|}{p^j}) = 1+o(1)

and so

\displaystyle  \sum_{a \neq 1} \frac{|h(a)|}{a} = o(1).

From these estimates one easily obtains (12) for {k=1}.

Now suppose that {k > 1} and that the claim has been proven inductively for {k-1}. We again may decompose (14), but now {h} has dimension {k-1} instead of dimension zero. Arguing as before, we can write the left-hand side of (12) as

\displaystyle \sum_{a \in {\mathcal S}_I: a \leq R} \frac{h(a)}{a} ( (\frac{\phi(W)}{W} + o(1)) \log \frac{R}{a} + O( W ) ).

The contributions of the {o(1)} and {O(W)} error terms are acceptable by the induction hypothesis, and the main term is also acceptable by the induction hypothesis and summation by parts, giving the claim.

— 2. Untruncated implication —

We first reprove Theorem 3. The key calculations for {\alpha} and {\beta} are as follows:

Lemma 9 (Untruncated sieve bounds) Assume {EH[\theta+\epsilon]} holds for some {1/2 < \theta < 1} and some {\epsilon>0}. Let {f: {\bf R} \rightarrow {\bf R}} be a smooth function that is supported on {[-1,1]}, let {{\mathcal H}} be a fixed admissible {k_0}-tuple for some fixed {k_0 \geq 2}, let {b\ (W)} be such that {b+h} is coprime to {W} for all {h \in {\mathcal H}}, and let {\nu} be the elementary Selberg sieve with weights (11) associated to the function {f}, the sieve level {R := x^{\theta/2}} and the untruncated interval {I := (w,+\infty)}. Then (6), (7) hold with

\displaystyle  \alpha := \int_0^1 f'(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt, \ \ \ \ \ (15)

\displaystyle  \beta := \int_0^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt, \ \ \ \ \ (16)

and

\displaystyle  A := (\frac{\phi(W)}{W} \log R)^{k_0}.

As computed in Theorem 14 of the previous post (and also in the recent preprint of Farkas, Pintz, and Revesz), the ratio

\displaystyle  \frac{\int_0^1 f'(t)^2 t^{k_0-1}\ dt}{\int_0^1 f(t)^2 t^{k_0-2}\ dt}

for non-zero {f} can be made arbitrarily close to {j_{k_0-2}^2/4} (the extremiser is not quite smooth at {t=1} if one extends by zero for {t>1}, but this can be easily dealt with by a standard regularisation argument), and Theorem 3 then follows from Lemma 7 (using the open nature of (2) to replace {EH[\theta]} by {EH[\theta+\epsilon]} for some small {\epsilon>0}).

It remains to prove Lemma 9. We begin with the proof of (6) (which will in fact be an asymptotic and not just an upper bound).

We expand the left-hand side of (6) as

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n=b\ (W)} 1.

The weights {a_{d_1} a_{d_2}} are only non-vanishing when {d_1,d_2 \leq R}. From the Chinese remainder theorem we then have

\displaystyle  \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n=b\ (W)} 1 = \frac{x}{W} \Delta([d_1,d_2]) + O( [d_1,d_2] \Delta([d_1,d_2]) ).

The contribution of the error term is

\displaystyle  \ll \sum_{d_1,d_2 \in {\mathcal S}_I: d_1,d_2 \leq R} |a_{d_1}| |a_{d_2}| k_0^{\Omega([d_1,d_2])}

which we can upper bound by

\displaystyle  \ll (\sum_{d \in {\mathcal S}_I: d \leq R} |a_d| k_0^{\Omega(d)})^2.

Using (11) and Lemma 8 we have the crude upper bound

\displaystyle  |a_d| \ll \frac{1}{\Phi(d) \Delta(d)} (\frac{\phi(W)}{W} \log R)^{k_0} \ \ \ \ \ (17)

and hence by another application of Lemma 8 the previous expression may be upper bounded by {O( (W/\phi(W))^{O(1)} R^2 \log^{O(1)} R )}, which is negligible by choice of {R}. So we reduce to showing that

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \Delta([d_1,d_2]) \leq (\alpha+o(1)) (\frac{\phi(W)}{W} \log R)^{k_0}. \ \ \ \ \ (18)

To proceed further we follow Selberg and observe the decomposition

\displaystyle  \Delta([d_1,d_2]) = \sum_{d_0|d_1,d_2} \Phi(d_0) \Delta(d_1) \Delta(d_2) \ \ \ \ \ (19)

for {d_1,d_2 \in {\mathcal S}_I}, which can be easily verified by working locally (when {d_1,d_2 \in \{1,p\}} for some prime {p \in I}) and then using multiplicativity. Using this identity we can diagonalise the left-hand side of (18) as
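The decomposition (19) is a purely algebraic identity, so it can be sanity-checked with exact rational arithmetic on small squarefree numbers. The sketch below is only a check on examples, not a proof; the helper names (`prime_factors`, `divisors`, `phi_k0`, `delta_k0`, `identity_19_holds`) are made up for this illustration, and the test values ignore the requirement that the prime factors lie in {I}, which the identity itself does not need.

```python
from fractions import Fraction
from math import gcd


def prime_factors(d):
    """Prime factors of a squarefree positive integer d, by trial division."""
    factors, p = [], 2
    while d > 1:
        if d % p == 0:
            d //= p
            if d % p == 0:
                raise ValueError("d must be squarefree")
            factors.append(p)
        p += 1
    return factors


def divisors(d):
    """All divisors of a squarefree positive integer d."""
    result = [1]
    for p in prime_factors(d):
        result += [p * e for e in result]
    return result


def phi_k0(d, k0):
    """Phi(d): product over p | d of (p - k0) / k0."""
    value = Fraction(1)
    for p in prime_factors(d):
        value *= Fraction(p - k0, k0)
    return value


def delta_k0(d, k0):
    """Delta(d): product over p | d of k0 / p."""
    value = Fraction(1)
    for p in prime_factors(d):
        value *= Fraction(k0, p)
    return value


def identity_19_holds(d1, d2, k0):
    """Compare both sides of (19) for squarefree d1, d2."""
    lcm = d1 * d2 // gcd(d1, d2)
    common = divisors(gcd(d1, d2))  # the d0 dividing both d1 and d2
    rhs = sum(phi_k0(d0, k0) for d0 in common)
    rhs *= delta_k0(d1, k0) * delta_k0(d2, k0)
    return delta_k0(lcm, k0) == rhs


print(identity_19_holds(35, 77, 3))
```

Because both sides are multiplicative, agreement on a handful of squarefree pairs mirrors the "work locally, then use multiplicativity" argument in the text.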

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \Phi(d_0) (\sum_{d \in {\mathcal S}_I: d_0|d} \mu(d) a_d \Delta(d))^2.

Now we use the form (11) of {a_d}, which has been optimised specifically for ease of computing this expression. We can expand {\sum_{d \in {\mathcal S}_I: d_0|d} \mu(d) a_d \Delta(d)} as

\displaystyle  \sum_{d \in {\mathcal S}_I: d_0|d} \frac{\mu(d)}{\Phi(d)} \sum_{q \in {\mathcal S}_I: (q,d) = 1} \frac{1}{\Phi(q)} f'( \frac{\log dq}{\log R});

writing {d = d_0 d_1} and {m = d_1 q}, we can rewrite this as

\displaystyle  \frac{\mu(d_0)}{\Phi(d_0)} \sum_{m \in {\mathcal S}_I: (m,d_0)=1} \frac{f'(\frac{\log d_0 m}{\log R})}{\Phi(m)} \sum_{d_1 | m} \mu(d_1)

which by Möbius inversion simplifies to

\displaystyle \frac{\mu(d_0)}{\Phi(d_0)} f'( \frac{\log d_0}{\log R} ).

The left-hand side of (18) has now simplified to

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{1}{\Phi(d_0)} f'( \frac{\log d_0}{\log R} )^2.

By Lemma 8 and (15) we obtain (18) and hence (6) as required.

Now we turn to the more difficult lower bound (7) for a fixed {h_i \in {\mathcal H}} (again we will be able to get an asymptotic here rather than just a lower bound). The left-hand side expands as

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n = b\ (W)} \theta(n+h_i).

Again, {d_1,d_2} may be restricted to at most {R}, so that {[d_1,d_2]} is at most {R^2 = x^{\theta}}. As before, the inner summand vanishes unless {n+h_i \ ([d_1,d_2])} lies in one of the residue classes {C_i([d_1,d_2])}, where

\displaystyle  C_i(q) := \{ a \in ({\bf Z}/q{\bf Z})^\times: P_i(a) = 0 \}

and {P_i} is the modified polynomial

\displaystyle  P_i(a) := \prod_{h \in {\mathcal H} \backslash \{h_i\}} (a+h-h_i).

The cardinality of {C_i(q)} is {\phi(q)\Delta^*(q)}, where

\displaystyle  \Delta^*(q) := \prod_{p|q} \frac{k_0-1}{p-1} = \frac{(k_0-1)^{\Omega(q)}}{\phi(q)}.

We can thus estimate

\displaystyle  \sum_{x \leq n \leq 2x: [d_1,d_2] | P(n); n = b\ (W)} \theta(n+h_i) = \frac{1}{\phi(W)} x \Delta^*([d_1,d_2]) + O( E^*([d_1,d_2]) )

where the error term {E^*(q)} is given by

\displaystyle  E^*(q) = \sum_{a \in C_i(q)} | \sum_{x \leq n \leq 2x: n=b\ (W); n = a\ (q)} \theta(n) - \frac{x}{\phi(Wq)}|.

By a modification of the proof of Proposition 5 we see that the hypothesis {EH[\theta+\epsilon]} implies that

\displaystyle  \sum_{q \leq R^2} h(q) E^*(q) \ll x \log^{-A} x

for any fixed {A>0} and any multiplicative function {h} of a fixed dimension {k}. Using the bound (17) we can then conclude that the contribution of the error term {E^*([d_1,d_2])} to (7) is negligible. So (7) becomes

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) a_{d_1} \mu(d_2) a_{d_2} \Delta^*([d_1,d_2]) \ \ \ \ \ (20)

\displaystyle  \geq (\beta-o(1)) (\frac{\phi(W)}{W}\log R)^{k_0+1}.

Analogously to (19) we have the decomposition

\displaystyle  \Delta^*([d_1,d_2]) = \sum_{d_0|d_1,d_2} \Phi^*(d_0) \Delta^*(d_1) \Delta^*(d_2) \ \ \ \ \ (21)

for {d_1,d_2 \in {\mathcal S}_I}, where {\Phi^*} is the function

\displaystyle  \Phi^*(d) := \prod_{p|d} \frac{p-k_0}{k_0-1}.
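The decomposition (21) is a finite identity, so it can be checked exactly for any particular squarefree {d_1,d_2}. The following Python sketch does this with rational arithmetic; the helper names and the sample values {k_0=3}, {d_1 = 5 \cdot 7 \cdot 11}, {d_2 = 7 \cdot 11 \cdot 13} are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod

def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def delta_star(q, k0):
    """Delta^*(q) = prod_{p | q} (k0 - 1) / (p - 1)."""
    return prod((Fraction(k0 - 1, p - 1) for p in prime_factors(q)), start=Fraction(1))

def phi_star(d, k0):
    """Phi^*(d) = prod_{p | d} (p - k0) / (k0 - 1)."""
    return prod((Fraction(p - k0, k0 - 1) for p in prime_factors(d)), start=Fraction(1))

def divisors_of_squarefree(n):
    """All divisors of a squarefree integer n."""
    primes = prime_factors(n)
    return [prod(c) for r in range(len(primes) + 1) for c in combinations(primes, r)]

def decomposition_sides(d1, d2, k0):
    """Return both sides of (21) for squarefree d1, d2."""
    lcm = d1 * d2 // gcd(d1, d2)
    lhs = delta_star(lcm, k0)
    rhs = sum(phi_star(d0, k0) for d0 in divisors_of_squarefree(gcd(d1, d2)))
    rhs *= delta_star(d1, k0) * delta_star(d2, k0)
    return lhs, rhs

lhs, rhs = decomposition_sides(5 * 7 * 11, 7 * 11 * 13, k0=3)
print(lhs, rhs)
```

The identity reduces to {1/\Delta^*(p) = 1 + \Phi^*(p)} at each prime {p} dividing both {d_1} and {d_2}, which is what the sum over {d_0} encodes.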

We can thus diagonalise the left-hand side of (20) similarly to before as

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \Phi^*(d_0) (\sum_{d \in {\mathcal S}_I: d_0|d} \mu(d) a_d \Delta^*(d))^2.

We can expand {\sum_{d \in {\mathcal S}_I: d_0|d} \mu(d) a_d \Delta^*(d)} as

\displaystyle  \sum_{d \in {\mathcal S}_I: d_0|d} \frac{\mu(d)}{\Phi(d)} \frac{\Delta^*(d)}{\Delta(d)} \sum_{q \in {\mathcal S}_I: (q,d) = 1} \frac{1}{\Phi(q)} f'( \frac{\log dq}{\log R});

writing {d = d_0 d_1} and {m = d_1 q} and noting that {\frac{\Delta^*(d)}{\Delta(d)} = (1-\frac{1}{k_0})^{\Omega(d)}}, we can rewrite this as

\displaystyle  \frac{\mu(d_0)}{\Phi(d_0)} \frac{\Delta^*(d_0)}{\Delta(d_0)} \sum_{m \in {\mathcal S}_I: (m,d_0)=1} \frac{f'(\frac{\log d_0 m}{\log R})}{\Phi(m)} \sum_{d_1 | m} \mu(d_1) \frac{\Delta^*(d_1)}{\Delta(d_1)}.

Observe that

\displaystyle  \frac{1}{\Phi(m)} \sum_{d_1|m} \mu(d_1) \frac{\Delta^*(d_1)}{\Delta(d_1)} = \frac{1}{\phi(m)}

so we can simplify the left-hand side of (20) as

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} (\sum_{m \in {\mathcal S}_I: (m,d_0)=1} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} )^2 \ \ \ \ \ (22)

where {h} is the {k_0-1}-dimensional multiplicative function

\displaystyle  h(d) := d \frac{\Phi^*(d)}{\Phi(d)^2} (\frac{\Delta^*(d)}{\Delta(d)})^2

\displaystyle  = \prod_{p|d} (k_0-1) \frac{(p-1)^2}{p(p-k_0)}.

To control this sum, let us first pretend that the {(m,d_0)=1} constraint was not present, thus suppose we had to estimate

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} (\sum_{m \in {\mathcal S}_I} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} )^2. \ \ \ \ \ (23)

By Proposition 8, the inner sum {\sum_{m \in {\mathcal S}_I} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)}} is equal to

\displaystyle  = (\frac{\phi(W)}{W} \log R) (\int_0^\infty f'(t + \frac{\log d_0}{\log R})\ dt + o(1))

which by the fundamental theorem of calculus simplifies to

\displaystyle  = - (\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R})+ o(1)).

We remark that the error term {o(1)} here is uniform in {d_0}, because the translates {f'(\cdot + \frac{\log d_0}{\log R})} are equicontinuous and thus uniformly Riemann integrable. We conclude that (23) is equal to

\displaystyle  (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} (f(\frac{\log d_0}{\log R})^2+ o(1))

where the error term {o(1)} is again uniform in {d_0}. By Proposition 8 and (16), this expression is equal to

\displaystyle  (\beta-o(1)) (\frac{\phi(W)}{W} \log R)^{k_0+1} \ \ \ \ \ (24)

as required.

Now we reinstate the condition {(m,d_0)=1}, which turns out to be negligible thanks to the {W}-trick. More precisely, we may use Möbius inversion to write

\displaystyle  \sum_{m \in {\mathcal S}_{(w,+\infty)}: (m,d_0)=1} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} \ \ \ \ \ (25)

\displaystyle  = \sum_{k | d_0} \frac{\mu(k)}{\phi(k)} \sum_{m \in {\mathcal S}_{(w,+\infty)}: (m,k)=1} \frac{f'(\frac{\log d_0 k m}{\log R})}{\phi(m)}.

By the preceding discussion, the {k=1} term of this sum is

\displaystyle - (\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R}) + o(1)).

Now we consider the {k \neq 1} terms, which are error terms. We may bound the total contribution of these terms in magnitude by

\displaystyle  O( \sum_{k|d_0: k \neq 1} \frac{1}{\phi(k)} |\sum_{m \in {\mathcal S}_{(w,+\infty)}} \frac{f'(\frac{\log d_0 k m}{\log R})}{\phi(m)}| ).

Arguing as before we have

\displaystyle  \sum_{m \in {\mathcal S}_{(w,+\infty)}} \frac{f'(\frac{\log d_0 k m}{\log R})}{\phi(m)} = O( \frac{\phi(W)}{W} \log R )

and so the expression (25) becomes

\displaystyle  -(\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R}) + O( \frac{d_0}{\phi(d_0)}-1 ) + o(1) )

where the implied constant in the {O()} notation can depend on {f}. The square of this expression is then

\displaystyle  (\frac{\phi(W)}{W} \log R)^2 (f(\frac{\log d_0}{\log R})^2 + O( (\frac{d_0}{\phi(d_0)}-1)^2 ) + O( \frac{d_0}{\phi(d_0)}-1 ) + o(1) ).
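The {O( \frac{d_0}{\phi(d_0)}-1 )} term in the estimate for (25) comes from the elementary identity {\sum_{k|d_0: k \neq 1} \frac{1}{\phi(k)} = \frac{d_0}{\phi(d_0)} - 1} for squarefree {d_0}. A short exact check in Python, with the function name and the sample primes chosen only for illustration:

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def divisor_sum_sides(primes):
    """For squarefree d0 = prod(primes), return
    (sum over k | d0, k != 1 of 1/phi(k),  d0/phi(d0) - 1)."""
    d0 = prod(primes)
    phi_d0 = prod(p - 1 for p in primes)
    total = Fraction(0)
    for r in range(1, len(primes) + 1):
        for subset in combinations(primes, r):
            # phi(k) for squarefree k is the product of (p - 1) over p | k
            total += Fraction(1, prod(p - 1 for p in subset))
    return total, Fraction(d0, phi_d0) - 1

lhs, rhs = divisor_sum_sides((5, 7, 11))
print(lhs, rhs)
```

Since every prime factor of {d_0} exceeds {w}, the quantity {\frac{d_0}{\phi(d_0)}-1} is small on average, which is the sense in which the {W}-trick makes the coprimality constraint negligible.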

The left-hand side of (20) is now expressed as the sum of the main term

\displaystyle  (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} f(\frac{\log d_0}{\log R})^2

and the error terms

\displaystyle  O( (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0 \in {\mathcal S}_I: d_0 \leq R} \frac{h(d_0)}{d_0} (\frac{d_0}{\phi(d_0)}-1)^j )

for {j=1,2} and

\displaystyle  o( (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0 \in {\mathcal S}_I: d_0 \leq R} \frac{h(d_0)}{d_0} ).

The main term has already been estimated as (24). From Proposition 8 we have

\displaystyle  \sum_{d_0 \in {\mathcal S}_I: d_0 \leq R} \frac{h(d_0)}{d_0} (\frac{d_0}{\phi(d_0)})^j = (\frac{\phi(W)}{W} \log R)^{k_0-1} (\int_0^1 \frac{t^{k_0-2}}{(k_0-2)!}\ dt+o(1))

for {j=0,1,2}, and so all of the error terms end up being {o( (\frac{\phi(W)}{W} \log R)^{k_0+1} )}, and (7) follows. This concludes the proof of Theorem 3.

— 3. Applying truncation —

Now we experiment with truncating the above argument to {I = (w,x^\delta)} to obtain results of the shape of Theorem 6. Unfortunately thus far the results do not give very good explicit dependencies of {k_0} on {\varpi,\delta}, but this may perhaps improve with further effort.

Assume {MPZ[\varpi,\delta]} holds for some {0 < \varpi < 1/4} and some {0 < \delta < 1/4+\varpi}. Let {f: {\bf R} \rightarrow {\bf R}} be a smooth function that is supported on {[-1,1]}, let {{\mathcal H}} be a fixed admissible {k_0}-tuple for some fixed {k_0 \geq 2}, let {b\ (W)} be such that {b+h} is coprime to {W} for all {h \in {\mathcal H}}, and let {\nu} be the elementary Selberg sieve with weights (11) associated to the function {f}, the sieve level {R := x^{1/4 + \varpi}} and the truncated interval {I := (w,x^\delta)}. As before, we set

\displaystyle  A := (\frac{\phi(W)}{W} \log R)^{k_0}

and seek the best values for {\alpha,\beta} for which we can establish the upper bound (6) and the lower bound (7). Arguing as in the previous section (using (13) to control error terms) we can reduce (6) to

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{1}{\Phi(d_0)} f'( \frac{\log d_0}{\log R} )^2 \leq (\alpha+o(1)) (\frac{\phi(W)}{W} \log R)^{k_0}.

If we crudely replace the truncated interval {(w,x^\delta)} by the untruncated interval {(w,\infty)} and apply Proposition 8 (or (13)) we may reuse the previous value

\displaystyle  \alpha = \int_0^1 f'(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt

for {\alpha} here, but it is possible that we could do better than this.

Now we turn to (7). Arguing as in the previous section, we reduce to showing that

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} (\sum_{m \in {\mathcal S}_I: (m,d_0)=1} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} )^2

\displaystyle  \geq (\beta-o(1)) (\frac{\phi(W)}{W} \log R)^{k_0+1}.

We can, if desired, discard the {(m,d_0)=1} constraint here by arguing as in the previous section, leaving us with

\displaystyle  \sum_{d_0 \in {\mathcal S}_I} \frac{h(d_0)}{d_0} (\sum_{m \in {\mathcal S}_I} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} )^2 \ \ \ \ \ (26)

\displaystyle  \geq (\beta-o(1)) (\frac{\phi(W)}{W} \log R)^{k_0+1}.

Because we now seek a lower bound, we cannot simply pass to the untruncated interval {(w,\infty)} (e.g. using (13)), and must proceed more carefully. A simple way to proceed (as was done by Motohashi and Pintz) is to just discard all {d_0} less than {x^{-\delta} R}, only retaining those {d_0} in the region between {x^{-\delta} R} and {R}. The reason for doing this is that the {m} parameter is then forced to be at most {x^\delta} if one wants the summand to be non-zero, and so for the {m} summation at least one can replace {I} by {(w,+\infty)} without incurring any error. As in the previous section we then have

\displaystyle  \sum_{m \in {\mathcal S}_I} \frac{f'(\frac{\log d_0 m}{\log R})}{\phi(m)} = - (\frac{\phi(W)}{W} \log R) (f(\frac{\log d_0}{\log R})+ o(1))

and so one can lower bound (26), up to negligible errors, by

\displaystyle (\frac{\phi(W)}{W} \log R)^2 \sum_{d_0 \in {\mathcal S}_I: x^{-\delta} R \leq d_0 \leq R} \frac{h(d_0)}{d_0} f(\frac{\log d_0}{\log R})^2.

If the truncated interval {I} were replaced by the untruncated interval {(w,\infty)}, then Proposition 8 would estimate this expression as

\displaystyle (\frac{\phi(W)}{W} \log R)^{k_0+1} \int_{1-\frac{\delta}{1/4+\varpi}}^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt.

To deal with the truncated interval {I}, we use a variant of the Buchstab identity, namely the easy inequality

\displaystyle  \sum_{d \in {\mathcal S}_I: d \leq R} F(d) \geq \sum_{d \in {\mathcal S}_{(w,+\infty)}: d \leq R} F(d) - \sum_{x^\delta \leq p \leq R} \sum_{d \in {\mathcal S}_{(w,+\infty)}: d \leq R/p} F(pd)

for any non-negative function {F}. Using this inequality and Proposition 8, we find that we may lower bound (26), up to negligible errors, by

\displaystyle  (\frac{\phi(W)}{W} \log R)^{k_0+1} \int_{1-\frac{\delta}{1/4+\varpi}}^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

minus the sum

\displaystyle  (k_0-1) (\frac{\phi(W)}{W} \log R)^{k_0+1} \sum_{x^\delta \leq p \leq R} \frac{1}{p} \int_0^{1-\log p/\log R} f(t+\frac{\log p}{\log R})^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt.

(The {(k_0-1)} term comes from {h(p)}.) If {f} is non-negative and non-increasing on {[0,1]}, then we can upper bound

\displaystyle  f(t+\frac{\log p}{\log R}) \leq f( t / (1 - \frac{\log p}{\log R}) )

for {0 \leq t \leq 1-\log p/\log R}, and so

\displaystyle  \int_0^{1-\log p/\log R} f(t+\frac{\log p}{\log R})^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

\displaystyle  \leq (1-\frac{\log p}{\log R})^{k_0-1} \int_0^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt.
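To see this integral bound in action, here is a small numerical sketch in Python. It assumes the test function {f(t) = (1-t)^l} on {[0,1]} (one admissible choice of a non-negative, non-increasing {f}), writes {a} for {\log p/\log R}, and uses a hand-rolled Simpson rule; all names and parameter values are illustrative.

```python
from math import factorial

def f(t, l=3):
    """Test function (1 - t)^l on [0, 1], extended by zero elsewhere."""
    return (1 - t) ** l if 0 <= t <= 1 else 0.0

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def both_sides(a, k0, l=3):
    """Return (shifted integral, (1 - a)^(k0 - 1) times the unshifted integral)."""
    weight = lambda t: t ** (k0 - 2) / factorial(k0 - 2)
    lhs = simpson(lambda t: f(t + a, l) ** 2 * weight(t), 0.0, 1.0 - a)
    rhs = (1 - a) ** (k0 - 1) * simpson(lambda t: f(t, l) ** 2 * weight(t), 0.0, 1.0)
    return lhs, rhs

for a in (0.2, 0.5, 0.8):
    lhs, rhs = both_sides(a, k0=5)
    print(a, lhs, rhs, lhs <= rhs)
```

For this particular {f} the substitution {t = (1-a)s} shows that the left-hand side is exactly {(1-a)^{2l}} times the right-hand side, so the bound loses a factor that could in principle be recovered for specific choices of {f}.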

On the other hand, from the prime number theorem we have

\displaystyle  \sum_{x^\delta \leq p \leq R} \frac{1}{p} (1-\frac{\log p}{\log R})^{k_0-1} = \int_{\delta/(1/4+\varpi)}^1 (1-t)^{k_0-1}\ \frac{dt}{t} + o(1).
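The Buchstab-type inequality used above can also be tested by brute force on small ranges. In the Python sketch below, `y` plays the role of {x^\delta}, the sets {{\mathcal S}_I} and {{\mathcal S}_{(w,+\infty)}} are enumerated directly, and {F(d) = 1/d} is an arbitrary non-negative test function; the specific values {w=2}, {y=11}, {R=500} are assumptions made only to keep the example fast.

```python
from fractions import Fraction

def prime_factorization(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def in_S(n, lo, hi):
    """True if n is squarefree with every prime factor p in lo < p < hi."""
    factors = prime_factorization(n)
    return len(set(factors)) == len(factors) and all(lo < p < hi for p in factors)

def buchstab_sides(F, w, y, R):
    """Return (sum over S_I, sum over S_(w,inf) minus the large-prime correction)."""
    inf = float("inf")
    truncated = sum(F(d) for d in range(1, R + 1) if in_S(d, w, y))
    untruncated = sum(F(d) for d in range(1, R + 1) if in_S(d, w, inf))
    large_primes = [p for p in range(2, R + 1)
                    if p >= y and prime_factorization(p) == [p]]
    correction = sum(F(p * d)
                     for p in large_primes
                     for d in range(1, R // p + 1)
                     if in_S(d, w, inf))
    return truncated, untruncated - correction

lhs, rhs = buchstab_sides(lambda d: Fraction(1, d), w=2, y=11, R=500)
print(float(lhs), float(rhs), lhs >= rhs)
```

The inequality holds because every {d \in {\mathcal S}_{(w,+\infty)}} outside {{\mathcal S}_I} has at least one prime factor {p \geq x^\delta} and is therefore subtracted at least once; integers with several large prime factors are subtracted more than once, which is where the loss comes from.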

Putting all this together, we can thus obtain (7) with

\displaystyle  \beta := (1-\kappa) \int_0^1 f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt

where

\displaystyle  \kappa := \frac{\int_0^{1-\frac{\delta}{1/4+\varpi}} f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt}{\int_0^{1} f(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt} + \kappa' \ \ \ \ \ (27)

and

\displaystyle  \kappa' := (k_0-1) \int_{\delta/(1/4+\varpi)}^1 (1-t)^{k_0-1}\ \frac{dt}{t}.

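Following Pintz, we may upper bound {(1-t)^{k_0-1}} by {\exp(-(k_0-1) t)} and rescale to obtain

\displaystyle  \kappa' \leq (k_0-1) \int_{(k_0-1)\delta/(1/4+\varpi)}^\infty \exp(-t) \frac{dt}{t}

which, using the elementary bound {\int_a^\infty \exp(-t) \frac{dt}{t} \leq \exp(-a)/a} for {a > 0}, we can crudely bound by

\displaystyle  \kappa' \leq \frac{1/4+\varpi}{\delta} \exp( - (k_0-1)\delta/(1/4+\varpi)).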

But of course we can also calculate {\kappa} and {\kappa'} explicitly for any fixed choice of {\delta,\varpi,k_0}. We conclude the following variant of Theorem 6:

Theorem 10 (MPZ implies DHL) Let {0 < \varpi < 1/4}, {0 < \delta < 1/4+\varpi}, and let {k_0 \geq 2} be an integer obeying the constraint

\displaystyle  (1+4\varpi)(1-\kappa) > \frac{4}{k_0(k_0-1)} \frac{\int_0^1 f'(t)^2 t^{k_0-1}\ dt}{\int_{1-\frac{\delta}{1/4+\varpi}}^1 f(t)^2 t^{k_0-2}\ dt},

with {\kappa} given by (27), and some smooth {f: {\bf R} \rightarrow {\bf R}} supported on {[-1,1]} which is non-negative and non-increasing on {[0,1]}. Then {MPZ[\varpi,\delta]} implies {DHL[k_0]}.

For {k_0} large enough depending on {\varpi,\delta} the hypotheses in Theorem 10 can be verified (e.g. by setting {f(t) = (1-t)^l} for a reasonably large {l}) but the dependence is poor due to the localisation of the integral in the denominator to the narrow interval {[1-\delta/(1/4+\varpi),1]}. But perhaps there is a way to not have such a strict localisation in these arguments.
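To get a feel for the sizes involved, the quantities in (27) can be evaluated numerically for a concrete choice of parameters. The Python sketch below takes {f(t) = (1-t)^l} and the purely illustrative values {k_0 = 100}, {\varpi = 0.01}, {\delta = 0.026}, {l = 5}; none of these are claimed to be optimal or even to satisfy the hypotheses of Theorem 10. It also prints {(k_0-1)\exp(-a)/a} with {a := (k_0-1)\delta/(1/4+\varpi)}, which bounds {\kappa'} by virtue of {(1-t)^{k_0-1} \leq \exp(-(k_0-1)t)} and {\int_a^\infty \exp(-t) \frac{dt}{t} \leq \exp(-a)/a}.

```python
from math import exp

def simpson(g, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def kappa_terms(k0, varpi, delta, l):
    """Return (first term of kappa in (27), kappa') for f(t) = (1 - t)^l."""
    r = delta / (0.25 + varpi)  # the ratio delta / (1/4 + varpi)
    # f(t)^2 t^(k0-2); the factor 1/(k0-2)! cancels in the ratio
    weight = lambda t: (1 - t) ** (2 * l) * t ** (k0 - 2)
    localized = simpson(weight, 0.0, 1.0 - r) / simpson(weight, 0.0, 1.0)
    kappa_prime = (k0 - 1) * simpson(lambda t: (1 - t) ** (k0 - 1) / t, r, 1.0)
    return localized, kappa_prime

k0, varpi, delta, l = 100, 0.01, 0.026, 5  # hypothetical sample values
localized, kappa_prime = kappa_terms(k0, varpi, delta, l)
a = (k0 - 1) * delta / (0.25 + varpi)
elementary_bound = (k0 - 1) * exp(-a) / a
print(localized, kappa_prime, localized + kappa_prime, elementary_bound)
```

In experiments of this kind the first term of {\kappa} typically dominates {\kappa'}, which is another way of seeing that the localisation to {[1-\delta/(1/4+\varpi),1]} is the main source of inefficiency.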


Filed under: expository, math.NT, polymath Tagged: Janos Pintz, polymath8, Selberg sieve, sieve theory, Yitang Zhang, Yoichi Motohashi

Chad Orzel: Quasi Poll: Most Needed Pop-Science Biography?

I’ve got a ton of stuff that needs to get done this week, but I don’t want the blog to be completely devoid of new content, so here’s a quasi-poll question for my wise and worldly readers:

What scientist is most in need of a good popular biography?

By “popular biography,” I mean things like Norton’s Great Discoveries books, several of which I’ve reviewed here, including Krauss on Feynman and Reeves on Rutherford, two books that I keep coming back to for useful tidbits. These aren’t deep works of historical scholarship, and don’t necessarily attempt to be definitive, but focus on being accessible and readable.

There are only a small number of these out there, though, and many important scientists don’t have this kind of bio yet. So, the question to be answered in comments is: who should get one of these sorts of books that doesn’t already have one?

I’ve been reading a lot of history of physics recently for the book-in-progress, specifically about the history of QED, and I think at this point, I’d probably vote for a Wolfgang Pauli biography. This may seem odd, as Pauli was a theorist’s theorist, who was so inept in the laboratory that some experimentalists once attributed a lab failure to the fact that Pauli was changing trains in their city at the time that their apparatus broke.

At the same time, though, the histories I’ve been reading put Pauli at or near the center of physics in the mid 20th Century– he contributed to all the major problems, and more importantly seems to have been a key communications nexus. Everybody working on quantum physics appears to have written to and gotten responses from Pauli. And he was pretty entertaining, in a witheringly sarcastic, quirky sort of way. The photo at the top, taken from Roy Glauber’s autobiography at the Nobel Prize website is a pretty good indication: Pauli was kicking a soccer ball around, and when he saw Glauber about to take a photo of this, he turned and kicked the ball directly into the camera…

So, I bet it’d be fun to read a good popular bio of Pauli. Somebody should get on writing one of those.

Who’s your favorite scientist who ought to get a good popular biography?

Tommaso Dorigo: The Plot Of The Week - B Production Cross Section

LHCb, one of the two "satellite" experiments at the Large Hadron Collider, is a detector focusing on the production of B hadrons in proton-proton collisions. It does so by looking at only one side of the collision point, which is showered by the majority of the debris produced when one very-high-momentum parton inside the proton coming from the other side hits a moderately or low-momentum parton in the other proton coming from the LHCb side of the collision region.
A sketch of the LHCb layout is shown below.


(In the picture you can see the various detector elements seen from a side. The interaction point is on the left.)


Victor Rivelles: Why supersymmetry?

After the video explaining what supersymmetry is, we now have another video explaining why supersymmetry is necessary. It is worth checking out.



Backreaction: Phenomenological Quantum Gravity

Participants of the 2012 conference on Experimental Search for Quantum Gravity.
The search for quantum gravity and a theory of everything captures the public imagination like no other area in theoretical physics. It aims to answer three questions that every two-year old could ask if they would just stop being obsessed with cookies for a minute: What is space? What is time? And what is matter? We know that the answers we presently have to these questions are not fundamentally correct; they are merely approximately correct. And we want to know. We really really want to know. (The cookies. Are out.)

Strictly speaking of course physics will not tell you what reality is but what reality is best described by. Space and time are presently described by Einstein’s theory of general relativity; they are classical entities that do not have quantum properties. Matter and radiation are quantum fields described by the standard model. Yet we know that this cannot be the end of the story because the quantum fields carry energy and thus gravitate. The gravitational field thus must be compatible with the quantum aspects of matter sources. Something has to give, and it is generally expected that a quantization of gravity is necessary. I generally refer to ‘quantum gravity’ as any approach to solve this tension. In a slight abuse of language, this also includes approaches in which the gravitational field remains classical and the coupling to matter is modified.

Quantizing gravity is actually not so difficult. The problem is that the straight-forward, naive, quantization does not give a theory that makes sense as a fundamental theory. The result is said to be non-renormalizable, meaning it is a good theory only in some energy ranges and cannot be taken to describe the very essence of space, time, and matter. There are meanwhile several other, not-so-naïve, approaches to quantum gravity – string theory, loop quantum gravity, asymptotically safe gravity, causal dynamical triangulation, and a handful of others. The problem is that so far none of these approaches has experimental evidence.

This really isn’t so surprising. To begin with, it’s a technically hard problem that has kept some of the brightest minds on the planet occupied for decades. But besides this, returns on investment have diminished with the advance of scientific knowledge. The low-hanging fruit has all been picked. Now we have to develop increasingly complex experiments to find new physics. This takes time, not to mention effort and money. With that, progress slows.

And quantum gravity is a particularly difficult area for experiment. It’s not just a weak force, it’s weaker than the weak force! This grammatical oxymoron is symptomatic of the problem: Quantum effects of gravity are really, really tiny. Most of the time when I estimate an effect, it turns out to be twenty or more orders of magnitude below experimental precision. I’ve sometimes joked I should write a paper on “50 ways one cannot test quantum gravity”, just to make use of these estimates. It’s clearly not a low hanging fruit, and we shouldn’t be surprised it takes time to climb the tree.

Some people have claimed on occasion that the lack of a breakthrough in the area is due to sociological problems in the organization of knowledge discovery. There are indeed problems in the organization of knowledge discovery today. We use existing resources inefficiently, and I do think this hinders progress. But this is a problem which affects all of academia and is not special to quantum gravity.

I think the main reason why we don’t yet know which theory describes gravity in the quantum regime is that we haven’t paid enough attention to the phenomenology.

One reason phenomenological quantum gravity hasn’t gotten much attention so far is that it has long been believed that evidence for quantum gravity is inaccessible to experiment (a belief promoted prominently by Freeman Dyson). The more relevant reason, though, is that within theoretical physics it’s a very peculiar research topic. In all other areas of physics, researchers either share a common body of experimental evidence and aim to develop a good theory, or they share a theoretical framework and aim to explore its consequences. Phenomenological quantum gravity has neither a shared theory nor a shared set of data. So what can the scientist do in this situation?

Methodology

The phenomenology of quantum gravity proceeds by the development of models that are specifically designed to test for properties of the yet-to-be-found theory of quantum gravity. These phenomenological models are normally extensions of known theories and are developed with the explicit aim of testing for general features. These models do not aim to be fundamental theories on their own.

Examples of such general properties that the fundamental theory might have are: violations or deformations of Lorentz-invariance, additional space-like dimensions, the existence of a minimal length scale or a generalized uncertainty principle, holography, space-time fluctuations, fundamental discreteness, and so on. I discuss a few examples below. If we develop a model that can be constrained by data, we will learn what properties the fundamental theory can have, and which it cannot have. This in turn can serve as guidance for the development of the theory.

In practice, these phenomenological models quantify deviations from general relativity and/or quantum field theory. One expects that the only additional dimensionful scale in these models is the Planck scale, which gives a ‘natural’ range for the expected size of effects in which all dimensionless constants are of order one. The aim is then to find an experiment that is sensitive to this natural parameter range. Since most of these models do not actually deal with quanta of the gravitational field, I prefer to speak more generally of “Planck scale effects” being what we are looking for.

Example: Lorentz-invariance violation

The best known example demonstrating that effects are measurable even when they are suppressed by the Planck scale is the violation of Lorentz-invariance. You expect violations of Lorentz-invariance in models for space-time that make use of a preferred frame that violates observer-independence, for example some regular lattice or condensate that evolves with some special time-slicing.

Such violations of Lorentz-invariance can be described by extensions of the standard model that couple to a time-like vector field, and these couplings change the predictions of the standard model. Even though the effects are tiny, many of them are measurable.

The best example is maybe vacuum Cherenkov-radiation: the spontaneous emission of a photon by an electron. This process is normally entirely forbidden which makes it a very sensitive probe. With Lorentz-invariance violation, an electron above a certain energy will start to lose energy by radiating photons. We thus should not receive electrons above this threshold from distant astrophysical sources. From the highest energies of electrons of astrophysical origin that we have measured we can thus derive a bound on the possible violation of Lorentz invariance. This bound is today already (way) beyond the Planck scale, which means that the natural parameter range is excluded.

This shows that we can constrain Planck scale effects even though they are tiny.

Now this is a negative result in the sense that we have ruled out certain properties. But from this we have learned a lot. Approaches which induce such violations of Lorentz-invariance are no longer viable.

Example: Lorentz-invariance deformation

Deformations of Lorentz-invariance have been suggested as symmetries of the ground state of space-time. In contrast to violations of Lorentz-invariance, they do not single out a preferred frame. They generically lead to modifications of the speed of light, which can become energy-dependent.

I have explained a great many times that I think these models are flawed because they bring more problems than they solve. But leaving aside my criticism of the model, it can be experimentally tested. The energy dependence of the speed of light is tiny – a Planck scale effect – but the measurable time-difference adds up over the distance that photons of different energies travel. This is why highly energetic photons from distant gamma ray bursts are presently receiving a lot of attention as possible probes of quantum gravitational effects.

The current status is that we are just about to reach the natural parameter range expected for a Planck scale effect. It is presently a very active research area.

Example: Decoherence induced by space-time foam

If space-time undergoes quantum fluctuations that couple to all matter fields, this may induce decoherence in quantum mechanical oscillations. We discussed this previously in this post. In oscillations of neutral Kaon systems, we are presently just about to reach Planck scale sensitivity.

Misc other examples

There is no lack of creativity in the community! Some other examples of varying plausibility that we have discussed on this blog are Craig Hogan’s quest for holographic noise, Bekenstein’s table-top experiment that searches for Planck-length discreteness, massive quantum oscillators testing Planck-scale modified commutation relations, and searches for evidence for a generalized uncertainty in tritium decay. There is also a vast body of work on leftover quantum gravitational effects from the early universe, captured in various models for string cosmology and loop quantum cosmology, and of course there are cosmic (super) strings. There are further proposed tests for the idea that gravity is just classical (still a little outside the natural parameter range), and suggestions to look for dimensional reduction.

This is not an exhaustive list but just to give you a sense of the breadth of the topics.

Demarcation issues

What counts and what doesn’t count as phenomenological quantum gravity is inevitably somewhat subjective. I do for example not count the beyond the standard model physics of grand unification, though, if you believe in a theory of everything, this might be relevant for quantum gravity. I also don’t count applications of AdS/CFT because these do not describe gravitational systems in our universe, though arguably they are examples for some quantized version of gravity. I also don’t count general modifications of quantum theory or general relativity, though these might of course be very relevant to the problem. I don’t label these phenomenological quantum gravity mostly for practical reasons, not for ideological ones. One has to draw the line somewhere.

Endnote

I often get asked which approach to quantum gravity I believe in. When it comes to my religious affiliation, I’m not only an atheist, I was never Christianized. I have never belonged to any church and I have no intention to join one. The same can be said about my research in quantum gravity. I don’t belong to any church and have never been Christianized. I have on occasion erroneously been called a string theorist and I have been mistaken for working on loop quantum gravity. Depending on the situation, that can be amusing (at a conference) or annoying (in a job interview). For many people it still seems to be hard to understand that the phenomenology of quantum gravity is a separate research area that does not build on the framework of any particular approach.

The aim of my work is to identify the most promising experiments to find evidence for quantum gravity. For that, we need phenomenological models to quantify the effects, and we need to understand the models that we have (for me that includes criticizing them). I follow with interest the progress in various approaches to quantum gravity (presently I’m quite excited about Causal Sets) and I try to develop testable phenomenological models based on these developments. On the practical side, I organize conferences and workshops to bring together theoreticians with experimentalists who have an interest in the topic to stimulate exchange and the generation of new ideas.

What I do believe in, and what I hope the above examples illustrate, is that it is possible for us to find experimental evidence for quantum gravity if we ask the right questions and look in the right places.

Victor Rivelles: Tetraquarks?

In nature, quarks always appear in combinations of three quarks forming baryons (such as the proton and the neutron) or in a quark-antiquark pair forming mesons (such as the pion). In 2005 an exotic particle called Y(4260) was discovered, which appears to be composed of two quarks and a gluon (the particle responsible for the strong force). To try to understand the structure of this particle, two groups, BELLE (in Japan) and BESIII (in China), studied the decay of the Y(4260) and discovered a new particle, named Zc(3900), which appears to be even more exotic than the Y(4260). The new particle has electric charge, contains a charm quark and a charm antiquark, and appears to contain two more quarks, an up quark and a down antiquark: a particle with four quarks! Years ago other experimental groups found evidence of tetraquarks, but this is the first experimental result with more solid evidence. However, other interpretations are not yet excluded. The Zc(3900) could be a bound state of two mesons forming a hadronic molecule, a theoretical proposal that has never been detected experimentally. Another, less exciting interpretation is that it could be just a state of two mesons that do not form a hadronic molecule. More data are needed to understand this particle. Let us wait and see. Both papers were published in this week’s Physical Review Letters.

Edinburgh Mathematical Physics Group: On the subject of exams… and football

This is my second year as convener of the Honours Board of Examiners at the School of Mathematics of the University of Edinburgh. (In earlier more politically incorrect days, I would have been called the “chairman”, and I have the old minutes to prove it!) According to the university regulations, I am ultimately “responsible for [...]

June 17, 2013

n-Category Café: Quasicrystals and the Riemann Hypothesis

Freeman Dyson is a famous physicist who has also dabbled in number theory quite productively. If some random dude said the Riemann Hypothesis was connected to quasicrystals, I’d probably dismiss him as a crank. But when Dyson says this, it’s a lot more interesting. So I’ve been trying to understand his remarks on this. And it’s been productive, in that I’ve learned some interesting things, and I now feel closer to seeing why the Riemann Hypothesis is a natural and important conjecture.

But still, I could use a lot of help: I don’t have much time for number theory, and a few pointers from experts could keep me from going down dead ends.

Those of you who were using the internet around 1990 may remember newsgroups like sci.math, sci.physics, sci.math.research and sci.physics.research. That’s how internet-savvy mathematicians and physicists communicated back then. And if you were around then, you may remember Matt McIrvin, who wrote consistently intelligent and good-natured posts about physics.

I lost track of him for a while, but met him again on Google+, where he has been using the open-source math software called Sage to play around with ideas from number theory.

Sage comes equipped with a table of nontrivial zeros of the Riemann zeta function computed by Andrew Odlyzko. Remember, this function is given by

ζ(s) = ∑_{n=1}^∞ n^{−s}

when Re(s)>1, but it can be analytically continued over the whole complex plane except for a pole at 1, and then it turns out that ζ(z)=0 for a lot of points on a line:

z = 1/2 + ik

where k is a real number. Here are the first few:

k_1 ≈ 14.1347

k_2 ≈ 21.0220

k_3 ≈ 25.0109

The Riemann Hypothesis claims that all the zeros of the zeta function lie on this line, except for the so-called trivial zeros at −2, −4, −6, −8 and so on. Riemann only checked this for the first three cases… but by now people have checked it for the first 10,000,000,000,000 cases. So it seems to be true, but nobody can prove it.

The nontrivial zeros of the Riemann zeta function are really interesting. There’s no simple formula for them, but they encode information about prime numbers. Riemann was the first to notice this… but Matt ran into it on his own.

He took the first ten thousand positive numbers k_j that make

ζ(1/2 + i k_j) = 0,

and he added up a bunch of functions like this:

exp(i k_j x)

one for each j = 1, …, 10,000.

The result is a function of x, and he graphed its absolute value. The graph has lots of sharp spikes!
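Here’s a rough sketch of that computation in plain Python, in case you want to play with it yourself. To keep it self-contained I’ve hard-coded just the first ten zeros (rounded to six decimal places) instead of the ten thousand Matt used, so don’t expect sharp spikes from this version: that list, and the function name, are my own stand-ins rather than Matt’s actual Sage code.

```python
import cmath
from math import log

# Imaginary parts of the first ten nontrivial zeros, rounded to 6 decimals.
# Matt used ten thousand of them; this short list is only a stand-in.
ZEROS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
         37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def spike_function(x, zeros=ZEROS):
    """Absolute value of the sum of exp(i k_j x) over the given zeros."""
    return abs(sum(cmath.exp(1j * k * x) for k in zeros))

# Sample the function at a few points; x = 0 always gives len(zeros).
for x in (0.0, log(2), log(3), 1.0):
    print(round(x, 4), round(spike_function(x), 4))
```

To see anything like Matt’s graph you would need to swap in a much longer table of zeros, such as the one that comes with Sage, and plot the function on a fine grid of x values.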

And where are these spikes? He zoomed in on the first few:

He wrote:

A closeup of those first spikes. I wonder what those numbers are, exactly? Probably they’re in the literature somewhere.

I looked around and soon found that those spikes should be here:

ln(2),ln(3),ln(4),ln(5),ln(7),ln(8),ln(9),ln(11),ln(13),ln(16),...

See the pattern?
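If you want to try a small version of Matt's experiment without Sage, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: it hardcodes the first 30 values of k_j, rounded to six decimals, instead of the ten thousand Matt used, so any spikes will be far blunter than in his graphs. The names ZETA_ZEROS and zero_sum are mine, not anything from Sage.

```python
import cmath
import math

# Imaginary parts k_j of the first 30 nontrivial zeros 1/2 + i k_j,
# rounded to six decimal places. Assumption of this sketch: 30 zeros
# stand in for the 10,000 from Odlyzko's table used in the text.
ZETA_ZEROS = [
    14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
    37.586178, 40.918719, 43.327073, 48.005151, 49.773832,
    52.970321, 56.446248, 59.347044, 60.831779, 65.112544,
    67.079811, 69.546402, 72.067158, 75.704691, 77.144840,
    79.337375, 82.910381, 84.735493, 87.425275, 88.809111,
    92.491899, 94.651344, 95.870634, 98.831194, 101.317851,
]


def zero_sum(x, zeros=ZETA_ZEROS):
    """Return the complex sum of exp(i * k_j * x) over the given k_j."""
    return sum(cmath.exp(1j * k * x) for k in zeros)


if __name__ == "__main__":
    # Print the size of the sum at two of the listed spike locations
    # and at two arbitrary points in between, for comparison.
    points = [
        ("ln(2)", math.log(2)),
        ("0.9", 0.9),
        ("ln(3)", math.log(3)),
        ("1.25", 1.25),
    ]
    for label, x in points:
        print(f"x = {label:6s} |sum| = {abs(zero_sum(x)):.3f}")
```

At x = 0 every term equals 1, so the sum is just the number of zeros used. That gives a natural scale for judging how tall any spike is.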

Dyson’s remarks

In math jargon, what Matt did is take the Fourier transform of a sum of Dirac deltas supported at the imaginary parts of the nontrivial Riemann zeta zeros. The answer seemed to be another sum of Dirac deltas, times different numbers: the different spikes in the pictures above seem to have different heights. It’s unusual to take the Fourier transform of such a spiky function and get another spiky function. And according to Freeman Dyson, this is the defining feature of a quasicrystal!

When I was looking around for clues, one of the first things I ran into was a lecture by Dyson. He never actually delivered this lecture—it was cancelled at the last minute for some reason—but it was printed here:

  • Freeman Dyson, Frogs and birds, Notices of the American Mathematical Society 56 (2009), 212–223.

It was about two styles of doing mathematics, hence the curious title. He said:

The proof of the Riemann Hypothesis is a worthy goal, and it is not for us to ask whether we can reach it. I will give you some hints describing how it might be achieved. Here I will be giving voice to the mathematician that I was fifty years ago before I became a physicist. I will talk first about the Riemann Hypothesis and then about quasicrystals.

There were until recently two supreme unsolved problems in the world of pure mathematics, the proof of Fermat’s Last Theorem and the proof of the Riemann Hypothesis. Twelve years ago, my Princeton colleague Andrew Wiles polished off Fermat’s Last Theorem, and only the Riemann Hypothesis remains. Wiles’ proof of the Fermat Theorem was not just a technical stunt. It required the discovery and exploration of a new field of mathematical ideas, far wider and more consequential than the Fermat Theorem itself. It is likely that any proof of the Riemann Hypothesis will likewise lead to a deeper understanding of many diverse areas of mathematics and perhaps of physics too. Riemann’s zeta-function, and other zeta-functions similar to it, appear ubiquitously in number theory, in the theory of dynamical systems, in geometry, in function theory, and in physics. The zeta-function stands at a junction where paths lead in many directions. A proof of the hypothesis will illuminate all the connections. Like every serious student of pure mathematics, when I was young I had dreams of proving the Riemann Hypothesis. I had some vague ideas that I thought might lead to a proof. In recent years, after the discovery of quasicrystals, my ideas became a little less vague. I offer them here for the consideration of any young mathematician who has ambitions to win a Fields Medal.

Quasicrystals can exist in spaces of one, two, or three dimensions. From the point of view of physics, the three-dimensional quasicrystals are the most interesting, since they inhabit our three-dimensional world and can be studied experimentally. From the point of view of a mathematician, one-dimensional quasicrystals are much more interesting than two-dimensional or three-dimensional quasicrystals because they exist in far greater variety. The mathematical definition of a quasicrystal is as follows. A quasicrystal is a distribution of discrete point masses whose Fourier transform is a distribution of discrete point frequencies. Or to say it more briefly, a quasicrystal is a pure point distribution that has a pure point spectrum. This definition includes as a special case the ordinary crystals, which are periodic distributions with periodic spectra.

Excluding the ordinary crystals, quasicrystals in three dimensions come in very limited variety, all of them associated with the icosahedral group. The two-dimensional quasicrystals are more numerous, roughly one distinct type associated with each regular polygon in a plane. The two-dimensional quasicrystal with pentagonal symmetry is the famous Penrose tiling of the plane.

Finally, the one-dimensional quasicrystals have a far richer structure since they are not tied to any rotational symmetries. So far as I know, no complete enumeration of one-dimensional quasicrystals exists. It is known that a unique quasicrystal exists corresponding to every Pisot–Vijayaraghavan number or PV number. A PV number is a real algebraic integer, a root of a polynomial equation with integer coefficients, such that all the other roots have absolute value less than one [1]. The set of all PV numbers is infinite and has a remarkable topological structure. The set of all one-dimensional quasicrystals has a structure at least as rich as the set of all PV numbers and probably much richer. We do not know for sure, but it is likely that a huge universe of one-dimensional quasicrystals not associated with PV numbers is waiting to be discovered.

Here comes the connection of the one-dimensional quasicrystals with the Riemann Hypothesis. If the Riemann Hypothesis is true, then the zeros of the zeta-function form a one-dimensional quasicrystal according to the definition. They constitute a distribution of point masses on a straight line, and their Fourier transform is likewise a distribution of point masses, one at each of the logarithms of ordinary prime numbers and prime-power numbers. My friend Andrew Odlyzko has published a beautiful computer calculation of the Fourier transform of the zeta-function zeros [2]. The calculation shows precisely the expected structure of the Fourier transform, with a sharp discontinuity at every logarithm of a prime or prime-power number and nowhere else.

My suggestion is the following. Let us pretend that we do not know that the Riemann Hypothesis is true. Let us tackle the problem from the other end. Let us try to obtain a complete enumeration and classification of one-dimensional quasicrystals. That is to say, we enumerate and classify all point distributions that have a discrete point spectrum […] We shall then find the well-known quasicrystals associated with PV numbers, and also a whole universe of other quasicrystals, known and unknown. Among the multitude of other quasicrystals we search for one corresponding to the Riemann zeta-function and one corresponding to each of the other zeta-functions that resemble the Riemann zeta-function. Suppose that we find one of the quasicrystals in our enumeration with properties that identify it with the zeros of the Riemann zeta-function. Then we have proved the Riemann Hypothesis and we can wait for the telephone call announcing the award of the Fields Medal.

These are of course idle dreams. The problem of classifying one-dimensional quasicrystals is horrendously difficult, probably at least as difficult as the problems that Andrew Wiles took seven years to explore. But if we take a Baconian point of view, the history of mathematics is a history of horrendously difficult problems being solved by young people too ignorant to know that they were impossible. The classification of quasicrystals is a worthy goal, and might even turn out to be achievable.

[1] M. J. Bertin et al., Pisot and Salem Numbers, Birkhäuser, Boston, 1992.

[2] A. M. Odlyzko, Primes, quantum chaos and computers, in Number Theory: Proceedings of a Symposium, 4 May 1989, Washington, DC, USA (National Research Council, 1990), pp. 35–46.

Questions

I wanted to understand this better, so I asked around on Mathoverflow. I got a lot of help. For starters, I got pointed to some critical remarks by Nick S., who said:

Well his definition of quasicrystals is not the one used by the quasicrystal community (we actually don’t have a formal mathematical definition yet)…. His statement about icosahedral group is false, actually most 3-dimensional models don’t have any symmetry group. Same issue for the 2 dimensional quasicrystals, most of them are not related to polygons in the plane. […] I really have no idea what he mean by “it is well known that a unique quasicrystal exists corresponding to every PV number”. The existence is true, the uniqueness is far for true… Unless I’m making a terrible mistake, there are constructions which produce pure point diffractive sets from PV numbers, and they produce uncountably many… In many situations, but not always, one can probably get that most of them are “equivalent” in some sense, but not all of them… And the big issue is that any equivalence in this sense, unless one adds very strong extra conditions, allows for .. small translations of the points… And there are of course uncountably many models which are not associated to PV numbers. Another issue is that the zeroes of the RZF are not a Delone set, so anything done so far by the quasicrystal community is not relevant to the problem… And last, I really don’t see how one can go around the following issue: Let Λ be the set of zeroes. Let Λ′ be the set obtained by moving all the zeroes, such that the nth zero is moved by at most 1/n. Then diffraction cannot differentiate between Λ and Λ′.

I don’t know if these remarks are true, so if any of you know, please tell me… preferably with references!

I don’t care too much if Dyson is using ‘quasicrystal’ in a nonstandard sense. He at least seems to be hinting at a fairly precise definition, perhaps “a countable sum of Dirac deltas on ℝ^n that defines a tempered distribution whose Fourier transform is a countable linear combination of Dirac deltas”. The phrase ‘defines a tempered distribution’ just means the Dirac deltas don’t bunch up too fast, so the Fourier transform is well-defined. Allowing the original sum to also be a more general linear combination of Dirac deltas might be nice, too: then the Fourier transform of a quasicrystal would be another quasicrystal!

Anyway, what I’d like to learn is what’s known about such entities. In what sense, if any, does any Pisot–Vijayaraghavan number give a unique quasicrystal in 1 dimension? Do de Bruijn’s quasiperiodic tilings, like this one drawn by Greg Egan’s software, give quasicrystals in Dyson’s sense?

Is it really true that all quasicrystals in 3 dimensions are related to the icosahedral group? What’s the theorem there? And what about the 4-dimensional pattern built from the E8 lattice—is that a quasicrystal in Dyson’s sense? Is it hard to find higher-dimensional quasicrystals, or easy?

The Guinand–Weil explicit formula

But now I’d like to come to my actual point, which concerns this remark:

If the Riemann Hypothesis is true, then the zeros of the zeta-function form a one-dimensional quasicrystal according to the definition. They constitute a distribution of point masses on a straight line, and their Fourier transform is likewise a distribution of point masses, one at each of the logarithms of ordinary prime numbers and prime-power numbers.

Is this actually true? In other words, does the Riemann Hypothesis actually imply this? And if so, why?

At first I thought it was true. That would make a very nice bumper sticker explaining the ‘meaning’ of the Riemann Hypothesis… something like

RIEMANN ZETA ZEROS ARE FOURIER DUAL TO LOGS OF PRIME POWERS!

This is the first version of the Riemann Hypothesis I’ve seen that makes me really want it to be true. You can see it discussed in this excellent book draft here, with lots of pretty pictures:

  • Barry Mazur and William Stein, Primes.

But it seems the truth is a bit more complicated. The truth is called the explicit formula of Guinand and Weil and it involves terms, not only for the nontrivial zeros of the zeta function, but also for the trivial zeros, and the pole. And in fact it’s best to think of this formula not in terms of the original Riemann zeta function, but the ‘corrected’ version that takes the ‘prime at infinity’ into account using the gamma function, namely:

\Lambda(s) = \pi^{-s/2} \, \Gamma(s/2) \, \zeta(s)

This ‘corrected’ version has the all-important symmetry

\Lambda(s) = \Lambda(1 - s),

which lets you see that the zeros of this function, and thus all the nontrivial zeros of the original Riemann zeta function, lie in the strip 0 ≤ Re(s) ≤ 1.

So, I still have hope for getting a conceptually clear statement of the Riemann Hypothesis that’s exactly correct! However, so far, I can’t seem to say something correct without it looking rather messy. For example, on Mathoverflow Brad Rogers stated a version of the Guinand–Weil explicit formula that looks about like this:

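For a compactly supported smooth function g on ℝ with Fourier transform \hat{g},

\sum_k \hat{g}(k/2\pi) = \int_{-\infty}^{\infty} [g(x) + g(-x)] \, e^{-x/2} \, d\big(e^x - \psi(e^x)\big) + \int_{-\infty}^{\infty} \frac{\Omega(\xi)}{2\pi} \, \hat{g}(\xi/2\pi) \, d\xi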

Here the sum is over k such that 1/2+ik is a non-trivial zero of the Riemann zeta function. ψ is the Chebyshev prime counting function:

\psi(x) = \sum_{p^k \le x} \ln(p)

and

\Omega(\xi) = \frac{1}{2} \frac{\Gamma'}{\Gamma}(1/4 + i\xi/2) + \frac{1}{2} \frac{\Gamma'}{\Gamma}(1/4 - i\xi/2) - \log \pi.
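To make the prime-power part concrete: ψ jumps by ln(p) at each prime power and is constant in between, so the measure dψ(e^x) is a sum of Dirac deltas at the logarithms of prime powers, each weighted by ln(p). Here is a minimal sketch of ψ and its jumps in Python. The function names are my own, and the trial-division primality test is only meant for small x.

```python
import math


def is_prime(n):
    """Trial division; fine for the small n used in this sketch."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def prime_power_jumps(x):
    """Return (p**m, ln p) for every prime power p**m <= x, sorted by p**m."""
    jumps = []
    for p in range(2, int(x) + 1):
        if is_prime(p):
            power = p
            while power <= x:
                jumps.append((power, math.log(p)))
                power *= p
    return sorted(jumps)


def chebyshev_psi(x):
    """Chebyshev's psi: the sum of ln(p) over all prime powers p**m <= x."""
    return sum(weight for _, weight in prime_power_jumps(x))


if __name__ == "__main__":
    # Each line is one Dirac delta in d(psi(e^x)): its position and weight.
    for n, weight in prime_power_jumps(16):
        print(f"jump of {weight:.4f} at x = ln({n}) = {math.log(n):.4f}")
```

One sanity check: exp(ψ(n)) is the least common multiple of 1, …, n, so chebyshev_psi(10) should agree with ln(2520).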

Alas, this formula doesn’t look very ‘conceptual’, though I think I’m beginning to understand it, and Marc Palm gave a nice sketch of a proof.

In this formula, k can be complex if the Riemann Hypothesis is false. But if it’s true, k is always real, and the left-hand side of the big equation

\sum_k \hat{g}(k/2\pi)

is really just the test function g integrated against a sum of complex exponentials, one for each nontrivial zero of the zeta function. (I should warn you that these zeros come in complex conjugate pairs, so for each positive real k we get a corresponding negative k).

The right-hand side of the big equation contains a ‘nice’ term that’s a sum over prime powers, but also some ‘corrections’ that seem to make Dyson’s claim fail to be literally true. For example, besides a linear combination of Dirac deltas at logarithms of prime powers, there’s the correction term proportional to the function Ω. Owen Maresh has plotted this function:

So, I’m thinking that the slow rise at right here:

might not be an artifact of numerical approximations, but an actual real thing: this function Ω.
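It may help to unpack the first integral on the right-hand side of the big equation. What follows is my own reading of it, assuming the integral runs over the whole real line. The measure d(e^x) is just e^x dx, while ψ(e^x) jumps by ln(p) at each point x = m ln(p), so dψ(e^x) is a sum of Dirac deltas at logarithms of prime powers. I write the exponent as m to avoid a clash with the k that labels the zeros.

```latex
\int_{-\infty}^{\infty} [g(x) + g(-x)] \, e^{-x/2} \, d\big(e^x - \psi(e^x)\big)
  = \int_{-\infty}^{\infty} [g(x) + g(-x)] \, e^{x/2} \, dx
  \; - \sum_{p \text{ prime}} \sum_{m \ge 1} \frac{\ln(p)}{p^{m/2}} \,
      \big[ g(m \ln p) + g(-m \ln p) \big]
```

The double sum is the linear combination of Dirac deltas at logs of prime powers, paired with the test function g. The first term on the right is smooth, just like the Ω term.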

So: what’s the really neat way to write an ‘explicit formula’ relating prime powers and Riemann zeta zeros… which simplifies in some way iff the Riemann Hypothesis holds? And is there a way to modify Freeman Dyson’s claim here, that makes it correct while still maintaining a connection to quasicrystals?

If the Riemann Hypothesis is true, then the zeros of the zeta-function form a one-dimensional quasicrystal according to the definition. They constitute a distribution of point masses on a straight line, and their Fourier transform is likewise a distribution of point masses, one at each of the logarithms of ordinary prime numbers and prime-power numbers.

Sean CarrollExplanations: God vs. Nature

Earlier this year I participated in an Oxford workshop entitled Is God Explanatory? (Not really, as it turns out.) The videos for the conference are now available, including a couple by me:

(Traveling like crazy, as I’ve been doing this last year, doesn’t leave much time for real blogging, but at least there is an extensive video trail I can use for content.) Other speakers included Lara Buchak, William Stoeger, Joe Silk, and John Hawthorne. See the playlist for videos of the question-and-answer sessions, which were posted separately.

Those are formal talks with PowerPoint and the whole bit. But I also participated in dueling after-dinner talks at the conference banquet; that was fun, so I’ll post the video here. First we have Keith Ward, talking about God as Explanation, then me, talking about Nature as Explanation.

Note that I haven’t actually watched the video myself. And both speakers, if I recall correctly, had imbibed a glass of wine or two. So who knows what we actually said?

In cases like this, the chance that people will respond to the actual contents of the videos rather than just the titles is vanishingly small, but I remain defiantly optimistic in the face of overwhelming evidence.

Chad OrzelGraduation 2013

The other big event of the weekend was Commencement at Union. I didn’t make it in time for the academic procession and all that, but I did hear John Lewis’s speech, which was great. More importantly, though, I was there to see our students graduate, and congratulate them in person.

As I told my thesis student, I’m not always the best about praise and positive reinforcement– I tend to react to progress in the lab with “That’s great. Now, the next thing to do is..” But this year’s class was a good bunch of students, and it’s been a pleasure to work with them over the last four years.

So, congratulations to (from left to right in the photo above) Mark, Christine, Pavel, Adam, Colin, and Halley. It’s been fun having you around, and best wishes for success in your future careers.

BackreactionNordita’s First Workshop for Science Writers, Summary

Patrick Sutton
George and I came up with the idea for this workshop one year ago at a reception of an earlier Nordita workshop. Yes, alcohol was involved. We talked about how science writers often feel like they’re running on a treadmill, having to keep up with the frenetic pace of publishing, only seldom getting a chance to take a few days off to gain some broader perspective. And we talked about how researchers too are running on a treadmill, having to keep up with the pace of their colleagues’ publications, and often feel that science writers miss the broader perspective.

And so we set ourselves the goal to get everybody off the treadmill for a few days.

Our “workshop for science writers”, which took place May 27-29, was devised for both the writers and the physicists: for the writers to hear which topics in astrophysics and cosmology will soon be on the agenda and what science journalists really need to know about them, and for the physicists to share both their knowledge and their motivation, and to caution against common misunderstandings.

We modeled the workshop on “boot camps” organized by the Space Telescope Science Institute, Woods Hole Oceanographic Institution, U.C. Santa Cruz, and other institutions. Our workshop was a very intense and tightly packed meeting, with lectures by experts on selected topics in astrophysics and cosmology, followed by question and answer sessions.

George, wired.
On Tuesday afternoon, we visited the phonetics lab at Stockholm University, which was a fun excursion into a totally different area of science. At the lab, participants could analyze their voice spectra and airflow during speech, and learn the physics behind speech production. They could also take an EEG, which the researchers at the lab use to study which brain areas are involved in language processing and how that changes during infancy.

On Tuesday evening, one of the participants of the workshop, Robert Nemiroff, gave a public lecture at CosmoNova. The fully booked lecture took the audience on a tour through the solar system and beyond, projected on the 17m IMAX screen, while Robert explained the science behind the amazing photos and videos. Besides the stunning images, it was also great to see so many people interested in the laws of physics that shape our universe. (The guy sitting next to me held a copy of Lee Smolin’s new book on his lap which caused me some cognitive dissonance though.)

It was admittedly quite an organizational challenge to find the right level of technical detail for an audience that physicists rarely deal with. I think however that the question and answer sessions, as well as a large number of breaks, were useful for participants to talk to lecturers individually. We also had many interesting discussions about the tension between scientific accuracy and popular science writing. As you can guess, I inevitably come down on the side of scientific accuracy.

George turned out to be an excellent organizer, though clearly not used to the physicists’ compulsive ignorance of deadlines and reminders. I found it quite interesting that when I sent out mass emails to the participants that asked for a reply, the first cohort of replies would come almost exclusively from the science writers, frequently within minutes. Among the physicists there were but two who'd answer within 24 hours and meet the deadlines; the rest waited for multiple reminders. The other interesting contrast was that the science writers were considerably more comfortable and engaged with social media.

For me, it was a great pleasure to get to know such an interesting and diverse group of people. I’m neither an astrophysicist nor a cosmologist nor a science writer, and I learned a lot at this workshop - it will probably inspire some more blogposts.

You can find soundbites and links from the meeting on twitter here, and slides of the lectures here.

George Musser, Robert Nemiroff, I, and a bunch of beautiful flowers.

BackreactionQuantum gravity phenomenology \neq detecting gravitons

First direct evidence for gravitons.
I’ve never met Freeman Dyson, but I’ve argued with him many times.

Almost every time I give a seminar about my research field, the phenomenology of quantum gravity, I find myself in the bizarre situation of first having to convince the audience that it is a research field. And that even though hundreds of people work on it. I have been organizing and co-organizing a series of conferences on Experimental Search for Quantum Gravity, and in each installment we had to turn away applicants due to space limitations. The arXiv is full of papers on the topic, more than I can keep up with on this blog, and it’s in the popular press more often than I’d like*. Why are my fellow physicists so slow to notice? I hold Freeman Dyson responsible for this.

Dyson has popularized the idea that quantum gravity is inaccessible to experiment and thereby discouraged studies of phenomenological consequences of quantum gravity. In a 2004 review of Brian Greene’s book “The Fabric of the Cosmos” he wrote:
“According to my hypothesis [...] the two theories [general relativity and quantum theory] are mathematically different and cannot be applied simultaneously. But no inconsistency can arise from using both theories, because any differences between their predictions are physically undetectable.”
And in a 2012 essay for the Edge Annual Question, he still pushed the idea of quantum gravitational effects being unobservable:
“I propose as a hypothesis... that single gravitons may be unobservable by any conceivable apparatus. If this hypothesis were true, it would imply that theories of quantum gravity are untestable and scientifically meaningless. The classical universe and the quantum universe could then live together in peaceful coexistence. No incompatibility between the two pictures could ever be demonstrated. Both pictures of the universe could be true, and the search for a unified theory could turn out to be an illusion.”
The problem with this argument is that he equates the observation of a single graviton with evidence for a quantization of gravity. But the two are not the same. If single gravitons were unobservable, it would not imply that “theories of quantum gravity are untestable and scientifically meaningless.”

It might indeed be that we will never be able to detect gravitons. One can estimate the probability of detecting gravitons and even with extremely futuristic detectors the size of Jupiter put in orbit around a neutron star, chances would be slim. (See this paper for estimates.) Clearly not an experiment you want to write a grant proposal for.

But we don’t need to detect single gravitons to find experimental evidence for quantum gravity.

Look around. The fact that atoms are stable is evidence for the quantization of the electromagnetic interaction. You don’t need to detect single photons for that. You also don’t need to resolve atomic structures to find evidence for the atomic theory. Brownian motion famously provided this evidence, visible by eye. And Planck introduced what is now known as “Planck’s constant” before Einstein’s Nobel-prize winning explanation for the photoelectric effect.

If we pay attention to the history of physics, it is thus plausible that we can find evidence for quantum gravity without directly detecting gravitons. The quantum theory of gravity might have consequences that we can access in regimes where gravity is weak, as long as we ask the right questions.

Some people have a linguistic problem with calling something a “quantum gravitational effect” if it isn’t actually an effect that directly involves quanta of the gravitational field. This is why I instead often use the expression “Planck scale effects” to refer to effects beyond the standard model that might be signatures of quantum gravity.

Interestingly, Christine recently pointed me to a writeup of a 2012 talk by Freeman Dyson, in which he discusses the possibility of detecting gravitons without jumping to the conclusion that an inability to detect gravitons means that quantum gravity is a subject for philosophers. Instead, Dyson is very careful in stating:
“One hypothesis is that gravity is a quantum field and gravitons exist. A second hypothesis is that the gravitational field is a statistical concept like entropy or temperature, only defined for gravitational effects of matter in bulk and not for effects of individual elementary particles… If a graviton detector is in principle impossible, then both hypotheses remain open.”
A hooray for Dyson!

Unfortunately, other people are still barking up the same tree, for instance by pulling out the accelerator argument. John Horgan, for example, writes:
“String theory, loop-space theory and other popular candidates for a unified theory postulate phenomena far too minuscule to be detected by any existing or even conceivable (except in a sci-fi way) experiment. Obtaining the kind of evidence of a string or loop that we have for, say, the top quark would require building an accelerator as big as the Milky Way.”
Horgan is well known for proclaiming The End of Science, and it seems indeed he’d run out of science when he wrote the above. To begin with, string theory doesn’t “postulate... phenomena,” what would be the point of doing this? It postulates, drums please, strings. And I’m not at all sure what “loop-space theory” is supposed to be. But leaving aside this demonstration of Horgan’s somewhat fuzzy understanding of the subject, if we could build a detector the size of the Milky Way, we’d be able to test very high energies, all right. But that doesn’t mean we can conclude this is the only way to find evidence for quantum gravity.

Luckily Horgan has colleagues who think before they write, like George Musser who put it this way:
“[Q]uantum gravity” and “experiment” are… like peanut butter and chocolate. They actually go together quite tastily.
(I had meant to write a summary of which possible experiments for quantum gravity pheno are presently being discussed and how plausible I think they are to deliver results, but I got distracted by Dyson’s above mentioned paper on graviton detection. The summary will follow some other time. Update: The summary is here.)

*Almost everything I read in the popular press about evidence for quantum gravity is wrong or misleading or both. But then you already knew I would complain about this :p

Chad OrzelHall of Fame

This past weekend was more complicated than it might’ve been. On Friday night, we drove to Whitney Point to my parents’ house, then on Sunday morning very early we drove back to Niskayuna so I could make it to Union’s graduation on Sunday (I arrived just in time to hear Civil Rights icon John Lewis give the main commencement address, an excellent speech).

The reason for all this driving around was that on Saturday evening, I was inducted into the Whitney Point Central School District Hall of Fame. This is, quite literally, a hall, running from the front lobby of the high school to the cafeteria (more or less):

The Whitney Point Central School District Hall of Fame

It’s also very new– this was the third year– but I think it’s a great idea, and I’m deeply honored to have been included. They’ve got three categories– alumni, school employees, and community members– and include people who have made significant contributions to the community or their chosen profession– in my case, science and science communication. The other honorees this year were Dr. Dan Driscoll, who graduated from WP in the 60′s and came back after medical school to start a practice in town; Marv and Alice Gregg who ran the main grocery store in town; Barb Quarella who taught basically every kid in town my age and younger to swim; and Iva Jean Marsh Tennant, who is an alumna from the 60′s and a distinguished math teacher at another school in the area.

Like I said in my thank-you speech, it’s an honor to be included with people who have been such an integral part of the community for so long. The school has meant a huge amount to me and my family– not only did my sister and I and two cousins graduate from there, but both my parents worked in the district (my father taught sixth grade for 30-odd years, my mother was the elementary school librarian for several years before moving to the local BOCES), my aunt and uncle worked in the district (my uncle taught social studies in the high school, my aunt taught in the elementary school), and I still have cousins in town working with the school.

There’s a running joke in my family about my maternal grandmother claiming to have a plaque in her honor in the long since demolished public school she attended in the Bronx. That adds a little extra amusement to the fact that I really do have a plaque in my honor in the high school I attended (pictured at the top of this post; the citation is very long, so I’m not going to transcribe it, but it will presumably be added to the web page at some point). But mostly, as I said, I’m honored to be included in a community that’s meant so much to me over the years. And if having my picture and bio up there helps another kid from the area to consider a broader range of career possibilities, including even SCIENCE!, then that will be more than worth the additional travel on a busy weekend.

Andrew JaffeThe next generation of large satellites: PRISM and/or eLISA?

Today was the deadline for submitting so-called “White Papers” proposing the next generation of the European Space Agency satellite missions. Because of the long lead times for these sorts of complicated technical achievements, this call is for launches in the faraway years of 2028 or 2034. (These dates would be harder to wrap my head around if I weren’t writing this on the same weekend that I’m attending the 25th reunion of my university graduation, an event about which it’s difficult to avoid the clichéd thought that May, 1988 feels like the day before yesterday.)

At least two of the ideas are particularly close to my scientific heart.

The Polarized Radiation Imaging and Spectroscopy Mission (PRISM) is a cosmic microwave background (CMB) telescope, following on from Planck and the current generation of sub-orbital telescopes like EBEX and PolarBear: whereas Planck has 72 detectors observing the sky over nine frequencies, PRISM would have more than 7000 detectors working in a similar way to Planck over 32 frequencies, along with another set observing 300 narrow frequency bands, and another instrument dedicated to measuring the spectrum of the CMB in even more detail. Combined, these instruments allow a wide variety of cosmological and astrophysical goals, concentrating on more direct observations of early Universe physics than possible with current instruments, in particular the possible background of gravitational waves from inflation, and the small correlations induced by the physics of inflation and other physical processes in the history of the Universe.

The eLISA mission is the latest attempt to build a gravitational radiation observatory in space, observing astrophysical sources rather than the primordial background affecting the CMB, using giant lasers to measure the distance between three separate free-floating satellites a million kilometres apart from one another. As a gravitational wave passes through the triangle, it bends space and effectively changes the distance between them. The trio would thereby be sensitive to the gravitational waves produced by small, dense objects orbiting one another, objects like white dwarfs, neutron stars and, most excitingly, black holes. This would give us a probe of physics in locations we can’t see with ordinary light, and in regimes that we can’t reproduce on earth or anywhere nearby.

In the selection process, ESA is supposed to take into account the interests of the community. Hence both of these missions are soliciting support, of active and interested scientists and also the more general public: check out the sites for PRISM and eLISA. It’s a tough call. Both cases would be more convincing with a detection of gravitational radiation in their respective regimes, but the process requires putting down a marker early on. In the long term, a CMB mission like PRISM seems inevitable — there are unlikely to be any technical showstoppers — it’s just a big telescope in a slightly unusual range of frequencies. eLISA is more technically challenging: the LISA Pathfinder effort has shown just how hard it is to keep and monitor a free-floating mass in space, and the lack of a detection so far from the ground-based LIGO observatory, although completely consistent with expectations, has kept the community’s enthusiasm lower. (This will likely change with Advanced LIGO, expected to see many hundreds of sources as soon as it comes online in 2015 or thereabouts.)

Full disclosure: although I’ve signed up to support both, I’m directly involved in the PRISM white paper.

Peter RohdeMy second attempt at stand-up comedy

Live at the Stazione Espresso Bar, Sydney.


Matt StrasslerWho Learns the Most in a Science Class?

I’m back, after two weeks of teaching non-experts in a short course covering particle physics, the Higgs field, and the discovery of the Higgs particle.  (The last third of the course, on the politics and funding of particle physics and science more broadly, is wisely being taught by a more disinterested party, an economist with some undergraduate physics background.)  And I’ve been reminded: One of the great joys (and great secrets) of teaching is that the teacher always learns more than the students do.

At least, this is generally true for a new class that the teacher hasn’t taught before. In many university physics departments, and elsewhere, there is an informal requirement that professors teach a class no more than three years in a row. [Let us ignore for the moment that all of this will be overturned in the coming years by the on-line revolution; we can discuss the possible consequences later.] After the third year, they are expected to switch and teach something else. Now you might think that the benefits of the division of labor would suggest a different approach; after all, shouldn’t each professor perfect a course, become the expert, and teach it year in, year out? This usually doesn’t work (though there are exceptions) because each professor’s interaction with a new course has a natural life cycle.

The first time a professor teaches a course, he or she has to review material learned long ago, and sometimes learn it anew. You might think this is only difficult for advanced classes, but that’s not the case. Advanced classes can indeed be difficult to teach because the material is intrinsically complex. But beginning classes are also difficult to teach because the material, while less complex, is just as complex for the students, and meanwhile the teacher has to remember how to think like a beginner, which is an experience long forgotten. If you can’t put yourself into your students’ heads, you can’t teach them very effectively… and this is extremely challenging.

Typically, this initial year of a course is quite exhausting, with the professor spending many more hours in preparation than in class. (I personally found it typical to spend 3 to 5 hours of preparation for each hour in front of students.) But the benefits are also considerable. I have often found myself learning several different ways of explaining a concept — my students only get to learn one, because that’s all the time we have, but I learn them all. And along the way I’ve often discovered links between disparate concepts that I hadn’t previously realized were related, or filled in a surprising gap in my understanding, or learned an application of a concept to a real-world phenomenon that I hadn’t previously known about. Often my students don’t get the immediate benefit because what I’ve learned is beyond the scope of the course. So they struggle to learn some fraction of what I teach them, which is typically much less than what I’ve learned myself.

Year two in the life cycle is the opportunity to fix everything that went wrong in year one, and it is usually the best year. I have usually found myself completely rewriting the first year’s notes, streamlining them, re-ordering them, and improving everything from the overall course structure to the details of how I explain certain subtle points. And I find I still learn quite a bit in the process, particularly about little loose ends that I didn’t have time to tie up during the frantic class preparations of the previous go-round.

But by year three, the whole thing is becoming routine. There isn’t much left for the professor to learn about the class material, and the struggle to master the content and perfect the presentation is no longer so severe. Sure, there’s always more that can be done to help the students deal with the technical material more effectively, but diminishing returns are setting in; any particularly creative ideas for how to convey the most problematic concepts have probably already appeared. So year three is not quite at the point of boredom… but beware year four.  And you do not want to be taught anything by a bored professor.

And so the professor is sent on to begin the cycle anew, to re-learn another subject, and to struggle to find the words and means to explain it clearly.

Two years ago, when I first wrote the Higgs FAQ (here’s the old 1.0 version and the new 2.0 version) I didn’t do a very complete or satisfying job of explaining the most important conceptual issues in particle physics: what are particles and what are fields? One really can’t understand modern physics, and the current notion of what ordinary matter actually is, what forces are, what mass is, etc., without these basic concepts. I promised a full article on the matter, but didn’t really deliver. I had done something brief in my Secret Science Club presentation, but I didn’t feel it was as good as I wanted.

Then a bit under a year ago, having learned a great deal from writing articles for this site, from encountering and attempting to answer readers’ questions, and from preparing and delivering a number of public lectures on the search for the Higgs particle, I wrote a set of articles entitled “Fields and Their Particles (with math)” and “How the Higgs Field Works (with math)”, which boiled the issues down to the point that they could be understood by a first-year undergraduate college physics student. At the time, I promised a set of articles without the math, yet found myself unable to find the right strategy to do it. In particular, I did not want to make compromises that would require me to lie. Sure, some amount of compromise is necessary when explaining a difficult concept to someone who’s never seen anything like it. But that shouldn’t go as far as telling someone something they will later have to un-learn, or that will confuse them because it is actually false. (For instance: saying “the Higgs field is like molasses; it gives mass to things by slowing them down” is a lie. [As one of my students pointed out last week, you'll find this lie right at the top of http://simple.wikipedia.org/wiki/Higgs_field.] Molasses exerts drag, like air resistance; the Higgs field’s effect is obviously not a form of drag, because it affects stationary objects as well as moving ones.)

I believe I finally found a workable strategy while preparing a one-hour public lecture this March, and in preparing my recent course, which I described to the students as “non-technical” (i.e. no use of math except very simple and conceptual equations, such as E = m c² and E = h f) “but sophisticated” (i.e., you can’t tell the truth and yet make everything as simple and effortless as breathing).  In four 90-minute lectures, I managed to describe the known particles and forces, explain how both arise from fields (which are the fundamental ingredients of nature), clarify what particles really are and what a particle’s mass really means, and finally, explain how the Higgs field gives mass to particles; then it wasn’t that hard to explain how the Higgs particle was discovered and how that discovery convinces us the Higgs field really exists.  And I don’t believe I lied once, though I should check that…

As usual, my students’ questions taught me a lot about all of the issues I’d forgotten to explain, and about little loose ends in my presentation that I hadn’t tied off.  Though they may not realize it, I owe them a big thank you!  Because now, given what I learned from preparing and teaching the course, I think I can finally begin planning the promised set of articles: “Fields and Their Particles (without math)”, to be followed by “How the Higgs Field Works (without math)”.

And meanwhile I hope I’ll get to teach that course again soon — since the second time around is when you fix what you messed up the first time…


Filed under: Higgs, LHC Background Info, Particle Physics, Public Outreach Tagged: DoingScience, Higgs, LHC, particle physics, PublicTalks

June 16, 2013

n-Category Café In the News

Applications of category theory are described by Julie Rehmeyer in ScienceNews under the banner

One of the most abstract fields in math finds application in the ‘real’ world.

Now, how about applications in the real world?

Tim GowersThe selected-papers network

This post is to report briefly on a new and to my mind very exciting venture in academic publishing. It’s called the Selected Papers Network, and it has been designed and created by Christopher Lee. If you want to know what it is and what you can do to help it become a success, then you may wish to stop reading this post and turn straight away to a post by John Baez, who has been closely involved with the venture and understands it better than I do. But let me just briefly mention the main point that has struck me so far.

A problem with the current situation is that it is easy to come up with ideas for websites where people can review papers, complete with clever protocols for how the reviewing should take place, whether it is open, reward systems, etc. etc. It’s much less easy to persuade people to use the sites that are created as a result: what is going to persuade them to make the effort, when there’s only rather a small chance that the site will become in any sense “official”?

The Selected Papers Network potentially solves this problem in a very interesting way: it is not a website with a system for reviewing, evaluating, rewarding etc. Rather, it is an environment that makes it easy to build your own systems.

The rough idea is this. If you ever feel moved to write an appreciation or evaluation of any kind you like about any paper, and if you tag what you have written with #spnetwork, then the Selected Papers Network automatically sees what you have written and adds it to the network. So what, you might ask. Well, so quite a lot actually, since if you add other tags, then the Selected Papers Network will make all these reviews searchable in a multitude of ways. For example, it will be searchable by subject matter, or by reviewer, or by some group of reviewers who have decided to club together, etc. etc. I’m slightly hazy about the precise mechanisms for this — and there I would definitely recommend reading John Baez’s post — but the point is that the mechanisms are there.

At the moment, the site works only with Google Plus posts. They were chosen because (i) a number of mathematicians have taken to Google Plus, and (ii) Google Plus posts are easily visible even to people who are not signed up to Google Plus. To give you an idea of what the site does, here is a suitably tagged post by Terence Tao on Google Plus, and here is what the Selected Papers Network did to it. And here is the front page of the Selected Papers Network. You’ll notice that the design is strikingly similar to that of the arXiv. That is of course not a coincidence.

It seems to me that a very good thing to do at this point would be to get a lot of content on the site and not worry too much about how that content is organized. To that end, I have a plan to create a personal list of recommendations. Each recommendation would take the form of a Google Plus post that links to a paper that I feel has influenced my mathematical development and explains why. I can see that as this develops (if it does, but I would like to try to make the posts of a kind that I can write fairly quickly, to maximize the probability that it will), I might start to want to categorize the posts in a finer way. For example, perhaps some of the papers will be ones that have interested me greatly but not actually affected my research all that much, perhaps I’ll want to distinguish between very recently arXived papers and older ones, etc. etc. But I still think that it would be good to get the process started.

Another advantage of getting content up on the site quickly is that one can always reorganize it later. For example, suppose that a number of people with similar interests to mine started recommending papers. Then perhaps it would make sense to combine into a “subnetwork”. It’s easy to imagine that being a very useful resource — a kind of instant annotated reading list in one area of mathematics.

My personal policy for my “reviews”, or whatever one wants to call them, is this. I won’t write about papers that don’t interest me, because I see my job as being positive and encouraging. For a similar reason, I won’t make comparisons, either explicitly or implicitly (by the latter I mean via some kind of ranking, which would allow my attitudes to different papers to be compared). I will explain my reasons for being enthusiastic about the papers I write about, which may allow some kind of reading between the lines if you want to gauge my level of enthusiasm — but that will be an imprecise measurement and therefore I hope that I won’t come across as implicitly negative about any papers I write about, which would be ridiculous because I’ll be writing about them because I find them interesting.

If I don’t write about a paper, it won’t be a sign that I don’t find it interesting. I think I’ll ration myself to at most one a week, so for a long time I won’t have written enough reviews for it to be possible to interpret my not having written about a paper as meaning anything at all. Nor will the order in which I write about papers carry any information: I’ll just sit down and think, “Oh yes, that’s a nice paper. Let me write a few paragraphs about that one.”

One thing I’d like to see is a day when if somebody applies for a job in mathematics, there will typically be several appreciations of their work instantly available online, conveying information that is currently conveyed by reference letters (but not all that information — in particular, not comparisons or negative opinions). Or if you want to know about a particular area and want a feel for what the key papers are and why they are important, you can find all the information you could possibly want by keying in subject tags into spnetwork, or reading the recommendations of your favourite reviewers or groups of reviewers.

While writing this, I’ve thought of another way of explaining what will govern my selections. I have a private concept that I call “the story of mathematics”, though it might be more accurate to say “the stories of mathematics”. A paper belongs to the story of some part of mathematics if it would be very natural to mention that paper when describing that part of mathematics, since for one reason or another that paper changed people’s view of the area — by introducing a key definition, solving an important problem, introducing a technique that is now widely used, etc. etc. This is a matter of degree of course. But it’s papers like that — papers that have contributed to the stories of mathematics that have had an impact on me — that I want to write about.

Let me end with a disclaimer: my saying all this is not a guarantee that I’ll actually get round to doing it. However, it increases the probability that I will, since it will be slightly embarrassing to have written this post and followed it by not writing any reviews.

But the other reason for writing the post is that I hope it will encourage others to do similar things: even if 1000 mathematicians each wrote just one review, that would already create a site worth exploring, and in principle it could happen very quickly. I’m also quite pleased to have a chance to advertise Google Plus. If you’re a Facebook kind of person who thinks Google Plus is a rather lame imitation of Facebook that was introduced too late to be worth taking seriously, then you may, if you’re a mathematician, like to think again. I’m not on Facebook myself, but my impression is that Google Plus, or at least the corner of Google Plus that is inhabited by mathematicians and others with similar interests, is doing something interestingly different. I personally see it as a good place for short posts that aren’t serious enough for this blog, but are nevertheless things I feel like saying, or drawing attention to. And several other people are using it in a similar way. So if you’re a mathematician without a Google Plus account, you may be missing out on something you’d like. And that may become even more true if the Selected Papers Network takes off, though — and this is important to stress — whether or not you sign on to Google Plus or the Selected Papers Network itself, you’ll be able to read all the reviews.


Terence TaoFurther analysis of the truncated GPY sieve

This post is a continuation of the previous post on sieve theory, which is an ongoing part of the Polymath8 project. As the previous post was getting somewhat full, we are rolling the thread over to the current post. We also take the opportunity to correct some errors in the treatment of the truncated GPY sieve from this previous post.

As usual, we let {x} be a large asymptotic parameter, and {w} a sufficiently slowly growing function of {x}. Let {0 < \varpi < 1/4} and {0 < \delta < 1/4+\varpi} be such that {MPZ[\varpi,\delta]} holds (see this previous post for a definition of this assertion). We let {{\mathcal H}} be a fixed admissible {k_0}-tuple, let {I := [w,x^\delta]}, let {{\mathcal S}_I} be the square-free numbers with prime divisors in {I}, and consider the truncated GPY sieve

\displaystyle  \nu(n) := \lambda(n)^2

where

\displaystyle  \lambda(n) := \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) g(\frac{\log d}{\log R})

where {R := x^{1/4+\varpi}}, {P} is the polynomial

\displaystyle  P(n) := \prod_{h \in {\mathcal H}} (n+h),

and {g: {\bf R} \rightarrow {\bf R}} is a fixed smooth function supported on {[-1,1]}. As discussed in the previous post, we are interested in obtaining an upper bound of the form

\displaystyle  \sum_{x \leq n \leq 2x} \nu(n) \leq (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0} R}

as well as a lower bound of the form

\displaystyle  \sum_{x \leq n \leq 2x} \nu(n) \theta(n+h) \geq (\beta+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R}

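for all {h \in {\mathcal H}} (where {\theta(n) = \log n} when {n} is prime and {\theta(n)=0} otherwise), since this will give the conjecture {DHL[k_0,2]} (i.e. every admissible {k_0}-tuple has infinitely many translates containing at least two primes, and hence there are infinitely many prime gaps bounded by the diameter of such a tuple) whenever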

\displaystyle  1+4\varpi > \frac{4\alpha}{k_0 \beta}. \ \ \ \ \ (1)

It turns out we in fact have precise asymptotics

\displaystyle  \sum_{x \leq n \leq 2x} \nu(n) = (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0} R} \ \ \ \ \ (2)

and

\displaystyle  \sum_{x \leq n \leq 2x} \nu(n) \theta(n+h) = (\beta+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R} \ \ \ \ \ (3)

although the exact formulae for {\alpha,\beta} are a little complicated. (The fact that {\alpha,\beta} could be computed exactly was already anticipated in Zhang’s paper; see the remark on page 24.) We proceed as in the previous post. Indeed, from the arguments in that post, (2) is equivalent to

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]} \ \ \ \ \ (4)

\displaystyle  = (\alpha + o(1)) (\frac{W}{\phi(W)})^{k_0} \log^{-k_0} R

and (3) is similarly equivalent to

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{(k_0-1)^{\Omega([d_1,d_2])}}{[d_1,d_2]} \ \ \ \ \ (5)

\displaystyle  = (\beta + o(1)) (\frac{W}{\phi(W)})^{k_0-1} \log^{-k_0+1} R.

Here {\Omega(d)} is the number of prime factors of {d}.

We will work for now with (4), as the treatment of (5) is almost identical.

We would now like to replace the truncated interval {I = [w,x^\delta]} with the untruncated interval {I \cup J = [w,\infty)}, where {J = (x^\delta,\infty)}. Unfortunately this replacement was not quite done correctly in the previous post, and this will now be corrected here. We first observe that if {F(d_1,d_2)} is any finitely supported function, then by Möbius inversion we have

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} F(d_1,d_2) = \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} F(d_1,d_2) \sum_{a \in {\mathcal S}_J} \mu(a) 1_{a|[d_1,d_2]}.

Note that {a|[d_1,d_2]} if and only if we have a factorisation {d_1 = a_1 d'_1}, {d_2 = a_2 d'_2} with {[a_1,a_2] = a} and {d'_1 d'_2} coprime to {a_1 a_2}, and that this factorisation is unique. From this, we see that we may rearrange the previous expression as

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \mu( [a_1,a_2] ) \sum_{d'_1,d'_2 \in {\mathcal S}_{I \cup J}: (d'_1 d'_2, a_1 a_2) = 1} F( a_1 d'_1, a_2 d'_2 ).
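The two rearrangements above can be sanity-checked by brute force on a toy example. The sketch below is only an illustration under stated assumptions: the "intervals" are replaced by two small hypothetical sets of primes (`I_PRIMES`, `J_PRIMES`), and `F` is an arbitrary test function, none of which come from the actual sieve setting.

```python
from itertools import combinations
from math import gcd, prod

def squarefree(primes):
    """All squarefree products of the given primes (including the empty product 1)."""
    return [prod(c) for r in range(len(primes) + 1) for c in combinations(primes, r)]

def mobius_squarefree(n, primes):
    """mu(n) for a squarefree n all of whose prime factors lie in `primes`."""
    return (-1) ** sum(1 for p in primes if n % p == 0)

def lcm(a, b):
    return a * b // gcd(a, b)

# Toy stand-ins (assumptions, not from the post) for the primes in I and in J.
I_PRIMES, J_PRIMES = [2, 3], [5, 7]
S_I = squarefree(I_PRIMES)
S_J = squarefree(J_PRIMES)
S_IJ = squarefree(I_PRIMES + J_PRIMES)

def F(d1, d2):
    """An arbitrary finitely supported test function (hypothetical)."""
    return d1 * d1 + 3 * d2 + d1 * d2

# Left-hand side: sum over d1, d2 in S_I.
lhs = sum(F(d1, d2) for d1 in S_I for d2 in S_I)

# Moebius inversion form: sum over S_{I u J}, weighted by sum_{a in S_J, a | [d1,d2]} mu(a).
mid = sum(
    F(d1, d2) * sum(mobius_squarefree(a, J_PRIMES) for a in S_J if lcm(d1, d2) % a == 0)
    for d1 in S_IJ for d2 in S_IJ
)

# Rearranged form: sum over a1, a2 in S_J and d1', d2' coprime to a1 a2.
rhs = sum(
    mobius_squarefree(lcm(a1, a2), J_PRIMES) * F(a1 * d1, a2 * d2)
    for a1 in S_J for a2 in S_J
    for d1 in S_IJ for d2 in S_IJ
    if gcd(d1 * d2, a1 * a2) == 1
)

assert lhs == mid == rhs
```

All three sums agree here because the inner Möbius sum vanishes unless {[d_1,d_2]} has no prime factor in the toy {J}, and because the factorisation {d_1 = a_1 d'_1}, {d_2 = a_2 d'_2} is unique.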

Applying this to (4), and relabeling {d'_1,d'_2} as {d_1,d_2}, we conclude that the left-hand side of (4) is equal to

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \mu( [a_1,a_2] ) \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}: (d_1d_2,a_1a_2)=1}

\displaystyle \mu(a_1d_1) g(\frac{\log a_1d_1}{\log R}) \mu(a_2d_2) g(\frac{\log a_2d_2}{\log R}) \frac{k_0^{\Omega([a_1 d_1,a_2 d_2])}}{[a_1 d_1,a_2 d_2]}

which may be rearranged as

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{\mu( (a_1,a_2) ) k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \sum_{d_1,d_2\in {\mathcal S}_{I \cup J}: (d_1d_2,a_1a_2)=1} \ \ \ \ \ (6)

\displaystyle  \mu(d_1) g(\frac{\log a_1d_1}{\log R}) \mu(d_2) g(\frac{\log a_2 d_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1, d_2]}.

This is almost the same formula that we had in the previous post, except that the Möbius function {\mu((a_1,a_2))} of the greatest common divisor {(a_1,a_2)} of {a_1,a_2} was missing, and also the coprimality condition {(d_1d_2,a_1a_2)=1} was not handled properly in the previous post.

We may now eliminate the condition {(d_1d_2,a_1a_2)=1} as follows. Suppose that there is a prime {p_* \in J} that divides both {d_1d_2} and {a_1a_2}. The expression

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}: p_* | (d_1d_2,a_1a_2)}

\displaystyle |g(\frac{\log a_1d_1}{\log R})| |g(\frac{\log a_2 d_2}{\log R})| \frac{k_0^{\Omega([d_1,d_2])}}{[d_1, d_2]}

can then be bounded by

\displaystyle  \ll \sum_{a_1,a_2} \sum_{d_1,d_2: p_* | (d_1d_2,a_1a_2)} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \frac{k_0^{\Omega([d_1,d_2])}}{[d_1, d_2]} (a_1 a_2 d_1 d_2)^{-1/\log R}

which may be factorised as

\displaystyle  \ll \frac{1}{p_*^2} \prod_p (1 + \frac{O(1)}{p^{1+1/\log R}})

which by Mertens’ theorem (or the simple pole of {\zeta(s)} at {s=1}) is

\displaystyle  \ll \frac{\log^{O(1)} R}{p_*^2}.

Summing over all {p_* > x^\delta} gives a negligible contribution to (6) for the purposes of (4). Thus we may effectively replace (6) by

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{\mu( (a_1,a_2) ) k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \sum_{d_1,d_2\in {\mathcal S}_{I \cup J}}

\displaystyle \mu(d_1) g(\frac{\log a_1d_1}{\log R}) \mu(d_2) g(\frac{\log a_2 d_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1, d_2]}.

The inner summation can be treated using Proposition 10 of the previous post. We can then reduce (4) to

\displaystyle  \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{\mu( (a_1,a_2) ) k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} G_{k_0}( \frac{\log a_1}{\log R}, \frac{\log a_2}{\log R} ) = \alpha+o(1) \ \ \ \ \ (7)

where {G_{k_0}} is the function

\displaystyle  G_{k_0}(t_1,t_2) := \int_0^1 g^{(k_0)}(t+t_1) g^{(k_0)}(t+t_2) \frac{t^{k_0-1}}{(k_0-1)!}\ dt.

Note that {G_{k_0}} vanishes if {t_1 \geq 1} or {t_2 \geq 1}. In practice, we will work with functions {g} in which {g^{(k_0)}} has a definite sign (in our normalisations, {g^{(k_0)}} will be non-positive), making {G_{k_0}} non-negative.

We rewrite the left-hand side of (7) as

\displaystyle  \sum_{a \in {\mathcal S}_J} \frac{k_0^{\Omega(a)}}{a} \sum_{a_1,a_2: [a_1,a_2] = a} \mu((a_1,a_2)) G_{k_0}( \frac{\log a_1}{\log R}, \frac{\log a_2}{\log R} ).

We may factor {a = p_1 \ldots p_n} for some {x^\delta < p_1 < \ldots < p_n} with {p_1 \ldots p_n \leq R}; in particular, {n < \frac{1 + 4\varpi}{4\delta}}. The previous expression now becomes

\displaystyle  \sum_{0 \leq n < \frac{1+4\varpi}{4\delta}} k_0^n \sum_{x^\delta < p_1 < \ldots < p_n} \frac{1}{p_1 \ldots p_n}

\displaystyle  \sum_{\{1,\ldots,n\} = S \cup T} (-1)^{|S \cap T|} G_{k_0}( \sum_{i \in S} \frac{\log p_i}{\log R}, \sum_{j \in T} \frac{\log p_j}{\log R} ).

Using Mertens’ theorem, we thus conclude an exact formula for {\alpha}, and similarly for {\beta}:

Proposition 1 (Exact formula) We have

\displaystyle  \alpha = \sum_{0 \leq n < \frac{1+4\varpi}{4\delta}} k_0^n \int_{\frac{4\delta}{1+4\varpi} < t_1 < \ldots < t_n} G_{k_0,n}(t_1,\ldots,t_n) \frac{dt_1 \ldots dt_n}{t_1 \ldots t_n}

where

\displaystyle  G_{k_0,n}(t_1,\ldots,t_n) := \sum_{\{1,\ldots,n\} = S \cup T} (-1)^{|S \cap T|} G_{k_0}( \sum_{i \in S} t_i, \sum_{j \in T} t_j ).

Similarly we have

\displaystyle  \beta = \sum_{0 \leq n < \frac{1+4\varpi}{4\delta}} (k_0-1)^n \int_{\frac{4\delta}{1+4\varpi} < t_1 < \ldots < t_n} G_{k_0-1,n}(t_1,\ldots,t_n) \frac{dt_1 \ldots dt_n}{t_1 \ldots t_n}

where {G_{k_0-1}} and {G_{k_0-1,n}} are defined similarly to {G_{k_0}} and {G_{k_0,n}} by replacing all occurrences of {k_0} with {k_0-1}.

These formulae are unwieldy. However if we make some monotonicity hypotheses, namely that {g^{(k_0-1)}} is positive, {g^{(k_0)}} is negative, and {g^{(k_0+1)}} is positive on {[0,1)}, then we can get some good estimates on the {G_{k_0}, G_{k_0-1}} (which are now non-negative functions) and hence on {\alpha,\beta}. Namely, if {g^{(k_0)}} is negative but increasing then we have

\displaystyle  -g^{(k_0)}(t+t_1) \leq -g^{(k_0)}(\frac{t}{1-t_1})

for {0 \leq t_1 < 1} and {t \in [0,1]}, which implies that

\displaystyle  G_{k_0}(t_1,t_1) \leq (1-t_1)_+^{k_0} G_{k_0}(0,0)

for any {t_1 \geq 0}. A similar argument in fact gives

\displaystyle  G_{k_0}(t_1+t_2,t_1+t_2) \leq (1-t_1)_+^{k_0} G_{k_0}(t_2,t_2)

for any {t_1,t_2 \geq 0}. Iterating this we conclude that

\displaystyle  G_{k_0}(\sum_{i \in S} t_i, \sum_{i \in S} t_i) \leq (\prod_{i \in S} (1-t_i)_+^{k_0}) G_{k_0}(0,0)

and similarly

\displaystyle  G_{k_0}(\sum_{i \in T} t_i, \sum_{i \in T} t_i) \leq (\prod_{i \in T} (1-t_i)_+^{k_0}) G_{k_0}(0,0).

From Cauchy-Schwarz we thus have

\displaystyle  G_{k_0}( \sum_{i \in S} t_i, \sum_{i \in T} t_i ) \leq (\prod_{i=1}^n (1 - t_i)_+^{k_0/2}) G_{k_0}(0,0).

Observe from the binomial formula that of the {3^n} pairs {(S,T)} with {S \cup T = \{1,\ldots,n\}}, {\frac{3^n+1}{2}} of them have {|S \cap T|} even, and {\frac{3^n-1}{2}} of them have {|S \cap T|} odd. We thus have

\displaystyle  -\frac{3^n-1}{2} (\prod_{i=1}^n (1 - t_i)_+^{k_0/2}) G_{k_0}(0,0) \leq G_{k_0,n}(t_1,\ldots,t_n) \ \ \ \ \ (8)

\displaystyle  \leq \frac{3^n+1}{2} (\prod_{i=1}^n (1 - t_i)_+^{k_0/2}) G_{k_0}(0,0).
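The parity count used in (8) is easy to confirm by enumeration for small {n}. The following is a minimal sketch (the function name is ours, not from the post): each index is placed in {S} only, in {T} only, or in both, and only the last option contributes to {|S \cap T|}.

```python
from itertools import product

def count_pairs_by_parity(n):
    """Count pairs (S, T) with S u T = {1..n}, split by the parity of |S n T|."""
    even = odd = 0
    # Each element lies in S only, in T only, or in both; "both" counts towards |S n T|.
    for assignment in product(("S", "T", "both"), repeat=n):
        overlap = assignment.count("both")
        if overlap % 2 == 0:
            even += 1
        else:
            odd += 1
    return even, odd

# Compare with the closed forms (3^n + 1)/2 and (3^n - 1)/2 for small n.
for n in range(6):
    even, odd = count_pairs_by_parity(n)
    assert even == (3**n + 1) // 2
    assert odd == (3**n - 1) // 2
```

Equivalently, the signed count {\sum (-1)^{|S \cap T|}} equals {(1+1-1)^n = 1}, which together with the total {3^n} gives the two counts.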

We have thus established the upper bound

\displaystyle  \alpha \leq G_{k_0}(0,0) (1 + \kappa) \ \ \ \ \ (9)

where

\displaystyle  \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n+1}{2} k_0^n \int_{\frac{4\delta}{1+4\varpi} < t_1 < \ldots < t_n} (\prod_{i=1}^n (1 - t_i)_+^{k_0/2}) \frac{dt_1 \ldots dt_n}{t_1 \ldots t_n}.

By symmetry we may factorise

\displaystyle  \kappa = \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n+1}{2} \frac{k_0^n}{n!} ( \int_{\frac{4\delta}{1+4\varpi} < t \leq 1} (1-t)^{k_0/2}\ \frac{dt}{t})^n.

The expression {\kappa} is explicitly computable for any given {\varpi,\delta,k_0}. Following the recent preprint of Pintz, one can get a slightly looser, but cleaner, bound by using the upper bound

\displaystyle  1-t \leq \exp(-t)

and so

\displaystyle  \kappa \leq \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n+1}{2} \frac{k_0^n}{n!} (\int_{4\delta/(1+4\varpi)}^\infty \exp( - \frac{k_0}{2} t )\ \frac{dt}{t})^n.

Note that

\displaystyle  \int_{4\delta/(1+4\varpi)}^\infty \exp( - \frac{k_0}{2} t )\ \frac{dt}{t} = \int_1^\infty \exp( - \frac{2k_0 \delta}{1+4\varpi} t)\ \frac{dt}{t}

\displaystyle < \int_1^\infty \exp( - \frac{2k_0 \delta}{1+4\varpi} t)\ dt

\displaystyle  = \frac{1+4\varpi}{2k_0\delta} \exp( - \frac{2k_0 \delta}{1+4\varpi} )

and hence

\displaystyle \kappa \leq \tilde \kappa

where

\displaystyle  \tilde \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{1}{n!} \frac{3^n+1}{2} (\frac{1+4\varpi}{2\delta} \exp( - \frac{2k_0 \delta}{1+4\varpi} ))^n.

In practice we expect the {n=1} term to dominate, thus we have the heuristic approximation

\displaystyle  \kappa \lessapprox \frac{1+4\varpi}{\delta} \exp( - \frac{2k_0 \delta}{1+4\varpi} ).
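To get a feel for how strongly the {n=1} term dominates, one can evaluate {\tilde \kappa} directly. This is a rough sketch only; the function names are ours, and the parameter values in the usage note are purely illustrative and are not values established in this post.

```python
import math

def kappa_tilde(varpi, delta, k0):
    """Evaluate the upper bound kappa-tilde as a finite sum over 1 <= n < (1+4*varpi)/(4*delta)."""
    bound = (1 + 4 * varpi) / (4 * delta)
    # Common factor raised to the n-th power in each term of the sum.
    x = (1 + 4 * varpi) / (2 * delta) * math.exp(-2 * k0 * delta / (1 + 4 * varpi))
    total, n = 0.0, 1
    while n < bound:
        total += (3**n + 1) / 2 * x**n / math.factorial(n)
        n += 1
    return total

def kappa_heuristic(varpi, delta, k0):
    """The n = 1 term of kappa-tilde, i.e. the heuristic approximation."""
    return (1 + 4 * varpi) / delta * math.exp(-2 * k0 * delta / (1 + 4 * varpi))
```

For instance, with the made-up values `varpi = 0.01`, `delta = 0.01`, `k0 = 1000`, the two functions agree to well within one part in a thousand, since the {n=2} term is smaller than the {n=1} term by a factor comparable to {\tilde \kappa} itself.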

Now we turn to the estimation of {\beta}. We have an analogue of (8), namely

\displaystyle  -\frac{3^n-1}{2} (\prod_{i=1}^n (1-t_i)^{(k_0-1)/2}) G_{k_0-1}(0,0) \leq G_{k_0-1,n}(t_1,\ldots,t_n)

\displaystyle  \leq \frac{3^n+1}{2} (\prod_{i=1}^n (1-t_i)^{(k_0-1)/2}) G_{k_0-1}(0,0).

But we have an improvement in the lower bound in the {n=1} case, because in this case we have

\displaystyle  G_{k_0-1,1}(t) = G_{k_0-1}(t,0) + G_{k_0-1}(0,t) - G_{k_0-1}(t,t).

From the positive decreasing nature of {g^{(k_0-1)}} we see that {G_{k_0-1}(t,t) \leq G_{k_0-1}(t,0)} and so {G_{k_0-1,1}(t)} is non-negative and can thus be ignored for the purposes of lower bounds. (There are similar improvements available for higher {n} but this seems to only give negligible improvements and will not be pursued here.) Thus we obtain

\displaystyle  \beta \geq G_{k_0-1}(0,0) (1-\kappa') \ \ \ \ \ (10)

where

\displaystyle  \kappa' := \sum_{2 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n-1}{2} \frac{(k_0-1)^n}{n!}

\displaystyle (\int_{4\delta/(1+4\varpi)}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t})^n.

Estimating {\kappa'} similarly to {\kappa} we conclude that

\displaystyle  \kappa' \leq \tilde \kappa'

where

\displaystyle  \tilde \kappa' := \sum_{2 \leq n < \frac{1+4\varpi}{4\delta}} \frac{1}{n!} \frac{3^n-1}{2} (\frac{1+4\varpi}{2\delta} \exp( - \frac{2(k_0-1) \delta}{1+4\varpi} ))^n.

By (9), (10), we see that the condition (1) is implied by

\displaystyle  (1+4\varpi) (1-\kappa') > \frac{4G_{k_0}(0,0)}{k_0 G_{k_0-1}(0,0)} (1+\kappa).

By Theorem 14 and Lemma 15 of this previous post, we may take the ratio {\frac{4G_{k_0}(0,0)}{k_0 G_{k_0-1}(0,0)}} to be arbitrarily close to {\frac{j_{k_0-2}^2}{k_0(k_0-1)}}. We conclude the following theorem.

Theorem 2 Let {0 < \varpi < 1/4} and {0 < \delta < 1/4 + \varpi} be such that {MPZ[\varpi,\delta]} holds. Let {k_0 \geq 2} be an integer, define

\displaystyle  \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n+1}{2} \frac{k_0^n}{n!} (\int_{4\delta/(1+4\varpi)}^1 (1-t)^{k_0/2}\ \frac{dt}{t})^n

and

\displaystyle  \kappa' := \sum_{2 \leq n < \frac{1+4\varpi}{4\delta}} \frac{3^n-1}{2} \frac{(k_0-1)^n}{n!}

\displaystyle (\int_{4\delta/(1+4\varpi)}^1 (1-t)^{(k_0-1)/2}\ \frac{dt}{t})^n

and suppose that

\displaystyle  (1+4\varpi) (1-\kappa') > \frac{j_{k_0-2}^2}{k_0(k_0-1)} (1+\kappa).

Then {DHL[k_0,2]} holds.
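To get a feel for the hypothesis of Theorem 2, here is a rough numerical sketch that evaluates {\kappa}, {\kappa'} and the final inequality for given {\varpi, \delta, k_0}. Several things here are assumptions rather than part of the argument above: the integrals are approximated with a plain composite Simpson rule (the helper `simpson` and the step count `steps` are illustrative choices, and the step count would need to be raised considerably for large {k_0}, where the integrand decays on a scale of about {1/k_0}), and the Bessel zero {j_{k_0-2}} is not computed but must be supplied by the caller as `bessel_zero`.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def truncated_series(varpi, delta, m, sign, n_min, steps=2000):
    """Sum over n_min <= n < (1+4*varpi)/(4*delta) of
    (3^n + sign)/2 * m^n/n! * I^n, where I = int_theta^1 (1-t)^(m/2) dt/t
    and theta = 4*delta/(1+4*varpi)."""
    bound = (1 + 4 * varpi) / (4 * delta)
    theta = 4 * delta / (1 + 4 * varpi)
    if theta >= 1:
        return 0.0  # the range of summation is empty
    integral = simpson(lambda t: (1 - t) ** (m / 2) / t, theta, 1.0, steps)
    x = m * integral
    p3, p1 = 1.0, 1.0  # running values of (3x)^n/n! and x^n/n!
    total = 0.0
    n = 1
    while n < bound:
        p3 *= 3 * x / n
        p1 *= x / n
        if n >= n_min:
            total += (p3 + sign * p1) / 2
        n += 1
    return total

def kappa(varpi, delta, k0, steps=2000):
    return truncated_series(varpi, delta, k0, +1, 1, steps)

def kappa_prime(varpi, delta, k0, steps=2000):
    return truncated_series(varpi, delta, k0 - 1, -1, 2, steps)

def theorem2_condition(varpi, delta, k0, bessel_zero, steps=2000):
    """Check (1+4w)(1-kappa') > j^2/(k0(k0-1)) * (1+kappa), with j = bessel_zero."""
    lhs = (1 + 4 * varpi) * (1 - kappa_prime(varpi, delta, k0, steps))
    rhs = bessel_zero ** 2 / (k0 * (k0 - 1)) * (1 + kappa(varpi, delta, k0, steps))
    return lhs > rhs
```

The sums are accumulated term by term so that {3^n} and {n!} never have to be formed separately. This only tests the numerical inequality; the hypothesis {MPZ[\varpi,\delta]} of the theorem is of course not something a script can check.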

As noted earlier, we heuristically have

\displaystyle  \tilde \kappa \approx \frac{1+4\varpi}{\delta} \exp( - \frac{2k_0 \delta}{1+4\varpi} )

and {\tilde \kappa'} is negligible. This constraint is a bit better than the previous condition, in which {\tilde \kappa'} was not present and {\tilde \kappa} was replaced by a quantity roughly of the form {2 \log(2) k_0 \exp( - \frac{2k_0 \delta}{1+4\varpi})}.
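The comparison in the last paragraph can be explored with a few lines of code. The sketch below simply transcribes the heuristic size of {\tilde \kappa}, the quantity {2 \log(2) k_0 \exp( - \frac{2k_0 \delta}{1+4\varpi})} from the previous condition, and the sum defining {\tilde \kappa'}; the function names are hypothetical and nothing here goes beyond evaluating those three expressions.

```python
import math

def kappa_tilde_heuristic(varpi, delta, k0):
    """(1+4w)/delta * exp(-2*k0*delta/(1+4w)), the heuristic size of tilde-kappa."""
    return (1 + 4 * varpi) / delta * math.exp(-2 * k0 * delta / (1 + 4 * varpi))

def previous_heuristic(varpi, delta, k0):
    """2*log(2)*k0 * exp(-2*k0*delta/(1+4w)), the rough size of the earlier quantity."""
    return 2 * math.log(2) * k0 * math.exp(-2 * k0 * delta / (1 + 4 * varpi))

def kappa_tilde_prime(varpi, delta, k0):
    """Sum over 2 <= n < (1+4w)/(4*delta) of (1/n!) * (3^n - 1)/2 * x^n,
    with x = (1+4w)/(2*delta) * exp(-2*(k0-1)*delta/(1+4w))."""
    bound = (1 + 4 * varpi) / (4 * delta)
    x = (1 + 4 * varpi) / (2 * delta) * math.exp(-2 * (k0 - 1) * delta / (1 + 4 * varpi))
    p3, p1 = 3 * x, x  # (3x)^n/n! and x^n/n! at n = 1
    total = 0.0
    n = 2
    while n < bound:
        p3 *= 3 * x / n
        p1 *= x / n
        total += (p3 - p1) / 2
        n += 1
    return total
```

Both heuristics share the same exponential factor, so the new one is smaller exactly when {\frac{1+4\varpi}{\delta} < 2 \log(2) k_0}.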


Filed under: math.NT, polymath Tagged: polymath8, sieve theory, smooth numbers

Jordan EllenbergTweendom approaches

Spent 20 minutes today arguing with CJ over which is better, the Black Eyed Peas “I Gotta Feeling” or Carly Rae Jepsen’s “Good Time” (feat. Owl City.)  Caleb favors Jepsen, arguing that songs are made of “music, singing, and words,” and that “I Gotta Feeling” wins on words but loses on music and singing.  His judgment of “I Gotta Feeling” as a piece of music is that “the music doesn’t match the singing and 3/4 of the music is copied and the 1/4 of the music that isn’t copied is boring.”

I asked CJ what “Good Time” is about and he said “it’s about people who overestimate their life and think bad things never happen in it.”

 


June 15, 2013

Jordan EllenbergWhat I looked like when I was 17 and talking about math

My mom just sent me this, from the 1989 Westinghouse (now Intel) science fair. I don’t usually think CJ looks very much like me, but in this picture I can kind of see it.



Quantum DiariesYour summer travel options

Now that summer is fully here, are you feeling that old wanderlust, the desire to hit the open road? Well then, there are a lot of interesting places to go on the physics conference circuit between now and Labor Day. There are many fabulous locations on the menu, and who knows, you might get to hear the first public presentation of an exciting new physics result. While it’s true that what many would consider the most glamorous stuff from the LHC has already been pushed out (at the highest priority), you can be assured that scientists are hard at work on new results, and of course there are many other particle-physics experiments that are doing important work. So, find your frequent-flyer card and make sure you’ve changed the oil, and let’s see where you might be headed this summer:

  • 2013 Lepton Photon Conference, San Francisco, CA, June 24-29, hosted by SLAC. This is definitely the most prestigious conference this year; it is the international conference that is the odd-numbered year complement to the ICHEP meetings that are held in even-numbered years. Last year’s ICHEP saw the announcement of the observation of the Higgs boson, and if someone wants to make a big splash this year, they will do it at Lepton Photon. I have previously discussed how ICHEP works; the Lepton Photon series has a similarly storied history, but is slightly different in format, in that there are only plenary overview talks rather than a series of shorter, more focused presentations. San Francisco is always a great destination, and a fine place to consider the physics of the cable car and plate tectonics.
  • 2013 European Physical Society Conference on High Energy Physics, Stockholm, Sweden, July 18-24. If results aren’t ready in time for Lepton Photon, they could be ready in time for EPS. This conference also appears in odd-numbered years, and with a format that has both parallel and plenary sessions, there are many opportunities for younger people to present their work. It is probably the premier particle-physics conference in Europe this year. Thanks to the tilted axis of the earth, and the position of Stockholm at 59 degrees north of the equator, you’ll be able to enjoy 17 hours and 40 minutes of daylight each day at this conference…starting at 4 AM each morning.
  • Community Summer Study 2013, aka Snowmass on the Mississippi, Minneapolis, MN, July 29-August 6. This isn’t really a conference, but it is the culmination of the year-long effort of the US particle-physics community to define its long-range plan. With the discovery of the Higgs boson and important developments in neutrino physics, we have better clues on what we should be trying to study in the future. Now we have to understand what facilities are best for this science, and what the technical barriers are to building and exploiting them. But we have to realize that we’re working with a finite budget, and we’ll have to do some hard thinking to understand how to set priorities. You might think that Minneapolis doesn’t have much on San Francisco or Stockholm, but my wife is from there, so I have traveled there many times and I think it’s a great place to visit. You can contemplate the balancing forces and torques on the “Spoonbridge and Cherry” sculpture at the Walker Art Center, or the aerodynamics of Mary Tyler Moore’s hat on the Nicollet Mall.
  • 2013 Meeting of the American Physical Society Division of Particles and Fields, Santa Cruz, CA, August 13-17. Like the EPS conference, DPF also meets in odd-numbered years and is a chance for the US particle physics community to gather. It’s one of my favorite conferences, with a broad program of particle physics and neither too big nor too small. It is especially friendly to younger people presenting their own work. Measurements that weren’t ready for the earlier conferences could still get a good audience here. Yes, you might have gone to nearby San Francisco in June, but Santa Cruz has a totally different feel, and you can study the hydrodynamics that power the redwood trees that are all over the campus.

    And you might ask, where am I going this summer? I’d love to get to all of these, but I have another destination this summer — I will be moving my family to Geneva for a sabbatical year at CERN in July. It’s a little disappointing to be missing some of the action in the US, but I’m looking forward to an exciting year. I will be returning to the US for the Snowmass workshop, where I’m co-leading a working group, but that’s about it for conferences for me this summer. That will still be plenty exciting, and I’ll do my best to report all the news about it here.

June 14, 2013

    David HoggexoSAMSI day five, wavelets

    Wavelets. Yes, you thought they were cool in the 1980s. Apparently they are back. Almost every statistician at this workshop has mentioned them to me, and after two days of hard work, I am convinced: If you wavelet transform the space in which you do your inference, you can lay down an independent (diagonal) Gaussian Process but still produce exceedingly nontrivial covariance functions. We are looking at whether we can model the Kepler data this way. The idea is to use the wavelets to make it possible to marginalize out all possible stellar variability consistent with the data.
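The point about diagonal priors giving nontrivial covariances can be seen in a toy calculation. This is only a sketch under assumptions of my own choosing: a four-sample orthonormal Haar transform stands in for whatever wavelet family one would really use, and the variances are made-up numbers. If the wavelet coefficients c = W x are independent Gaussians with variances d_k, then the data-space covariance is W^T diag(d) W, which is far from diagonal.

```python
import math

s = 1 / math.sqrt(2)

# Orthonormal Haar transform on 4 uniformly spaced samples (rows are basis vectors).
W = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [s, -s, 0.0, 0.0],
    [0.0, 0.0, s, -s],
]

def data_covariance(variances):
    """Covariance of x = W^T c when the coefficients c are independent
    with the given variances: C[i][j] = sum_k W[k][i] * d_k * W[k][j]."""
    n = len(W)
    return [
        [sum(W[k][i] * variances[k] * W[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical variances: more power on coarse scales than on fine ones.
C = data_covariance([4.0, 2.0, 1.0, 1.0])
for row in C:
    print(row)
```

With equal variances the result is just a multiple of the identity; it is the scale dependence of the variances that produces the correlations between samples.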

    Baines (Davis) has been instrumental in my understanding, after Carter (CfA) and Dawson (CfA) taught me the basics by battling it out about implementations of the wavelet transform. I am still working on the linear algebra of all this, but I think I am going to find that one issue is that—in order for the method to be fast and scalable—the wavelet part of the problem has to treat the data as uniformly sampled and homoskedastic. Those assumptions are always wrong (do I repeat myself?) but I think I decided today that we can just pretend they are true for the purposes of our rotation into the wavelet basis. The rest of the working group kicked ass while I had these deep, deep thoughts.

    Jordan EllenbergPersiflage on Scholze

    Like everyone else I am wildly cheering Peter Scholze’s new preprint constructing Galois representations attached to torsion classes — torsion classes! — in the cohomology of locally symmetric spaces for GL_n.  I had been aspiring, and still do aspire, to develop enough of a global picture of how this works to write about it on the blog.  But I’m happy to report that it looks like Persiflage, who’s somewhat closer to the subject than I am, is going to do it at his place.  In his words:

    This is mathematics which will, no question, have more impact in number theory than any recent paper I can think of. The basic intent of this post is to commit to future posts in which I will discuss the details.

    At the risk of talking about stuff I don’t understand yet, I’ll make one comment.  It seems that a key technical development is Scholze’s ability to use the language of perfectoid spaces to talk about things like modular curves and modular varieties “at infinite level.”  See how I reflexively put scare quotes there?  It’s because, when I learned this stuff, it was customary to pretend to talk about infinite level,  but really this was used as more of a metaphor; every actual argument I knew how to make took place in the pedestrian context of schemes of finite type over local and global fields.  (Others may have been more daring, I don’t know.)  Anyway, Scholze’s techniques seem to allow him to work fearlessly at the top of the tower, no scare quotes necessary, at which point new phenomena appear, phenomena which have implications even back at finite level.

    (I am eager for this preliminary stuff to be corrected, refined, rebuked, and improved on in comments….!)

     

     


    Clifford JohnsonColour and Culture

    I spotted* this lovely post from a year ago about colour, culture, and language that I thought I’d share. What does the map of colour and colour names look like as you move from culture to culture? And are there universal aspects to it, or is it pretty random? I find this a fascinating topic, and so was delighted to see this post, which addresses a lot of the questions. (You’ll find links there to an episode of Radiolab that was on a similar topic. I recommend that too.) Coincidentally, two pages before the part of my notebook where I’m doing a computation right now is a [...]

    n-Category Café The Selected Papers Network

    Here it is at last: the Selected Papers Network. Given that social networks already exist, all we need for truly open scientific communication is a convention on a consistent set of tags and IDs for discussing papers. Christopher Lee, in bioinformatics at UCLA, has developed software that makes this work. Try it out!

    What’s cool about this system is that it’s federated. Instead of locking up your comments within its own website—the “walled garden” strategy followed by many other services—it explicitly shares these data in a way that people not on the Selected Papers Network can easily see. Any other service can see and use them too.

    To learn about the problems that make us want this system, read this:

    To learn how the system works, and to try it out, go here:

    n-Category Café Torsors and enriched categories

    In another instalment of my occasional series on ‘Things you didn’t realise were enriched categories (unless you’re an expert!)’ I want to talk about torsors and how they can be considered enriched categories where the enriching category is a group, considered as a discrete monoidal category.

    I’ll start off by telling you what a torsor is and then explain how it can be thought of as having something like a ‘group-valued distance’ and how this relates to enriched category theory.

    What is a torsor?

    “Torsor” is a word that used to strike a dread fear into my heart. It is a word that was seemingly only used by high-brow mathematicians who wished to intimidate more lowly souls. That was until I read Dan Freed’s paper Higher Algebraic Structures and Quantization which made me realise that torsor is just a fancy name for a simple concept. For a group G a torsor is something that looks like G but isn’t actually a group. That might sound like an overly cryptic description but hopefully it will make things clear in a minute.

    On my desk I have a coffee cup — we are in a café, after all.

    A coffee cup

    The rim R of my coffee cup is a circle: it looks like the group of unit complex numbers 𝕋, but I don’t know how to multiply two points on the coffee cup rim, R, so the coffee-cup rim R is not canonically a group. However, the group of unit complex numbers 𝕋 acts on the rim R: if you give me a unit complex number e^{iθ}, I can rotate my coffee cup through the angle θ.

    This action is transitive in that I can move any point on the rim to any other point by a rotation and the action is free in that precisely one rotation will move a specified point to another specified point. A set with a G action which is free and transitive is known as a principal homogeneous G-set, or simply a G-torsor. So the rim of my coffee cup is a 𝕋-torsor.

    If I pick any point p on the rim R then I get a bijection between the points of R and the points in the group 𝕋 by identifying the point p with the identity in 𝕋. So there are as many ways to identify the rim with the circle group 𝕋 as there are points in the rim but none of them is canonical.

    Just as an affine space can be thought of as a vector space without an origin, so a torsor can be thought of as a group without an identity: a torsor is to a group as an affine space is to a vector space.

    John has written a nice piece explaining how torsors, particularly in physics, are more prevalent than you might first think, because our gut reaction is to think in terms of groups. We see a line, so we want to put an origin on it and identify it with ℝ rather than accept that it doesn’t naturally have an origin, and should be thought of as an ℝ-torsor.

    Another view of torsors

    Torsors can be thought of in a different fashion, and thinking in this fashion will lead us to enriched categories. Between each pair of points in a torsor there is something like a ‘group-valued distance’. More precisely, associated to each ordered pair (t_1,t_2) of elements of a G-torsor T there is a unique element of the group G which sends t_1 to t_2; we can write this as g(t_2,t_1), or as g_T(t_2,t_1) if we want to emphasise the torsor T. Symbolically, this is the unique element of the group G that satisfies g(t_2,t_1)t_1=t_2. This is the ‘distance’ from t_1 to t_2.

    [I have chosen to write g(t_2,t_1) rather than the more usual category theoretic convention g(t_1,t_2), because the actions are on the left and the chosen conventions make the formulas look a bit nicer.]

    Of course, in the coffee cup example, the ‘distance’ from one point to another is the rotation required to move the first point to the second.

    Because we have a G-action on T we know that (gh)t_1=g(ht_1) and this implies g(t_3,t_2)g(t_2,t_1)=g(t_3,t_1) for all t_1,t_2,t_3; for the same reason we also have g(t_1,t_1)=e. Moreover, we can recover the G-action from this data because gt_1 is the unique element of T such that g(gt_1,t_1)=g.

    In summary, we can think of a G-torsor as a set T such that

    1. for each pair t_1,t_2 ∈ T there is a group element g(t_2,t_1) ∈ G;

    2. for each triple t_1,t_2,t_3 ∈ T there is an equality g(t_3,t_2)g(t_2,t_1)=g(t_3,t_1);

    3. for each element t_1 ∈ T there is the equality g(t_1,t_1)=e;

    4. for each t_1 ∈ T, g ∈ G there is a unique t_2 such that g(t_2,t_1)=g.
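The four conditions above can be checked mechanically on a small example. The following is a minimal sketch, assuming a discrete stand-in for the coffee cup: the group is ℤ/12 written additively (so the identity e is 0), and the torsor is a set of twelve rim points with arbitrary letter labels. The names `POINTS`, `act` and `dist` are hypothetical.

```python
N = 12  # the group G = Z/12 under addition mod 12, identity e = 0

# Twelve rim points with no preferred "zero"; the labels are arbitrary.
POINTS = "abcdefghijkl"

def act(g, t):
    """Rotate the rim point t by the group element g."""
    return POINTS[(POINTS.index(t) + g) % N]

def dist(t2, t1):
    """The group-valued distance g(t2, t1): the unique g with act(g, t1) == t2."""
    return (POINTS.index(t2) - POINTS.index(t1)) % N

def check_torsor_conditions():
    """Verify conditions 2, 3 and 4 (condition 1 is just that dist is defined)."""
    composition = all(
        (dist(t3, t2) + dist(t2, t1)) % N == dist(t3, t1)
        for t1 in POINTS for t2 in POINTS for t3 in POINTS
    )
    identity = all(dist(t, t) == 0 for t in POINTS)
    unique_target = all(
        sum(1 for t2 in POINTS if dist(t2, t1) == g) == 1
        for t1 in POINTS for g in range(N)
    )
    return composition and identity and unique_target

print(check_torsor_conditions())
```

Because this group is abelian and written additively, the product g(t_3,t_2)g(t_2,t_1) shows up as a sum mod 12; for a non-abelian group the order of the factors would matter.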

    You might have noticed that the first three of the above conditions form precisely the definition of a certain kind of enriched category, namely a category enriched over the monoidal category 𝒱_G which is the group G considered as a discrete monoidal category. This means that 𝒱_G has the elements of G as its objects, has only identity morphisms and has the group multiplication as the monoidal multiplication: g ⊗ h := gh.

    We have shown that a G-torsor is a 𝒱_G-category satisfying an extra condition. A reasonable question to ask at this point is “What’s the good in that?” One answer is that it gives us another layer of intuition; it allows us to make analogies with categories, with metric spaces, with posets, amongst other things, and it allows us to use the tools from enriched category theory.

    We should now ask what a 𝒱_G-functor ϕ: T → S between 𝒱_G-torsors is. This is a function ϕ: T → S such that

    g(t_1,t_2)=g(ϕ(t_1),ϕ(t_2)).

    This is quite a rigid condition, saying that the right notion of map is some kind of isometry. Unsurprisingly, if T and S correspond to G-torsors then this is precisely the condition that ϕ is an equivariant map:

    ϕ(gt)=gϕ(t) for all g ∈ G, t ∈ T.

    New torsors from old

    When we look at torsors over abelian groups there are ways of combining torsors. We will see below how these relate to standard enriched category theory constructions.

    Hom torsor: If A is an abelian group and both T and S are A-torsors then we can form the set Hom_A(T,S) of equivariant maps from T to S. I learnt many years ago from Freed’s paper that this set of maps Hom_A(T,S) is itself an A-torsor. We can define the action of A on an equivariant map ϕ by (aϕ)(t):=a(ϕ(t)). You can check that aϕ is an equivariant map, but you will see that it is necessary that A is abelian for this to work.

    Tensor torsor: If A is an abelian group and both T and S are A-torsors then we can also define the tensor product T ⊗_A S as (T×S)/∼ where ∼ is the relation defined by

    (at,s) ∼ (t,as) for all a ∈ A, t ∈ T, s ∈ S.

    We find that T ⊗_A S is also an A-torsor if we define the action by a(t,s) := (at,s). Again, if you check, you will see that you need A to be abelian for this to be well defined.

    Properties of 𝒱_G

    Things get more interesting with enriched categories when we enrich over categories which are closed, braided and bicomplete. Let’s consider each of these conditions for 𝒱_G.

    Closedness: A monoidal category 𝒱 is left-closed if for each object v the functor v ⊗ −: 𝒱 → 𝒱 has a right adjoint [v,−]_left: 𝒱 → 𝒱. Unwrapping this definition for 𝒱_G we find that for each g ∈ G we need a function of sets [g,−]_left: G → G such that for every h ∈ G we have

    gh=x if and only if h=[g,x]_left.

    Clearly, we can take [g,x]_left := g^{-1}x. So for any group G the monoidal category 𝒱_G is left-closed.

    Similarly, a monoidal category 𝒱 is right-closed if for each object v the functor − ⊗ v: 𝒱 → 𝒱 has a right adjoint [v,−]_right: 𝒱 → 𝒱. For 𝒱_G, we can take [g,x]_right := xg^{-1}. So for any group G the monoidal category 𝒱_G is both left- and right-closed.

    Braidedness: For a monoidal category to be braided we need isomorphisms v ⊗ w ≅ w ⊗ v for all objects v and w. For the category 𝒱_G that would mean we need gh=hg for all g,h ∈ G, in other words, we need that G is abelian. In that case we have that 𝒱_G is in fact symmetric.

    Bicompleteness: A category is bicomplete if it has all limits and colimits. We have no hope of any non-trivial group G having a bicomplete category 𝒱_G. This is because this category is discrete: to form a product g × h we would need projections to g and h but, because the only morphisms are identities, that would mean g × h=g and g × h=h. You can check that the self products are actually defined though: g × g=g. This looks like it might spoil our fun, but it will transpire that the only limits that we need will be the self-products.

    Summary: In summary then, if G is abelian then 𝒱_G is a closed symmetric monoidal category; if G is non-abelian then 𝒱_G is a closed monoidal category which is not braided.
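For a concrete non-abelian case, the closedness and braidedness claims can be tested by brute force. The sketch below assumes G is the symmetric group S_3, with permutations stored as tuples; the helper names `mul`, `inv`, `hom_left` and `hom_right` are hypothetical and simply transcribe g^{-1}x and xg^{-1}.

```python
from itertools import permutations

# S_3: a permutation p is a tuple with p[i] the image of i.
G = list(permutations(range(3)))

def mul(g, h):
    """The product gh: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """The inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

def hom_left(g, x):
    """[g, x]_left := g^{-1} x."""
    return mul(inv(g), x)

def hom_right(g, x):
    """[g, x]_right := x g^{-1}."""
    return mul(x, inv(g))

# Left-closed: gh = x if and only if h = [g, x]_left.
left_closed = all(
    (mul(g, h) == x) == (h == hom_left(g, x)) for g in G for h in G for x in G
)
# Right-closed: hg = x if and only if h = [g, x]_right.
right_closed = all(
    (mul(h, g) == x) == (h == hom_right(g, x)) for g in G for h in G for x in G
)
# Braided would need gh = hg for all g, h.
commutative = all(mul(g, h) == mul(h, g) for g in G for h in G)

print(left_closed, right_closed, commutative)
```

The two internal homs genuinely differ here, which is the shadow of the fact that S_3 is not abelian.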

    As the definition of the tensor product of 𝒱-categories requires that 𝒱 is braided, we will restrict ourselves to the case that G is an abelian group, and we will emphasize this fact by renaming it A.

    Functor category and tensor product

    We can now look at how the hom torsor and tensor torsor from above arise in enriched category theory. We will work with an abelian group A so that 𝒱_A is closed symmetric monoidal.

    Functor categories and hom torsors: Suppose 𝒱 is a closed symmetric monoidal category and that 𝒞 and 𝒟 are 𝒱-categories then (provided that 𝒱 has sufficiently many limits) there is a 𝒱-category [[𝒞,𝒟]] whose objects are the 𝒱-functors and whose hom object [[𝒞,𝒟]](ϕ,θ) is given by the equalizer of the following diagram.

    ∏_{c∈𝒞} 𝒟(ϕ(c),θ(c)) ⇉ ∏_{c,c′∈𝒞} [𝒞(c,c′),𝒟(ϕ(c),θ(c′))]

    We don’t need to worry here about what the two maps are as, in the case of interest when 𝒱=𝒱_A, they are both going to be equalities.

    If T and S are A-torsors, thought of as 𝒱_A-categories, then we try to construct the functor category [[T,S]]. This has 𝒱_A-functors, i.e. equivariant maps, as objects. The hom object g(ϕ,θ) between equivariant maps ϕ,θ: T → S is given by the equalizer of the above diagram, but as mentioned, the maps are equalities, so the equalizer is just the left-hand-side term, so

    g_{[[T,S]]}(ϕ,θ) = ∏_{t∈T} g_S(ϕ(t),θ(t)).

    As mentioned in the previous section, 𝒱_A does not have many limits, but fortunately it does have this product as all the terms in the product are the same: a calculation shows that

    g_S(ϕ(t_1),θ(t_1))=g_S(ϕ(t_2),θ(t_2)) for all t_1,t_2 ∈ T.

    Hence we find that the “distance” between two functors is the distance between the two images of any point:

    g_{[[T,S]]}(ϕ,θ)=g_S(ϕ(t),θ(t)) for all t ∈ T.

    This means that the functor 𝒱_A-category [[T,S]] exists and is actually a torsor. You can easily check that the A-action is exactly that of the hom torsor, so

    [[T,S]]=Hom_A(T,S).
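The claim that the distance between two equivariant maps does not depend on the point t can be confirmed by exhaustive search in a tiny case. This sketch assumes A = ℤ/5 written additively and takes both T and S to be ℤ/5 acting on itself, which is a convenient but special choice of torsors; `dist`, `is_equivariant` and `functor_distance` are hypothetical names.

```python
from itertools import product

N = 5  # A = Z/5, written additively; T = S = Z/5 acting on itself

def dist(t2, t1):
    """Group-valued distance g(t2, t1): the unique a with a + t1 = t2."""
    return (t2 - t1) % N

def is_equivariant(phi):
    """phi is a tuple with phi[t] the image of t; check phi(a + t) = a + phi(t)."""
    return all(
        phi[(a + t) % N] == (a + phi[t]) % N for a in range(N) for t in range(N)
    )

# Search all 5^5 functions T -> S for the equivariant ones.
equivariant_maps = [phi for phi in product(range(N), repeat=N) if is_equivariant(phi)]

def functor_distance(phi, theta):
    """g_S(phi(t), theta(t)), checked to be the same for every t."""
    values = {dist(phi[t], theta[t]) for t in range(N)}
    assert len(values) == 1, "distance should not depend on t"
    return values.pop()

print(len(equivariant_maps))
```

Exactly five of the 3125 functions are equivariant, and for any two of them the common distance a is the unique group element with aθ = ϕ, which is the free and transitive action making Hom_A(T,S) a torsor.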

    Tensor product categories and tensor torsors: If 𝒱 is a braided monoidal category then you can define the tensor product 𝒞 ⊗ 𝒟 of two 𝒱-categories 𝒞 and 𝒟. An object of 𝒞 ⊗ 𝒟 is a pair

    (c,d) ∈ ob𝒞 × ob𝒟

    and the hom objects are defined by

    (𝒞 ⊗ 𝒟)((c,d),(c′,d′)) := 𝒞(c,c′) ⊗ 𝒟(d,d′).

    The braiding of 𝒱 is needed to define the composition morphisms.

    If T and S are A-torsors, thought of as 𝒱_A-categories, then we can construct the tensor product 𝒱_A-category T ⊗ S. You might expect that this is actually a torsor equal to the tensor torsor T ⊗_A S but that’s not quite true!

    In the torsor T ⊗_A S we have made the identification (at,s)=(t,as); however, in the 𝒱_A-category T ⊗ S these two points are distinct despite having ‘no distance’ between them: you can check that g_{T⊗S}((at,s),(t,as))=e.

    This means that the 𝒱_A-category T ⊗ S is not actually a torsor but it is equivalent to the torsor T ⊗_A S: the quotient map on objects gives an equivalence of 𝒱_A-categories T ⊗ S → T ⊗_A S. You can think of T ⊗ S as a fattened-up version of T ⊗_A S; it is like using a stack instead of a quotient space. This is what you might expect from a categorical approach, in that you don’t violently quotient out but rather encode the equivalence relation via isomorphisms.
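The "fattened-up" picture is easy to see in a small example. The sketch below again assumes A = ℤ/4 written additively with T = S = ℤ/4 acting on itself; `tensor_dist` is a hypothetical name for the hom object of the tensor product category, which in additive notation is the sum of the two distances.

```python
N = 4  # A = Z/4, written additively; T = S = Z/4 acting on itself

def dist(t2, t1):
    """Group-valued distance in a single torsor."""
    return (t2 - t1) % N

def tensor_dist(p2, p1):
    """Hom object of the tensor product: g_T(t2, t1) combined with g_S(s2, s1)."""
    (t2, s2), (t1, s1) = p2, p1
    return (dist(t2, t1) + dist(s2, s1)) % N

pairs = [(t, s) for t in range(N) for s in range(N)]

# Distinct objects at distance e = 0 from each other.
zero_apart = [(p, q) for p in pairs for q in pairs if p != q and tensor_dist(q, p) == 0]

# Grouping objects that are at distance 0 recovers the quotient T (x)_A S.
classes = {frozenset(q for q in pairs if tensor_dist(q, p) == 0) for p in pairs}

print(len(pairs), len(classes))
```

There are sixteen objects but only four classes of mutually zero-distance objects, so the tensor product category has four times as many points as the tensor torsor while carrying the same distance information.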

    Final words

    I’ll just finish by mentioning the Yoneda map. For any 𝒱_A-category T, the presheaf 𝒱_A-category T^ := [[T^op,𝒱_A]] is a torsor.

    If T is actually a torsor then the Yoneda map T → T^ is an isomorphism, which is not something you would expect from the Yoneda map!

    If T is not a torsor then the Yoneda map T → T^ is a torsorization of T. For instance, in the tensor example above we have T ⊗_A S ≅ (T ⊗ S)^.

    Sean CarrollThere Is No Classical World

    Caltech’s Institute for Quantum Information and Matter is a fun place. It’s led by people like John Preskill, Jeff Kimble, and Alexei Kitaev — some of the world’s great scientists — so you know the physics is going to be top-notch. But it’s the youngsters, such as postdoc Spiros Michalakis, who are bringing the fun. Stuff like the IQIM blog (where you should read John’s recent post on the Maldacena/Susskind wormhole proposal) and a successful Kickstarter campaign for science-inspired fashion.

    The fun is now being ratcheted up even higher, as IQIM is teaming with Jorge Cham of PhD Comics fame to make a series of animated web videos about quantum mechanics. I ask you, who doesn’t love some good videos about quantum mechanics??

    Sensibly, they’ve kicked off by spotlighting an interesting experimental result, rather than diving right into the realms of esoteric theoretical speculation. Of course, this is quantum mechanics we’re talking about, so even the experiments get pretty wild in their implications. The work is by Amir Safavi-Naeini and Oskar Painter, who take a small mirror and put it into a quantum state where its center of mass is as cold as it is possible to be. Classically, of course, the mirror can be perfectly still; quantum-mechanically, there is a ground state wave function that still shows “fluctuations” (i.e. the fact that observations won’t always show zero motion).

    Now, the mirror is tiny — microscopic, it’s fair to say — but it’s not that tiny. It’s a piece of metal, not just an atom or two. (I didn’t catch what the actual size was.) So the implication here is that things don’t miraculously “become classical” when they are made of many atoms rather than just a few. We don’t notice the quantum-ness of the universe in our everyday lives, but that’s because the systems we encounter are noisy and constantly jostled by their environments, leading to rapid decoherence; not because there is a magical transition to classicalness once you get above a certain number of atoms, or a truly distinct “classical realm.”

    Of course, no right-minded person really believes that there is a hard and fast transition to a classical realm once objects get big; rather, there is a sense in which the classical approximation becomes more and more accurate, but it’s always just an approximation. The experimental results here are simply affirming the truth of quantum mechanics. Nevertheless, you can still meet people (the wrong-minded ones) who are willing to believe that electrons and photons are governed by quantum mechanics, but not that they are governed by quantum mechanics. Have them watch this video, and hope that the implications sink in.


    Doug NatelsonCome on, PRL editors.

    I rarely criticize papers.  I write this not to single out the authors (none of whom I know), nor to criticize the actual science (which seems very interesting) but to ask pointedly:  How did the editors of PRL, a journal that allegedly prizes readability by a general physics audience, allow this to go through in its current form?  This paper is titled "Poor Man’s Understanding of Kinks Originating from Strong Electronic Correlations".  A natural question would be, "Kinks in what?".  Unfortunately, the abstract doesn't say.  Worse, it refers to "the central peak".  Again, a peak in what?!   Something as a function of something, that's for sure. 

    Come on, editors - if you are going to let articles be knocked from PRL contention because they're "more suitable for a specialized journal", that obligates you to make sure that the papers you do print at least have titles and abstracts that are accessible.  I'm even a specialist in the field and I wasn't sure what the authors were talking about (some spectral density function?) based on the title and abstract.

    The authors actually do a good job explaining the issue in the very first sentence of the paper:  "Kinks in the energy vs. momentum dispersion relation indicate deviations from a quasiparticle renormalization of the noninteracting system."   That should have been the first sentence in the abstract.  In a noninteracting system, the relationship between energy and momentum of particles is smooth.  For example, for a free electron, \( E = p^{2}/2m \) where \(m\) is the mass.  In an ordinary metal (where Fermi liquid theory works), you can write a similar smooth relationship for the energy vs. momentum relationship of the quasiparticles. Kinks in that relationship, as the authors say, "provide valuable information of many-body effects".  

    BackreactionBasic research is vital

    Last month I had the flu. I was down with a fever of more than 40°C, four days in a row. Needless to say, it was a public holiday.

    While the body is struggling to recover from illness, priorities shift. Survive first. Drink. Eat. Stand upright without fainting. Feed the kids because they can’t do it themselves. Two days earlier, I was thinking of running a half-marathon, now happy to make it to the bathroom. Forgotten the parking ticket and the tax return.

    We see the same shift of priorities on other levels of our societies. If a system, may that be an organism or a group of people, experiences a potential threat to existence, energy is redirected to the essential needs, to survival first. An unexpected death in the family requires time for recovery and reorganization. A nation that is being attacked redirects resources to the military.

    The human body’s defense against viruses does not require conscious control. It executes a program that millions of years of evolution have optimized, a program we can support with medication and nutrition. But when it comes to priorities of our nations, we have no program to follow. We first have to decide what is necessary for survival, and what can be put on hold while we recover.

    The last years have not been good years economically, neither in the European Union, nor in North America. We all feel the pressure. We’re forced to focus our priorities. And every week I read a new article about cuts in some research budget.

    “Europe's leaders slash proposed research budget,” I read. “Big cuts to R&D budgets [in the UK],” I read. “More than 50 Nobel laureates are urging [the US] Congress to spare the federal science establishment from the looming budget cut,” I read.

    An organism befallen by illness manages a shortage of energy. A nation under economic pressure manages a shortage of money. But money is only the tool for the management. And it is a complicated tool, its value influenced by many factors including psychological, and it is not just under national management. In the end, its purpose is to direct labor. And here is the real energy of our nations: Humans, working. It is the number of working hours in different professions that budget cuts manage.

    In reaction to a perceived threat, nations shift priorities and redirect human labor. They might aim at sustainability. At independence from oil imports. They invest in public health. Or they cut back on these investments. When the pressure rises, what is left will be the essentials. Energy and food, housing and safety. Decisions have to be made. The people who assemble weapons are not available to water the fields.

    How vital is science?

    We all know that progress depends on scientific research. Somebody has to develop new technologies. Somebody has to test whether they are safe to use. Everybody understands what applied science does: In goes brain, out comes what you’ll smear into your face or wear on your nose tomorrow.

    But not everybody understands that this isn’t all of science. Besides the output-oriented research, there is the research that is not conducted with the aim of developing new technologies. It is curiosity-driven. It follows the loose ends of today's theories, it aims to understand the puzzle that is the data. Most scientists call it basic or fundamental research. The NSF calls it transformative research, the ERC frontier research. Sometimes I’ve heard the expression blue-skies research. Whatever the name, its defining property is that you don’t know the result before you’ve done the research.

    Since many people do not understand what fundamental research is or why it is necessary, if science funding is cut, basic research suffers most. Politicians lack the proper words to justify investment into something that doesn’t seem to have any tangible outcome. Something that, it seems, just pleases the curiosity of academics. “The question is academic,” has come to mean “The world doesn’t care about its answer.”

    A truly shocking recent example comes from Canada:
    “Scientific discovery is not valuable unless it has commercial value," John McDougall, president of the [Canadian National Research Council], said in announcing the shift in the NRC's research focus away from discovery science solely to research the government deems "commercially viable". [Source: Toronto Sun] [Update: He didn't literally say this as the Sun quoted it, see here for the correct quote.]
    Oh, Canada. (Also: Could somebody boot the guy, he’s in the wrong profession.)

    Do they not understand how vital basic research is for their nation? Or do they decide not to raise the point? I suspect that at least some of those involved in such a decision approve cutting back on basic research not because they don’t understand what it’s good for, but because they believe their people don’t understand what it’s good for. (And they would be wrong, if you scroll down and look at the poll results...)

    I suspect that scientists are an easy target; they usually don’t offer much resistance. They're not organized, not to say disorganized. Scientists will try to cope until it becomes impossible and then pack their bags and their families and move elsewhere. And once they’re gone, Canada, you’ll have to invest much more money than you save now to get them back.

    Do they really not know that basic research, in one sentence, is the applied research in 100 years?

    It isn’t possible, in basic research, to formulate a commercial application as a goal because nobody can make predictions or formulate research plans over 100 years. There are too many unknown unknowns, the system is too complex, there are too many independent knowledge seekers in the game. Nobody can tell reliably what is going to happen.

    They say “commercially viable”, but what they actually mean is “commercially viable within 5 years”.

    The scientific theories that modern technology and medicine are based on – from LCD displays and DVD players to spectroscopy and magnetic resonance imaging, from laser surgery to quantum computers – none of them would exist had scientists pursued “commercial viability”. Without curiosity-driven research, we deliberately ignore paths to new areas of knowledge. Applied research will inevitably run dry sooner or later. Scientific progress is not sustainable without basic research.

    As your mother told you, if you have a fever, watch your fluid intake. Even if you are tired and don’t feel like moving a finger, drink that glass of water. The woman with the flu who didn’t drink enough today is the woman in the hospital on an IV drip tomorrow. And the nation under economic pressure that didn’t invest in basic research today is the nation that will wish there was a global IV drip for its artery tomorrow.

    And here are some other people saying the same thing in fewer words [via Steve Hsu]:



    I know that on this blog a post like this preaches to the choir. So today I have homework for you. Tell your friends and your neighbors and the other parents at the daycare place. Tell them what basic research is and why it’s vital. And if you don’t feel like talking, send them a link or show them a video.

    June 13, 2013