Planet Musings

February 14, 2012

Terence Tao: 254B, Notes 6: Non-concentration in subgroups

In the last three notes, we discussed the Bourgain-Gamburd expansion machine and two of its three ingredients, namely quasirandomness and product theorems, leaving only the non-concentration ingredient to discuss. We can summarise the results of the last three notes, in the case of fields of prime order, as the following theorem.

Theorem 1 (Non-concentration implies expansion in {SL_d}) Let {p} be a prime, let {d \geq 1}, and let {S} be a symmetric set of elements in {G := SL_d(F_p)} of cardinality {|S|=k} not containing the identity. Write {\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}, and suppose that one has the non-concentration property

\displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa} \ \ \ \ \ (1)

for some {\kappa>0} and some even integer {n \leq \Lambda \log |G|}. Then {Cay(G,S)} is a one-sided {\epsilon}-expander for some {\epsilon>0} depending only on {k, d, \kappa,\Lambda}.

Proof: From (1) we see that {\mu^{(n)}} is not supported in any proper subgroup {H} of {G}, which implies that {S} generates {G}. The claim now follows from the Bourgain-Gamburd expansion machine (Theorem 2 of Notes 4), the product theorem (Theorem 1 of Notes 5), and quasirandomness (Exercise 8 of Notes 3). \Box

Remark 1 The same argument also works if we replace {F_p} by the field {F_{p^j}} of order {p^j} for some bounded {j}. However, there is a difficulty in the regime when {j} is unbounded, because the quasirandomness property becomes too weak for the Bourgain-Gamburd expansion machine to be directly applicable. On the other hand, the above type of theorem was generalised to the setting of cyclic groups {{\bf Z}/q{\bf Z}} with {q} square-free by Varju, to arbitrary {q} by Bourgain and Varju, and to more general algebraic groups than {SL_d} and square-free {q} by Salehi Golsefidy and Varju. It may be that some modification of the proof techniques in these papers can also handle the field case {F_{p^j}} with unbounded {j}. Finally, we remark that we can also obtain two-sided expansion by the same methods if one works with the lazy random walk, generated by {\frac{1}{2} + \frac{1}{2} \mu} instead of {\mu}.

It thus remains to construct tools that can establish the non-concentration property (1). The situation is particularly simple in {SL_2(F_p)}, as we have a good understanding of the subgroups of that group. Indeed, from Theorem 14 from Notes 5, we obtain the following corollary to Theorem 1:

Corollary 2 (Non-concentration implies expansion in {SL_2}) Let {p} be a prime, and let {S} be a symmetric set of elements in {G := SL_2(F_p)} of cardinality {|S|=k} not containing the identity. Write {\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}, and suppose that one has the non-concentration property

\displaystyle  \sup_{B}\mu^{(n)}(B) < |G|^{-\kappa} \ \ \ \ \ (2)

for some {\kappa>0} and some even integer {n \leq \Lambda \log |G|}, where {B} ranges over all Borel subgroups of {SL_2(\overline{F_p})}. Then, if {|G|} is sufficiently large depending on {k,\kappa,\Lambda}, {Cay(G,S)} is a one-sided {\epsilon}-expander for some {\epsilon>0} depending only on {k, \kappa,\Lambda}.

It turns out (2) can be verified in many cases by exploiting the solvable nature of the Borel subgroups {B}. We give two examples of this in these notes. The first result, due to Bourgain and Gamburd (with earlier partial results by Gamburd and by Shalom) generalises Selberg’s expander construction to the case when {S} generates a thin subgroup of {SL_2({\bf Z})}:

Theorem 3 (Expansion in thin subgroups) Let {S} be a symmetric subset of {SL_2({\bf Z})} not containing the identity, and suppose that the group {\langle S \rangle} generated by {S} is not virtually solvable. Then as {p} ranges over all sufficiently large primes, the Cayley graphs {Cay(SL_2(F_p), \pi_p(S))} form a one-sided expander family, where {\pi_p: SL_2({\bf Z}) \rightarrow SL_2(F_p)} is the usual projection.

Remark 2 One corollary of Theorem 3 is that {\pi_p(S)} generates {SL_2(F_p)} for all sufficiently large {p}, if {\langle S \rangle} is not virtually solvable. This is a special case of a much more general result, known as the strong approximation theorem.

Exercise 1 In the converse direction, if {\langle S\rangle} is virtually solvable, show that for sufficiently large {p}, {\pi_p(S)} fails to generate {SL_2(F_p)}. (Hint: use Theorem 14 from Notes 5 to prevent {SL_2(F_p)} from having bounded index solvable subgroups.)

Exercise 2 (Lubotzky’s 1-2-3 problem) Let {S := \{ \begin{pmatrix}1 & \pm 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix}1 & 0 \\ \pm 3 & 1 \end{pmatrix} \}}.

  • (i) Show that {S} generates a free subgroup of {SL_2({\bf Z})}. (Hint: use a ping-pong argument, as in Exercise 23 of Notes 2.)
  • (ii) Show that if {v, w} are two distinct elements of the sector {\{ (x,y) \in {\bf R}^2_+: x/2 < y < 2x \}}, then there is no element {g \in \langle S \rangle} for which {gv = w}. (Hint: this is another ping-pong argument.) Conclude that {\langle S \rangle} has infinite index in {SL_2({\bf Z})}. (Contrast this with the situation in which the {3} coefficients in {S} are replaced by {1} or {2}, in which case {\langle S \rangle} is either all of {SL_2({\bf Z})}, or a finite index subgroup, as demonstrated in Exercise 23 of Notes 2).
  • (iii) Show that {Cay(SL_2(F_p), \pi_p(S))} for sufficiently large primes {p} form a one-sided expander family.
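As a small computational companion to Remark 2 and Exercise 2, the following Python sketch checks by breadth-first search that {\pi_p(S)} generates all of {SL_2(F_p)} for a few small primes {p \neq 3}. It tests generation only, not expansion, and for this particular {S} generation is elementary, since {3} is invertible mod {p}. The helper names `mat_mul`, `generated_subgroup_size` and `sl2_order`, and the encoding of a matrix as a 4-tuple, are choices made for this illustration.

```python
from collections import deque

def mat_mul(a, b, p):
    """Multiply 2x2 matrices stored as (a11, a12, a21, a22), reducing mod p."""
    return (
        (a[0] * b[0] + a[1] * b[2]) % p,
        (a[0] * b[1] + a[1] * b[3]) % p,
        (a[2] * b[0] + a[3] * b[2]) % p,
        (a[2] * b[1] + a[3] * b[3]) % p,
    )

def generated_subgroup_size(p, c=3):
    """Size of the subgroup of SL_2(F_p) generated by the reduction mod p of
    the four matrices in Exercise 2 (off-diagonal entry +-c)."""
    gens = [
        (1, c % p, 0, 1), (1, -c % p, 0, 1),
        (1, 0, c % p, 1), (1, 0, -c % p, 1),
    ]
    identity = (1, 0, 0, 1)
    seen = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mat_mul(g, s, p)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return len(seen)

def sl2_order(p):
    """Order of SL_2(F_p) for a prime p."""
    return p * (p * p - 1)

for p in (5, 7, 11):
    # The two numbers printed for each p agree exactly when pi_p(S) generates.
    print(p, generated_subgroup_size(p), sl2_order(p))
```

At {p=3} all four generators reduce to the identity, so the "sufficiently large {p}" hypothesis cannot be dropped for this {S}.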

Remark 3 Theorem 3 has been generalised to arbitrary linear groups, and with {F_p} replaced by {{\bf Z}/q{\bf Z}} for square-free {q}; see this paper of Salehi Golsefidy and Varju. In this more general setting, the condition of virtual solvability must be replaced by the condition that the connected component of the Zariski closure of {\langle S \rangle} is perfect. An effective version of Theorem 3 (with completely explicit constants) was recently obtained by Kowalski.

The second example concerns Cayley graphs constructed using random elements of {SL_2(F_p)}.

Theorem 4 (Random generators expand) Let {p} be a prime, and let {x,y} be two elements of {SL_2(F_p)} chosen uniformly at random. Then with probability {1-o_{p \rightarrow \infty}(1)}, {Cay(SL_2(F_p), \{x,x^{-1},y,y^{-1}\})} is a one-sided {\epsilon}-expander for some absolute constant {\epsilon}.

Remark 4 As with Theorem 3, Theorem 4 has also been extended to a number of other groups, such as the Suzuki groups (in this paper of Breuillard, Green, and Tao), and more generally to finite simple groups of Lie type of bounded rank (in forthcoming work of Breuillard, Green, Guralnick, and Tao). There are a number of other constructions of expanding Cayley graphs in such groups (and in other interesting groups, such as the alternating groups) beyond those discussed in these notes; see this recent survey of Lubotzky for further discussion. It has been conjectured by Lubotzky and Weiss that any pair {x,y} of (say) {SL_2(F_p)} that generates the group, is a one-sided {\epsilon}-expander for an absolute constant {\epsilon}: in the case of {SL_2(F_p)}, this has been established for a density one set of primes by Breuillard and Gamburd.

— 1. Expansion in thin subgroups —

We now prove Theorem 3. The first observation is that the expansion property is monotone in the group {\langle S \rangle}:

Exercise 3 Let {S, S'} be symmetric subsets of {SL_2({\bf Z})} not containing the identity, such that {\langle S \rangle \subset \langle S' \rangle}. Suppose that {Cay(SL_2(F_p), \pi_p(S))} is a one-sided expander family for sufficiently large primes {p}. Show that {Cay(SL_2(F_p), \pi_p(S'))} is also a one-sided expander family.

As a consequence, Theorem 3 follows from the following two statements:

Theorem 5 (Tits alternative) Let {\Gamma \subset SL_2({\bf Z})} be a group. Then exactly one of the following statements holds:

  • (i) {\Gamma} is virtually solvable.
  • (ii) {\Gamma} contains a copy of the free group {F_2} of two generators as a subgroup.

Theorem 6 (Expansion in free groups) Let {x,y \in SL_2({\bf Z})} be generators of a free subgroup of {SL_2({\bf Z})}. Then as {p} ranges over all sufficiently large primes, the Cayley graphs {Cay(SL_2(F_p), \pi_p(\{x,y,x^{-1},y^{-1}\}))} form a one-sided expander family.

Theorem 5 is a special case of the famous Tits alternative, which among other things allows one to replace {SL_2({\bf Z})} by {GL_d(k)} for any {d \geq 1} and any field {k} of characteristic zero (and fields of positive characteristic are also allowed, if one adds the requirement that {\Gamma} be finitely generated). We will not prove the full Tits alternative here, but instead just give an ad hoc proof of the special case in Theorem 5 in the following exercise.

Exercise 4 Given any matrix {g \in SL_2({\bf Z})}, the singular values are {\|g\|_{op}} and {\|g\|_{op}^{-1}}, and we can apply the singular value decomposition to decompose

\displaystyle  g = u_1(g) \|g\|_{op} v_1(g)^* + u_2(g) \|g\|_{op}^{-1} v_2(g)^*

where {u_1(g),u_2(g)\in {\bf C}^2} and {v_1(g), v_2(g) \in {\bf C}^2} are orthonormal bases. (When {\|g\|_{op}>1}, these bases are uniquely determined up to phase rotation.) We let {\tilde u_1(g) \in {\bf CP}^1} be the projection of {u_1(g)} to the projective complex plane, and similarly define {\tilde v_2(g)}.

Let {\Gamma} be a subgroup of {SL_2({\bf Z})}. Call a pair {(u,v) \in {\bf CP}^1 \times {\bf CP}^1} a limit point of {\Gamma} if there exists a sequence {g_n \in \Gamma} with {\|g_n\|_{op} \rightarrow \infty} and {(\tilde u_1(g_n), \tilde v_2(g_n)) \rightarrow (u,v)}.

  • (i) Show that if {\Gamma} is infinite, then there is at least one limit point.
  • (ii) Show that if {(u,v)} is a limit point, then so is {(v,u)}.
  • (iii) Show that if there are two limit points {(u,v), (u',v')} with {\{u,v\} \cap \{u',v'\} = \emptyset}, then there exist {g,h \in \Gamma} that generate a free group. (Hint: Choose {(\tilde u_1(g), \tilde v_2(g))} close to {(u,v)} and {(\tilde u_1(h),\tilde v_2(h))} close to {(u',v')}, and consider the action of {g} and {h} on {{\bf CP}^1}, and specifically on small neighbourhoods of {u,v,u',v'}, and set up a ping-pong type situation.)
  • (iv) Show that if {g \in SL_2({\bf Z})} is hyperbolic (i.e. it has an eigenvalue greater than 1), with eigenvectors {u,v}, then the projectivisations {(\tilde u,\tilde v)} of {u,v} form a limit point. Similarly, if {g} is regular parabolic (i.e. it has an eigenvalue at 1, but is not the identity) with eigenvector {u}, show that {(\tilde u,\tilde u)} is a limit point.
  • (v) Show that if {\Gamma} has no free subgroup of two generators, then all hyperbolic and regular parabolic elements of {\Gamma} have a common eigenvector. Conclude that all such elements lie in a solvable subgroup of {\Gamma}.
  • (vi) Show that if an element {g \in SL_2({\bf Z})} is neither hyperbolic nor regular parabolic, and is not a multiple of the identity, then {g} is conjugate to a rotation by {\pi/2} (in particular, {g^2=-1}).
  • (vii) Establish Theorem 5. (Hint: show that two square roots of {-1} in {SL_2({\bf Z})} cannot multiply to another square root of {-1}.)

Now we prove Theorem 6. Let {\Gamma} be a free subgroup of {SL_2({\bf Z})} generated by two generators {x,y}. Let {\mu := \frac{1}{4} (\delta_x +\delta_{x^{-1}} + \delta_y + \delta_{y^{-1}})} be the probability measure generating a random walk on {SL_2({\bf Z})}, thus {(\pi_p)_* \mu} is the corresponding generator on {SL_2(F_p)}. By Corollary 2, it thus suffices to show that

\displaystyle  \sup_{B}((\pi_p)_* \mu)^{(n)}(B) < p^{-\kappa}

for all sufficiently large {p}, some absolute constant {\kappa>0}, and some even {n = O(\log p)} (depending on {p}, of course), where {B} ranges over Borel subgroups.

As {\pi_p} is a homomorphism, one has {((\pi_p)_* \mu)^{(n)}(B) = (\pi_p)_* (\mu^{(n)})(B) = \mu^{(n)}(\pi_p^{-1}(B))} and so it suffices to show that

\displaystyle  \sup_{B} \mu^{(n)}(\pi_p^{-1}(B)) < p^{-\kappa}.

To deal with the supremum here, we will use an argument of Bourgain and Gamburd, taking advantage of the fact that all Borel groups of {SL_2} obey a common group law, the point being that free groups such as {\Gamma} obey such laws only very rarely. More precisely, we use the fact that the Borel groups are solvable of derived length two; in particular we have

\displaystyle  [[a,b],[c,d]] = 1 \ \ \ \ \ (3)

for all {a,b,c,d \in B}. Now, {\mu^{(n)}} is supported on matrices in {SL_2({\bf Z})} whose coefficients have size {O(\exp(O(n)))} (where we allow the implied constants to depend on the choice of generators {x,y}), and so {(\pi_p)_*( \mu^{(n)} )} is supported on matrices in {SL_2(F_p)} whose coefficients also have size {O(\exp(O(n)))}. If {n} is less than a sufficiently small multiple of {\log p}, these coefficients are then less than {p^{1/10}} (say). As such, if {\tilde a,\tilde b,\tilde c,\tilde d \in SL_2({\bf Z})} lie in the support of {\mu^{(n)}} and their projections {a = \pi_p(\tilde a), \ldots, d = \pi_p(\tilde d)} obey the word law (3) in {SL_2(F_p)}, then the original matrices {\tilde a, \tilde b, \tilde c, \tilde d} obey the word law (3) in {SL_2({\bf Z})}. (This lifting of identities from the characteristic {p} setting of {SL_2(F_p)} to the characteristic {0} setting of {SL_2({\bf Z})} is a simple example of the “Lefschetz principle”.)
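To see the lifting step in a concrete case, the following Python sketch takes the generators from Exercise 2 (an assumption made purely for illustration; the argument applies to any free generators {x,y}) and enumerates all reduced words of length at most {n}. It reports the largest matrix entry and the number of distinct matrices before and after reduction mod {p}. Each generator has maximum absolute row sum {4}, so the entries of a word of length {n} are at most {4^n}; once {p > 2 \cdot 4^n}, distinct matrices stay distinct mod {p}. The names `GENS`, `reduced_words`, `evaluate` and `ball_statistics` are hypothetical helpers.

```python
# Capital letters denote inverses: X = x^{-1}, Y = y^{-1}.
GENS = {
    "x": (1, 3, 0, 1), "X": (1, -3, 0, 1),
    "y": (1, 0, 3, 1), "Y": (1, 0, -3, 1),
}
INVERSE = {"x": "X", "X": "x", "y": "Y", "Y": "y"}

def mat_mul(a, b):
    """Multiply 2x2 integer matrices stored as (a11, a12, a21, a22)."""
    return (
        a[0] * b[0] + a[1] * b[2],
        a[0] * b[1] + a[1] * b[3],
        a[2] * b[0] + a[3] * b[2],
        a[2] * b[1] + a[3] * b[3],
    )

def reduced_words(n):
    """All reduced words of length at most n in x, y and their inverses."""
    words, frontier = [""], [""]
    for _ in range(n):
        frontier = [
            w + s for w in frontier for s in GENS
            if not w or INVERSE[w[-1]] != s
        ]
        words.extend(frontier)
    return words

def evaluate(word):
    """Evaluate a word as a matrix in SL_2(Z)."""
    m = (1, 0, 0, 1)
    for s in word:
        m = mat_mul(m, GENS[s])
    return m

def ball_statistics(n, p):
    """Return (number of words, largest |entry|, distinct matrices over Z,
    distinct matrices after reduction mod p)."""
    mats = [evaluate(w) for w in reduced_words(n)]
    max_entry = max(abs(e) for m in mats for e in m)
    distinct_integer = len(set(mats))
    distinct_mod_p = len({tuple(e % p for e in m) for m in mats})
    return len(mats), max_entry, distinct_integer, distinct_mod_p

# p = 131 is a prime exceeding 2 * 4**3, so nothing collapses mod p for n = 3.
print(ball_statistics(3, 131))
```

The same comparison applied to the entries of {[[\tilde a,\tilde b],[\tilde c,\tilde d]]} is what allows the law (3) to be lifted from {SL_2(F_p)} to {SL_2({\bf Z})}, at the cost of a smaller multiple of {\log p}.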

To summarise, if we let {E_{n,p,B}} be the set of all elements of {\pi_p^{-1}(B)} that lie in the support of {\mu^{(n)}}, then (3) holds for all {a,b,c,d \in E_{n,p,B}}. This severely limits the size of {E_{n,p,B}} to only be of polynomial size, rather than exponential size:

Proposition 7 Let {E} be a subset of the support of {\mu^{(n)}} (thus, {E} consists of words in {x,y,x^{-1},y^{-1}} of length {n}) such that the law (3) holds for all {a,b,c,d \in E}. Then {|E| \ll n^2}.

The proof of this proposition is laid out in the exercise below.

Exercise 5 Let {\Gamma} be a free group generated by two generators {x,y}. Let {B} be the set of all words of length at most {n} in {x,y,x^{-1},y^{-1}}.

  • (i) Show that if {a,b \in \Gamma} commute, then {a, b} lie in the same cyclic group, thus {a = c^i, b = c^j} for some {c \in \Gamma} and {i,j \in {\bf Z}}.
  • (ii) Show that if {a \in \Gamma}, there are at most {O(n)} elements of {B} that commute with {a}.
  • (iii) Show that if {a,c \in \Gamma}, there are at most {O(n)} elements {b} of {B} with {[a,b] = c}.
  • (iv) Prove Proposition 7.
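Parts (i) and (ii) of this exercise can be explored experimentally. The Python sketch below models the free group by reduced words (capital letters standing for inverses, a convention chosen here) and lists the elements of the ball of radius {n} that commute with a given word. The function names are hypothetical, and the brute-force search is only practical for small {n}.

```python
INVERSE = {"x": "X", "X": "x", "y": "Y", "Y": "y"}  # capitals denote inverses

def reduce_word(word):
    """Cancel adjacent inverse pairs, returning the reduced form of a word."""
    stack = []
    for s in word:
        if stack and stack[-1] == INVERSE[s]:
            stack.pop()
        else:
            stack.append(s)
    return "".join(stack)

def ball(n):
    """All reduced words of length at most n."""
    words, frontier = [""], [""]
    for _ in range(n):
        frontier = [
            w + s for w in frontier for s in INVERSE
            if not w or INVERSE[w[-1]] != s
        ]
        words.extend(frontier)
    return words

def centraliser_in_ball(a, n):
    """Elements b of the ball of radius n with ab = ba in the free group."""
    return [b for b in ball(n) if reduce_word(a + b) == reduce_word(b + a)]

# The elements commuting with x are exactly its powers, so their number grows
# linearly in n while the ball itself grows exponentially.
for n in range(1, 6):
    print(n, len(ball(n)), len(centraliser_in_ball("x", n)))
```

This linear-versus-exponential gap is the mechanism behind the polynomial bound in Proposition 7.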

Now we can conclude the proof of Theorem 3:

Exercise 6 Let {\Gamma} be a free group generated by two generators {x,y}.

Exercise 7 Strengthen the one-sided expansion in Theorem 3 to two-sided expansion.

— 2. Random generators expand —

We now prove Theorem 4. Let {{\bf F}_2} be the free group on two formal generators {a,b}, and let {\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}}+ \delta_{b^{-1}})} be the generator of the random walk. For any word {w \in {\bf F}_2} and any {x,y} in a group {G}, let {w(x,y) \in G} be the element of {G} formed by substituting {x,y} for {a,b} respectively in the word {w}; thus {w} can be viewed as a map {w: G \times G \rightarrow G} for any group {G}. Observe that if {w} is drawn randomly using the distribution {\mu^{(n)}}, and {x,y \in SL_2(F_p)}, then {w(x,y)} is distributed according to the law {\tilde \mu^{(n)}}, where {\tilde \mu := \frac{1}{4}(\delta_x + \delta_y + \delta_{x^{-1}}+ \delta_{y^{-1}})}. Applying Corollary 2, it suffices to show that whenever {p} is a large prime and {x,y} are chosen uniformly and independently at random from {SL_2(F_p)}, that with probability {1-o_{p \rightarrow \infty}(1)}, one has

\displaystyle  \sup_B {\bf P}_w ( w(x,y) \in B ) \leq p^{-\kappa} \ \ \ \ \ (4)

for some absolute constant {\kappa}, where {B} ranges over all Borel subgroups of {SL_2(\overline{F_p})} and {w} is drawn from the law {\mu^{(n)}} for some even natural number {n = O(\log p)}.

Let {B_n} denote the words in {{\bf F}_2} of length at most {n}. We may use the law (3) to obtain a good bound on the supremum in (4) assuming a certain non-degeneracy property of the word evaluations {w(x,y)}:

Exercise 8 Let {n} be a natural number, and suppose that {x,y \in SL_2(F_p)} is such that {w(x,y) \neq 1} for {w \in B_{100n} \backslash \{1\}}. Show that

\displaystyle  \sup_B {\bf P}_w ( w(x,y) \in B ) \ll \exp(-cn)

for some absolute constant {c>0}, where {w} is drawn from the law {\mu^{(n)}}. (Hint: use (3) and the hypothesis to lift the problem up to {{\bf F}_2}, at which point one can use Proposition 7 and Exercise 6.)

In view of this exercise, it suffices to show that with probability {1-o_{p \rightarrow\infty}(1)}, one has {w(x,y) \neq 1} for all {w \in B_{100n} \backslash \{1\}} for some {n} comparable to a small multiple of {\log p}. As {B_{100n}} has {\exp(O(n))} elements, it thus suffices by the union bound to show that

\displaystyle  {\bf P}_{x,y}(w(x,y)=1) \leq p^{-\gamma} \ \ \ \ \ (5)

for some absolute constant {\gamma > 0}, and any {w \in {\bf F}_2 \backslash \{1\}} of length less than {c\log p} for some sufficiently small absolute constant {c>0}.

Let us now fix a non-identity word {w} of length {|w|} less than {c\log p}, and consider {w} as a function from {SL_2(k) \times SL_2(k)} to {SL_2(k)} for an arbitrary field {k}. We can identify {SL_2(k)} with the set {\{ (a,b,c,d)\in k^4: ad-bc=1\}}. A routine induction then shows that the expression {w((a,b,c,d),(a',b',c',d'))} is then a polynomial in the eight variables {a,b,c,d,a',b',c',d'} of degree {O(|w|)} and coefficients which are integers of size {O( \exp( O(|w|) ) )}. Let us then make the additional restriction to the case {a,a' \neq 0}, in which case we can write {d = \frac{bc+1}{a}} and {d' =\frac{b'c'+1}{a'}}. Then {w((a,b,c,d),(a',b',c',d'))} is now a rational function of {a,b,c,a',b',c'} whose numerator is a polynomial of degree {O(|w|)} and coefficients of size {O( \exp( O(|w|) ) )}, and the denominator is a monomial of {a,a'} of degree {O(|w|)}.

We then specialise this rational function to the field {k=F_p}. It is conceivable that when one does so, the rational function collapses to the constant polynomial {(1,0,0,1)}, thus {w((a,b,c,d),(a',b',c',d'))=1} for all {(a,b,c,d),(a',b',c',d') \in SL_2(F_p)} with {a,a' \neq 0}. (For instance, this would be the case if {w(x,y) = x^{|SL_2(F_p)|}}, by Lagrange’s theorem, if it were not for the fact that {|w|} is far too large here.) But suppose that this rational function does not collapse to the constant rational function. Applying the Schwartz-Zippel lemma (Exercise 23 from Notes 5), we then see that the set of pairs {(a,b,c,d),(a',b',c',d') \in SL_2(F_p)} with {a,a' \neq 0} and {w((a,b,c,d),(a',b',c',d'))=1} has cardinality at most {O( |w| p^5 )}; adding in the {a=0} and {a'=0} cases, one still obtains a bound of {O(|w|p^5)}, which is acceptable since {|SL_2(F_p)|^2 \sim p^6} and {|w| = O( \log p )}. Thus, the only remaining case to consider is when the rational function {w((a,b,c,d),(a',b',c',d'))} is identically {1} on {SL_2(F_p)} with {a,a' \neq 0}.

Now we perform another “Lefschetz principle” manoeuvre to change the underlying field. Recall that the denominator of the rational function {w((a,b,c,d),(a',b',c',d'))} is a monomial in {a,a'}, and the numerator has coefficients of size {O(\exp(O(|w|)))}. If {|w|} is less than {c\log p} for a sufficiently small {c}, we conclude in particular (for {p} large enough) that the coefficients all have magnitude less than {p}. As such, the only way that this function can be identically {1} on {SL_2(F_p)} is if it is identically {1} on {SL_2(k)} for all {k} with {a,a' \neq 0}, and hence for {a=0} or {a'=0} also by taking Zariski closures.

On the other hand, we know that for some choices of {k}, e.g. {k={\bf R}}, {SL_2(k)} contains a copy of the free group on two generators (see e.g. Exercise 23 of Notes 2). As such, it is not possible for any non-identity word {w} to be identically trivial on {SL_2(k) \times SL_2(k)}. Thus this case cannot actually occur, completing the proof of (5) and hence of Theorem 4.
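As a sanity check on the shape of the bound (5), one can count exactly, for very small primes, how often a short word is trivial. The Python sketch below does this by exhaustive enumeration for the commutator word {w = aba^{-1}b^{-1}}, for which {w(x,y)=1} precisely when {x} and {y} commute. The choice of word, the function names and the brute-force approach are all assumptions made for this illustration; nothing here replaces the Schwartz-Zippel argument.

```python
from itertools import product

def sl2_elements(p):
    """All matrices (a, b, c, d) over F_p with ad - bc = 1."""
    return [
        (a, b, c, d)
        for a, b, c, d in product(range(p), repeat=4)
        if (a * d - b * c) % p == 1
    ]

def mat_mul(m, n, p):
    """Multiply 2x2 matrices stored as (a11, a12, a21, a22), reducing mod p."""
    return (
        (m[0] * n[0] + m[1] * n[2]) % p,
        (m[0] * n[1] + m[1] * n[3]) % p,
        (m[2] * n[0] + m[3] * n[2]) % p,
        (m[2] * n[1] + m[3] * n[3]) % p,
    )

def count_trivial_commutators(p):
    """Return (number of pairs (x, y) with xy = yx, total number of pairs)."""
    group = sl2_elements(p)
    commuting = sum(
        1 for x in group for y in group
        if mat_mul(x, y, p) == mat_mul(y, x, p)
    )
    return commuting, len(group) ** 2

for p in (3, 5):
    commuting, total = count_trivial_commutators(p)
    print(p, commuting, total, commuting / total)
```

For {p=5} this gives 1080 commuting pairs out of 14400, and the proportion shrinks as {p} grows, in line with the {O(|w| p^5)} count against {|SL_2(F_p)|^2 \sim p^6}.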



February 13, 2012

Dave Bacon: Randomized Governance

What if instead of electing our representatives in government, we simply chose them at random?

A new Rasmussen poll asked 1,000 likely voters exactly this question. Turns out, 43% thought that a random choice of people from the phonebook would do a better job than the current legislators, a plurality. Of course, these people were themselves chosen randomly from a phonebook, so I’m not sure they are entirely unbiased. :)

But why stop at the legislators? Why not just write random legislation using context-free grammars? We already have software that can automatically write scientific papers, so it doesn’t seem like a stretch. I guess that a lot of this random legislation would be better than SOPA.

Resonaances: How to find a stop

Lately there's been a surge of interest in hypothetical scalar partners of the top quark, the stops in short. So it may be a good moment to sell a few technical details to a larger audience. For theorists, a stop is easy to distinguish experimentally: it looks like a top but with a twiddle on top. However experimentalists are not as smart, and they have to invest much more time and effort in order to identify stops at the LHC.

What it looks like depends first of all on how it decays. Even the minimal SUSY model offers countless possibilities. Leaving out the case of stable stops, in the MSSM stops ultimately decay to a number of known particles from the Standard Model plus the lightest supersymmetric partner (LSP), which is assumed to be a very weakly interacting particle showing up as missing momentum in a detector. Some possible decay chains are:
Stop → top + LSP, Stop → W + sbottom → W + b + LSP, Stop → bottom + chargino → bottom + W + LSP, etc.
The bottom line is that the MSSM stop should manifest itself at the LHC as an excess of events with:
  • top and/or bottom quarks,
  • significant missing energy due to the LSP.
Now, how to produce it. Being top partners, stops carry a color charge, hence they can be produced in gluon collisions which are easy to come by at the LHC. However, on the plot you see that production of stop pairs is far less frequent than that of gluinos and 1st generation squarks of similar mass. In physics jargon, s-channel production of scalar particles is velocity suppressed as a consequence of angular momentum conservation. This is the main reason why, as you'll see below, the LHC limits on stop pair production are so much weaker than those on gluinos. However, there is a trick to boost the stop production rate by producing them indirectly in gluino decays: gluino → stop + top, as long as the gluino is not much heavier than the stops. As a bonus, this production mode generates more junk in the detector that could be helpful in discriminating signal from background. For example, one can imagine the sequence:
pp → 2 gluinos → 2 stops + 2 tops → 4 tops + 2 LSPs
which leaves us with 4 top quarks in the final state. The 4-top production rate in the Standard Model is very small, therefore observation of such a final state at this point would be a clear sign of new physics. Another place where this sort of cascade could show up is in the searches for same-sign top quarks.

What is the experimental situation so far? As far as I know, the LHC collaborations have not yet published any limits on direct stop production. On the other hand, gluino mediated stop production was targeted in this note based on 1 fb⁻¹ of ATLAS data. The plot shows that gluinos decaying in the sequence:
gluino → top + stop → 2 tops + bottom + chargino → 2 tops + bottom + W + LSP
cannot be lighter than 500 GeV. During the next 30 days leading to the Moriond conference many more searches based on larger data samples will be released, starting with the ATLAS talk tomorrow.

For today, we can get some idea of the current LHC sensitivity from this paper, which compiles a large number of SUSY searches and recasts the results in terms of limits on stops. The LHC reach for direct stop production (right plots) is poor, corresponding to stop mass of only 200-300 GeV. For gluino mediated stop production (plots below) the limits are much better and extend to approximately 700 GeV gluino masses (though the precise limits may depend on details of the SUSY spectrum; it is probably possible to design spectra for which these limits are somewhat weaker). Amusingly, the dedicated ATLAS search does not seem to be the most sensitive probe of gluino mediated stop production. Instead, more stringent limits come from vanilla SUSY searches (decaying top quarks produce jets, b-jets, and/or leptons that can be picked up by these searches). We'll see very soon whether the coming experimental analyses will significantly improve these limits.

Chad Orzel: How to Teach Relativity to Your Dog Photoshop Contest Results

So, the big How to Teach Physics to Your Dog Photoshop contest concluded on Friday. We got five really good entries, and the judges (me and Kate) had a hard time reaching a decision. After long deliberation, though, we've come up with a solution.

But first, the entries:


Secret Blogging Seminar: A way to discover the Gamma function

I was messing around this morning and I discovered the following, which seemed cute enough to share. In this post, I’ll make what strikes me as a very reasonable attempt to define u! for u not an integer. Will I get the \Gamma function? Wait and see!

We have e^z = \sum z^n/n!. So, by basic complex analysis, \frac{1}{2 \pi i} \oint e^z z^{-n} \frac{dz}{z} = \frac{1}{n!}, where the integral is taken along a loop around the origin. This formula is also morally right for n a negative integer: n! wants to be \infty when n<0 (because 0 \times (-1)! = 0! =1, so (-1)! should be infinity, and likewise for the other negative integers). So 1/n! wants to be zero for n<0 and, sure enough, this integral has no poles and vanishes in that case.

We can’t use this formula for n not an integer, because z^n has a branch cut and the path of integration would have to cross it. But we can fix that by taking the branch cut of z^n to be along the negative real axis, and drawing our loop out to stretch very far in the negative real direction. Then e^z will be very small at the point where the integration path crosses the real axis, so the branch cut will contribute very little. In the limit, we can define

\displaystyle{ \frac{1}{u!} := \frac{1}{2 \pi i} \int_{\gamma} e^{z} z^{-u} \frac{dz}{z}}

where \gamma is a path that comes in from the negative real direction below the real axis, circles around the origin, and returns to infinity in the negative real direction above the axis. This integral will converge for all complex u.

So, how does this do as a definition of 1/u!? Well, it obeys the right recursion. A quick integration by parts (the boundary terms vanish because e^z z^{-u} decays at both ends of \gamma) gives \int_{\gamma} e^{z} z^{-u} dz = - \int_{\gamma} e^z (-u) z^{-u-1} dz = u  \int_{\gamma} e^z z^{-u-1} dz, so 1/(u-1)! = u/u!.
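If you want to see this numerically before trusting it, here is a rough Python sketch that evaluates the contour integral by Simpson's rule and checks it against 1/3! and against the recursion. The particular contour (two horizontal rays at height \pm 1 joined by the right half of the unit circle, truncated at real part -R), the step counts, and the names `simpson` and `inv_factorial` are all my own choices for illustration; any path of the same shape should give the same answer.

```python
import cmath
import math

def simpson(f, a, b, n):
    """Composite Simpson rule for a complex-valued f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def inv_factorial(u, R=40.0):
    """Approximate (1/(2 pi i)) * integral over gamma of e^z z^(-u) dz/z."""
    def integrand(z):
        # cmath.log is the principal branch, cut along the negative real axis,
        # which the contour below never crosses.
        return cmath.exp(z) * cmath.exp(-(u + 1) * cmath.log(z))

    # Incoming ray below the axis: z = x - i, x from -R to 0.
    lower = simpson(lambda x: integrand(complex(x, -1.0)), -R, 0.0, 4000)
    # Right half of the unit circle: z = e^{it}, t from -pi/2 to pi/2.
    arc = simpson(
        lambda t: integrand(cmath.exp(1j * t)) * 1j * cmath.exp(1j * t),
        -math.pi / 2, math.pi / 2, 2000,
    )
    # Outgoing ray above the axis: z = x + i, x from 0 to -R.
    upper = simpson(lambda x: integrand(complex(x, 1.0)), 0.0, -R, 4000)
    return (lower + arc + upper) / (2j * math.pi)

# Integer case: should be close to 1/3! = 1/6.
print(inv_factorial(3.0), 1 / 6)
# Recursion 1/(u-1)! = u/u! at u = 1/2: the two printed values should agree.
print(inv_factorial(-0.5), 0.5 * inv_factorial(0.5))
```

The imaginary parts that come out are just quadrature noise; for real u the exact value is real.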

Let’s take our path \gamma and shrink it towards the negative real axis. As we approach -r from above (for r a positive real), (-r)^{-u} approaches r^{-u} e^{- i \pi u}. As we approach -r from below, (-r)^{-u} approaches r^{-u} e^{i \pi u}. The path heads outward above the axis and inward below it, so the difference that enters the integral is -2 i r^{-u} \sin(\pi u). So one might think that our integral was equal to -\frac{1}{\pi} \int_{0}^{\infty} r^{-u} \sin( \pi u) e^{-r} \frac{dr}{r}.

If you are more careful, you’ll see that this argument only works for \mathrm{Re}(u) < 0; otherwise, the pole at the origin is too wild to permit the limiting process. So we get that our previous definition is equivalent to

\displaystyle{\frac{1}{u!} = -\frac{1}{\pi} \sin(\pi u)  \int_{0}^{\infty} r^{-u} e^{-r} \frac{dr}{r}} for \mathrm{Re}(u) < 0.

This is where a person who has seen the \Gamma function defined before will say “well, you’re on the right track, but that sure looks funky.” Writing \Pi for the standard complex extension of the factorial function (see footnote 1), we have \int_{0}^{\infty} r^{-u} e^{-r} \frac{dr}{r} = \Pi(-u-1). So I’ve got the right integral, but it’s being evaluated at the wrong place, and there is this strange extra factor of -\frac{1}{\pi} \sin(\pi u) floating around.

But it all works out! We have the functional equation of the \Gamma function:

\displaystyle{-\frac{1}{\pi} \sin(\pi u) \Pi(-1-u) = \frac{1}{\Pi(u)}.}

So the integral I have above really is the standard extension, but gotten at from the other side.

One wants to turn this into a proof of the functional equation, but as yet I don’t see how…

1. For historical reasons, \Gamma(1+u) = u!. So I’m writing \Pi for the function \Gamma(1+u).


Cosmic Variance: Metaphysics Matters

Chattering classes here in the U.S. have recently been absorbed in discussions that dance around, but never quite address, a question that cuts to the heart of how we think about the basic architecture of reality: are human beings purely material, or something more?

The first skirmish broke out when a major breast-cancer charity, Susan G. Komen for the Cure (the folks responsible for the ubiquitous pink ribbons), decided to cut their grants to Planned Parenthood, a decision they quickly reversed after facing an enormous public backlash. Planned Parenthood provides a wide variety of women’s health services, including birth control and screening for breast cancer, but is widely associated with abortion services. The Komen leaders offered numerous (mutually contradictory) reasons for their original action, but there is no doubt that their true motive was to end support to a major abortion provider, even if their grants weren’t being used to fund abortions.

Abortion, of course, is a perennial political hot potato, but the other recent kerfuffle focuses on a seemingly less contentious issue: birth control. Catholics, who officially are opposed to birth control of any sort, objected to rules promulgated by the Obama administration, under which birth control would have to be covered by employer-sponsored insurance plans. The original objection seemed to be that Catholic hospitals and other Church-sponsored institutions would essentially be paying for something they thought was immoral, in response to which a work-around compromise was quickly adopted. This didn’t satisfy everyone (anyone?), however, and now the ground has shifted to an argument that no individual Catholic employer should be forced to pay for birth-control insurance, whether or not the organization is sponsored by the Church. This position has been staked out by the US Conference of Catholic Bishops, and underlies a new bill proposed by Florida Senator Marco Rubio.

Topics like this are never simple, but they can be especially challenging for a secular democracy. On the one hand, our society is based on religious pluralism. We have freedom of conscience, and try to formulate our laws in such a way that everyone’s rights are protected. But on the other hand, people have incompatible beliefs about fundamental issues. Such beliefs are often of central importance, and the duct tape of political liberalism isn’t always sufficient to hold things together.

When it comes to abortion and birth control, there’s no question that down-and-dirty political and social aspects are front and center. Different political parties want to score points with their constituencies by standing firm in the current culture wars. And there’s also no question that restricting access to contraception and abortion is driven in part (we can argue about how big that part is) by a desire to control women’s sexuality.

But there is also a serious question about human life and the nature of reality. What actually happens when that sperm and ovum get together to make a zygote? Is it just one step of many in an enormously complex chemical reaction that ultimately gives rise to a new person, who is at heart just a complex chemical reaction him-or-herself? Or is it the moment when an immaterial soul, distinct from the material body, first comes into being? Questions like this matter — but as a society we hardly ever discuss them, at least not in any serious and open way. As a result, different sides talk past each other, trying to squeeze metaphysical stances into political boxes.

If it were really true that “a human life” was defined by the association of an immaterial soul with a physical body, and that association began at the moment of conception, then making abortion illegal would be perfectly sensible. It would be murder, pure and simple. (Very few people are actually consistent here, believing that mothers who have abortions should be treated like someone who has committed murder; but there are some.) But this view of reality is not true.

Naturalism, which describes human beings in the same physical terms as other objects in the universe, doesn’t actually provide a cut-and-dried answer to the abortion question, because it doesn’t draw a bright line between “a separate living person” and “a collection of cells.” But it provides an utterly different context for addressing the question. Naturalists are generally against murder, but it’s because they recognize certain collections of atoms as “people,” and endow those people with rights and privileges as part of the structure of society. It all comes from distinctions that we human beings ultimately invent, not ones that are handed down from a higher authority. Consequently, the appropriate rules are less clear. A naturalist wants to know whether the purported person can think, feel, react, and so on. They also will balance the interests of the fetus, whatever they may be, against the interests of the mother, who is unquestionably a living and functioning person. It’s perfectly natural that those interests will seem more important than those of a fetus that isn’t even viable outside the womb.

Most everyone, religious believers and naturalists alike, agrees that killing innocent one-year-old children is morally wrong. Consequently, we can happily live together in a society where that kind of action is illegal. But our beliefs about aborting one-month-old embryos are understandably very different. The disagreements about these issues aren’t simply political; they run much deeper than that.

It matters how people think about the world. Political liberalism is a good system, but it only works insofar as the citizens can agree on a core set of values and push cultural/religious differences to the periphery. Naturalism doesn’t answer all the value-oriented questions we might have; it simply provides a sensible framework in which they can be profitably discussed. But between naturalists and non-naturalists, profitable discussion is much more difficult. Which is why we naturalists have to keep pressing, making the best case we can, trying to convince as many people as we can reach that there is only one realm of existence, governed by unbreakable laws, and that we are part of it.


Chad Orzel: Upcoming Appearances: Boskone

I've been falling down a little in the area of shameless self-promotion, but I will be at Boskone this coming weekend, where I'll be doing three program items:

Reading: Chad Orzel (Reading), Fri 19:30 - 20:00

This will be a section from the forthcoming book, probably involving Emmy and particle physics. Or possibly William Butler Yeats.

How to Wreck Your Career with Social Media (Special Interest Group) (M), Sat 16:00 - 17:00
What are the new opportunities for public humiliation opened by the Internet? Join this entertaining discussion about authors getting into nasty public spats with reviewers and fans, going off on long unhinged political tirades, sharing a little too much of their unfiltered id, and so on.

I was originally thinking of this as a panel, but they suggested it as a group discussion instead. Lacking any experience with this format, I'm going to hope that somebody's doing one before 4pm on Saturday that sounds interesting, so I can see what exactly I'm supposed to do. Also, suggestions of really entertaining wreckage on social media (blogs, LiveJournal, Twitter, etc.) are welcome in comments.

What Every Dog Should Know About Quantum Physics (Solo Talk), Sun 14:00 - 15:00
Author of How to Teach Physics to Your Dog and How to Teach Relativity to Your Dog, Chad Orzel discusses the basics of quantum physics for two- and four-legged audiences.

This is my public-lecture talk on quantum physics. It's also the last program slot on the schedule, which makes me wonder how many people will still be around to hear it... If you're going to be there, please do stop by.


Matt Strassler: Why a Lightweight Higgs is a Sensitive Creature - Part 2

[Note added: It is official. As expected, at this year's Chamonix workshop, where the Large Hadron Collider's [LHC's] future is planned out each year, it was decided that the LHC’s energy will be increased by 14% next year (from 3.5 TeV per proton and 7 TeV per collision in 2010-2011 to 4 TeV per proton and 8 TeV per collision). Also, the time between collisions will remain at 50 nanoseconds. I’ll have some things to say about the pros and cons of this decision, in particular the challenges for the experiments, over the next few days.]

On Monday last week, I gave you half the explanation as to why a lightweight Higgs particle is a sensitive creature, one that is easily altered by new phenomena — by particles and/or forces that we might not yet know about.  It all had to do with an analogy between a violin string (or a guitar string or a xylophone key) and the properties of the Higgs particle.   Today, on the same webpage as the first half, I have provided the second half of the story. (If you have already read the first half, just look for the boldface words “The Diverse Modes of a Higgs’ Demise”, which separate last week’s prose from the new stuff.)  I’ve also added, for particle physicists and for those laypersons who want to go a little deeper, a short quantitative discussion of my main points.

Also: I will have the honor to be interviewed on Wednesday at 5 p.m. Eastern time, at

http://www.blogtalkradio.com/virtuallyspeaking/2012/02/15/matt-strassler-tom-levenson-virtually-speaking-science

which you can listen to either live or later.  My interviewer, Tom Levenson, is an eminent science journalist who has written fascinating and surprising books on Einstein and on Newton, among others, won awards for his work on television (e.g. NOVA), has a great blog (and also posts here), and is a professor of science writing at MIT.  In short, he’s a bright and interesting dude whom you should consider following on Twitter, or in whatever way floats your boat in the ocean of social media.  For this reason I suspect that the conversation is going to be a lot deeper and more interesting than the average interview, with the interviewer making at least as many interesting comments about the topic as the interviewee.



Quantum Diaries: Total immersion of artists at CENBG: data taking

by Nathalie Aubin and Sylvie Massiot, artists of the Nukku Matti company

Zeolites, the peak of the spectrum, dissecting gonads, nematodes, anaerobic, enzymatic, the shaking incubator, interaction, "I've got beam time," the smidgen, double beta decay to excited states, quark soup, the magicity of the nucleus, TeV, keV… Imaginary words? No, the very specific vocabulary of scientists: their "jargon," as they say. Because these words amuse us, because the phenomena they describe fascinate us, and quite simply because they inspire us, we have just plunged into the world of the infinitely small to create a show about the structure of matter and elementary particles. We are just finishing the second phase: data taking…

The actresses perform a song in front of a physics instrument at CENBG. Photo: Service audiovisuel de Bordeaux 1

To do this, we immersed ourselves for five days in the world of fundamental research and particle physics. More precisely, our experience took place at the Centre d’Etudes Nucléaires de Bordeaux Gradignan (CENBG). We spent an exceptional week there and discovered an extraordinary universe… Christine Marquet, a researcher at CENBG, opened the doors to a world that had until then been invisible to us. Here researchers try to unravel mysteries through reflection, collaboration, the exchange of knowledge, and the invention and construction of instruments that look unusual to the newcomer. All of the staff made themselves accessible to us, giving their time and energy unstintingly to share their knowledge and their questions.

Researchers, engineers, and technicians talked to us about exotic nuclei, mechanics, electronics, hot chemistry, astrophysics, biology, computing, and particles, but also about the place of research in our society, the importance of international collaboration, and the question of profitability, which is incompatible with the very principle of fundamental research. We collected a great deal of data that we will now have to analyze and sort, but as Stéphane, a CENBG physicist, puts it: "the result is not always where you expect it."

In any case, this week of immersion confirms our desire to pass on to as many people as possible the enthusiasm in which we were immersed. Our dearest wish is to succeed in conveying in this show the same passion, the same curiosity, and the same desire to share that the researchers showed us.

Video of the "data taking" (produced by the Service audiovisuel de l’Université Bordeaux 1)

Provisionally titled "Parce que 12", this new show will tour this autumn. The project is supported by IDDAC, CENBG, CNRS/IN2P3, Université Bordeaux 1, the Communauté de Communes du Vallon de l’Artolie, and the town of Villenave de Rions. To follow the project's progress, visit the "Création 2012" section of our website!

Tommaso Dorigo: Universal Extra Dimensions: New DZERO Results

Of the dozens of new physics models which are currently on the market of Standard Model extensions and plug-ins, the ones hypothesizing the existence of additional dimensions of space-time beyond the 3+1 we know about are definitely among the most fascinating.


Chad Orzel: Links for 2012-02-13

  • The Virtuosi: Time Keeps On Slippin'

    Alright, so how do we go about quantifying how "good" a watch is? Well, there seem to be two main things we can test. The first of these is accuracy. That is, how close does this watch come to the actual time (according to some time system)? If the official time is 3:00 pm and my watch claims it is 5:00 am, then it is not very accurate. The second measure of "good-ness" is precision or, in watch parlance, stability. This is essentially a measure of the consistency of the watch. If I have a watch that is consistently off by 5 minutes from the official time, then it is not accurate but it is still stable. In essence, a very consistent watch would be just as good as an accurate one, because we can always just subtract off the known offset.

  • Information Processing: Class and Race

    I don't have anything to add about the content of the post, but these graphs look like they came from a website spoofing confusing academic presentations, not an actual social-science paper. I'm not sure which I like more, the fade-to-invisibility technique used to distinguish some of the data series, or the way the legend implies they've done nine-parameter fits to (effectively) single data points.

  • Why You Need Domain Knowledge

    If you have a gun that runs on compressed air, it would be nice to know how much air you have left wouldn't it? I'm not sure the design was fully thought through. I don't know the story of the gun, but I do know that you shouldn't need to point the barrel toward your face to read a gauge.
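
The accuracy/stability distinction in the watch item above can be made concrete with a small sketch. The readings are made-up numbers and `accuracy_and_stability` is a hypothetical helper, not anything taken from the linked post: it treats the mean offset from the reference time as a measure of accuracy and the spread of the offsets as a measure of stability.

```python
from statistics import mean, pstdev

def accuracy_and_stability(offsets_s):
    """Summarise watch offsets (watch time minus reference time, in seconds).

    Returns (mean_offset, spread). A mean offset near zero means the watch
    is accurate; a small spread means it is stable.
    """
    return mean(offsets_s), pstdev(offsets_s)

# Hypothetical readings: a watch that is consistently about 5 minutes fast.
offsets = [300.2, 299.8, 300.1, 299.9, 300.0]
mean_offset, spread = accuracy_and_stability(offsets)

# Subtracting the known mean offset leaves only the small residual scatter.
corrected = [x - mean_offset for x in offsets]
print(f"mean offset = {mean_offset:.1f} s, spread = {spread:.2f} s")
```

A watch like this one is inaccurate but stable, which is why subtracting the known offset makes it as useful as an accurate one.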


Backreaction: Does science need a universal symbol?

Paul Root Wolpe is on the search for a universal symbol for science. He must be serious, because he has set up a Facebook page. Though one can't say the success of that page is overwhelming.

I'm not sure we really need a universal symbol for science, but I don't think it would harm either. Either way, once the question was in my head, it got me thinking what would make a good symbol for science. Here's what I came up with:


It has the merit that you can put some electron orbits around it, or a galaxy in the middle. Here is somebody else who has made a suggestion. It looks a little illuminati-ish to me though ;o) Something else that crossed my mind is to use an existing symbol, for example ∀ ("for all").

What do you think, would a symbol for science come in handy? Would you put it on your bumper?

John Baez: Azimuth on Google Plus (Part 6)

Lately the distribution of hits per hour on this blog has become very fat-tailed. In other words: the readership shoots up immensely now and then. I just noticed today’s statistics:

That spike on the right is what I’m talking about: 338 hits per hour, while before it was hovering in the low 80s, as usual for the weekend. Why? Someone on Hacker News posted an item saying:

John Baez will give his Google Talk tomorrow in the form of a robot.

That’s true! If you’re near Silicon Valley on Monday the 13th and you want to see me in the form of a robot, come to the Google campus and listen to my talk Energy, the Environment and What We Can Do.

It starts at 4 pm in the Paramaribo Room (Building 42, Floor 2). You’ll need to check in 15 minutes before that at the main visitor’s lounge in Building 43, and someone will escort you to the talk.

But if you can’t attend, don’t worry! A video will appear on YouTube, and I’ll point you to it when it does.

I tested out the robot a few days ago from a hotel room in Australia—it’s a strange sensation! Suzanne Brocato showed me the ropes. To talk to me easily, she lowered my ‘head’ until I was just 4 feet tall. “You’re so short!” she laughed. I rolled around the offices of Anybot and met the receptionist, who was also in the form of a robot. Then we went to the office of the CEO, Trevor Blackwell, and planned out my talk a little. I need to practice more today.

But why did someone at Hacker News post that comment just then? I suspect it’s because I reminded people about my talk on Google+ last night.

The fat-tailed distribution of blog hits is also happening at the scale of days, not just hours:

The spikes happen when I talk about a ‘hot topic’. January 27th was my biggest day so far. Slashdot discovered my post about the Elsevier boycott, and sent 3468 readers my way. But a total of 6499 people viewed that post, so a bunch must have come from other sources.

January 31st was also big: 3271 people came to read about The Faculty of 1000. 2140 of them were sent over by Hacker News.

If I were trying to make money from advertising on this blog, I’d be pushed toward more posts about hot topics. Forget the mind-bending articles on quantropy, packed with complicated equations!

But as it is, I’m trying to do some mixture of having fun, figuring out stuff, and getting people to save the planet. (Open access publishing fits into that mandate: it’s tragic how climate crackpots post on popular blogs while experts on climate change publish their papers in journals hidden from public view!) So, I don’t want to maximize readership: what matters more is getting people to do good stuff.

Do you have any suggestions on how I could do this better, while still being me? I’m not going to get a personality transplant, so there are limits on what I’ll do.

One good idea would be to make sure every post on a ‘hot topic’ offers readers something they can do now.

Hmm, readership is still spiking:

But enough of this navel-gazing! Here are some recent Azimuth articles about energy on Google+.

Energy

1) In his State of the Union speech, Obama talked a lot about energy:

We’ve subsidized oil companies for a century. That’s long enough. It’s time to end the taxpayer giveaways to an industry that rarely has been more profitable, and double-down on a clean energy industry that never has been more promising.

He acknowledged that differences on Capitol Hill are “too deep right now” to pass a comprehensive climate bill, but he added that “there’s no reason why Congress shouldn’t at least set a clean-energy standard that creates a market for innovation.”

However, lest anyone think he actually wants to stop global warming, he also pledged “to open more than 75 percent of our potential offshore oil and gas resources.”

2) This paper claims a ‘phase change’ hit the oil markets around 2005:

• James Murray and David King, Climate policy: Oil’s tipping point has passed, Nature 481 (2012), 433–435.


They write:

In 2005, global production of regular crude oil reached about 72 million barrels per day. From then on, production capacity seems to have hit a ceiling at 75 million barrels per day. A plot of prices against production from 1998 to today shows this dramatic transition, from a time when supply could respond elastically to rising prices caused by increased demand, to when it could not (see ‘Phase shift’). As a result, prices swing wildly in response to small changes in demand. Other people have remarked on this step change in the economics of oil around the year 2005, but the point needs to be lodged more firmly in the minds of policy-makers.

3) Help out the famous climate blogger Joe Romm! He asks: What will the U.S. energy mix look like in 2050 if we cut CO2 emissions 80%?

How much total energy is consumed in 2050… How much coal, oil, and natural gas is being consumed (with carbon capture and storage of some coal and gas if you want to consider that)? What’s the price of oil? How much of our power is provided by nuclear power? How much by solar PV and how much by concentrated solar thermal? How much from wind power? How much from biomass? How much from other forms of renewable energy? What is the vehicle fleet like? How much electric? How much next-generation biofuels?

As he notes, there are lots of studies on these issues. Point him to the best ones!

4) Due to plunging prices for components, solar power prices in Germany dropped by half in the last 5 years. Now solar generates electricity at levels only slightly above what consumers pay. The subsidies will disappear entirely within a few years, when solar will be as cheap as conventional fossil fuels. Germany has added 14,000 megawatts of capacity in the last 2 years and now has 24,000 megawatts in total, enough green electricity to meet nearly 4% of the country’s power demand. That is expected to rise to 10% by 2020. Germany now has almost 10 times more installed capacity than the United States.

That’s all great—but, umm, what about the other 90%? What’s their long-term plan? Will they keep using coal-fired power plants? Will they buy more nuclear power from France?

In May 2011, Britain claimed it would halve carbon emissions by 2025. Is Germany making equally bold claims or not? Of course what matters is deeds, not words, but I’m curious.

5) Stephen Lacey presents some interesting charts showing the progress and problems with sustainability in the US. For example, there’s been a striking drop in how much energy is being used per dollar of GNP:


Sorry for the archaic ‘British Thermal Units’: we no longer have a king, but for some reason the U.S. failed to throw off the old British system of measurement. A BTU is a bit more than a kilojoule.

Despite these dramatic changes, Lacey says “we waste around 85% of the energy produced in the U.S.” But he doesn’t say how that number was arrived at. Does anyone know?

6) The American Council for an Energy-Efficient Economy (ACEEE) has a new report called The Long-Term Energy Efficiency Potential: What the Evidence Suggests. It describes some scenarios, including one where the US encourages a greater level of productive investments in energy efficiency so that by the year 2050, it reduces overall energy consumption by 40 to 60 percent. I’m very interested in how much efficiency can help. Some, but not all, of the improvements will be eaten up by the rebound effect.


Scott Aaronson: Safari photos from Kenya

[Photo gallery: photos 1 to 31]

(Credit for most of the photos: Dana)

I was going to write a whole long essay about

  • the differences between going to the zoo and visiting an ancestral environment of humanity, where elephants have grazed for millions of years;
  • the weird sense of familiarity, as if you’re seeing how the surface of the earth is “supposed” to look, how it did look before humans started converting it into KFCs and parking lots;
  • how to tell whether an elephant charging your jeep is serious about wanting to trample you or, much more likely, just warning you to go away (apparently, it has to do with whether its ears are straight back or flapping);
  • the “airport” at Lake Naivasha (a strip of dirt in a grassy field filled with zebras, and a guy on a bicycle who shoos the zebras off the strip before a plane lands);
  • Britain’s failure, to this day, to issue any sort of apology for its detention, torture, and murder of tens of thousands of Kenyans during the waning years of its colonial rule in the 1950s;
  • the near-destruction by poaching, over the last century, of many of the majestic animal populations you see above;
  • the heroism of Richard Leakey (past director of the Kenya Wildlife Service) in overcoming decades of bureaucratic inertia to initiate a crackdown, where rangers were authorized to “poach the poachers,” shooting them on sight (!);
  • how, after Leakey almost-singlehandedly saved Kenya’s wild elephants, he lost both of his legs when his plane crashed (widely suspected to be due to sabotage), and was forced from his job months later;
  • the benefits of safari tourism in creating a serious economic incentive for conservation, but also the drawbacks (e.g., all the jeeps making it harder for the cheetahs to hunt);
  • the large, obvious, anything-but-“theoretical” changes being wrought by global warming on the rainfall in Kenya’s game parks (which changes are killing the trees, thereby eliminating the lions’ hiding places and making it harder for them to hunt; hey, at least the zebras are happy);
  • the Maasais’ innovative uses for cow dung; the resulting immature jokes on my part (homeowner to roofer: “this roof you sold me is shit!”);
  • my growing fascination, over the course of the trip, with the lesser-known corners of Mammalia (elands, dik-diks, kudus, waterbucks, topis, rock hyraxes); how this might mirror my fascination with lesser-known complexity classes like AWPP, QMA(2)/qpoly, SBP, C_=P, and BPP_path;
  • how parts of the African savannah have better cellphone reception than my office in Stata;
  • how it’s indeed possible to catch up on Jon Stewart and The Big Bang Theory over wifi, from a tent in the Maasai Mara, while hippos bellow loudly in the river below, and elephants graze and crocodiles sun themselves on the other side.

But then I never got around to writing that essay.  So enjoy the photos, and ask in the comments if you want me to say something else.

February 12, 2012

Jordan Ellenberg: Is there a noncommutative Siegel’s Lemma?

Let f be the smallest function satisfying the following:

Suppose given two matrices A and B in SL_3(Z), with all entries at most N.  If there is a word w(A,B,A^{-1},B^{-1}) which vanishes in SL_3(Z), then there is a word w’(A,B,A^{-1},B^{-1}) of length at most f(N) which vanishes in SL_3(Z).

What are the asymptotics of f(N)?

The reason for the title is that, if SL_3(Z) is replaced by Z^n, this is Siegel’s lemma:  if two (or, for that matter, k) vectors in [-N..N]^n are linearly dependent, then there is a linear dependency whose height is polynomial in N.  (Here k and n are constants and N is growing.)
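
For small cases, the question can be explored by brute force. The following is only a sketch under stated assumptions: `shortest_relation` is a hypothetical helper that enumerates freely reduced words in A, B and their inverses (written `a`, `b`) in order of length and returns the first one whose product is the identity. The matrices are illustrative choices, and the search is exponential in the word length, so it says nothing about the asymptotics of f(N).

```python
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def matmul(X, Y):
    """Multiply two 3x3 integer matrices given as tuples of tuples."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def shortest_relation(A, A_inv, B, B_inv, max_len):
    """Return a shortest nonempty freely reduced word equal to the identity.

    Words use 'A', 'B' for the generators and 'a', 'b' for their inverses.
    Returns None if no such word of length <= max_len exists.
    """
    gens = {"A": A, "a": A_inv, "B": B, "b": B_inv}
    inverse_of = {"A": "a", "a": "A", "B": "b", "b": "B"}
    frontier = dict(gens)  # all reduced words of the current length
    for length in range(1, max_len + 1):
        for word, M in frontier.items():
            if M == IDENTITY:
                return word
        if length == max_len:
            break
        # Extend each word by one letter, never cancelling the last letter.
        frontier = {
            word + g: matmul(M, G)
            for word, M in frontier.items()
            for g, G in gens.items()
            if g != inverse_of[word[-1]]
        }
    return None

# Illustrative pair: two commuting elementary matrices, so a commutator vanishes.
A = ((1, 1, 0), (0, 1, 0), (0, 0, 1))
A_inv = ((1, -1, 0), (0, 1, 0), (0, 0, 1))
B = ((1, 0, 1), (0, 1, 0), (0, 0, 1))
B_inv = ((1, 0, -1), (0, 1, 0), (0, 0, 1))
relation = shortest_relation(A, A_inv, B, B_inv, max_len=6)

# Illustrative pair with no short relation: these generate a free group.
C = ((1, 2, 0), (0, 1, 0), (0, 0, 1))
C_inv = ((1, -2, 0), (0, 1, 0), (0, 0, 1))
D = ((1, 0, 0), (2, 1, 0), (0, 0, 1))
D_inv = ((1, 0, 0), (-2, 1, 0), (0, 0, 1))
no_relation = shortest_relation(C, C_inv, D, D_inv, max_len=6)
```

The inverses are passed in explicitly to keep the arithmetic in exact integers; the number of reduced words of length n is 4·3^(n-1), which is what limits this approach to short words.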

I don’t have any particular need to know this — the question came up in conversation at the very stimulating MSRI Thin Groups workshop just concluded.  Sarnak’s notes are an excellent guide to the topics discussed there.

 

 

 


Chad Orzel: Syncretic Pre-Schooler Blogging

We send SteelyKid to preschool at the Jewish Community Center in Schenectady, because when we looked at day care programs back in the day, they had the one we liked best. This is a mixed blessing in a number of ways-- they close for a lot of religious holidays when nothing else closes, creating some awkwardness with child care and our jobs. On the plus side, though, it's a chance to learn about another culture, and as an extra bonus, most of what we learn is filtered through SteelyKid, making it extra cute.

For example, on the way home Friday, she was chattering quietly to herself in the back seat, and when I opened the door, announced "I was telling a story about Satsuki!"

"Satsuki from the Totoro movie?" She's a big fan of My Neighbor Totoro, which we have on DVD.

"Yeah. She was asking a question."

"What was the question?"

"Well, she was asking about... about... God gave the world to Abraham!"

"Really?"

"Yeah, He gave Abraham the whole world. And Jacob, too. God gave the world to Jacob, because he was a good boy."

A little later in the evening, at the dinner table, she explained that this picture was a painting of the whole world:

[Image: sm_world_picture.jpg]


Resonaances: Higgs: stronger and more exciting

Today the CMS and ATLAS collaborations dumped into public pages a dozen publications describing the Higgs searches in the 2011 LHC data. In the first approximation, these are the same results that were presented on December 13. But there is one surprise...

The CMS collaboration had every reason to think that life was unfair. For the last round of Higgs searches they made significantly more effort and analyzed more possible signatures than ATLAS. The latter updated only 2 channels to the full dataset, and in principle had worse sensitivity in the H→ZZ*→4l channel (due to slightly higher pT thresholds in the analysis). In spite of that, the significance of the Higgs-like excess near 125 GeV was much weaker in CMS than in ATLAS. Naturally, the CMS researchers have spent the last 2 months scouring their drawers for stray Higgs events. And they found some.

New interesting events are reported in the H → γγ channel. Compared to the December 13 presentation, CMS added a new category of events which, apart from 2 photons, contain 2 energetic jets in the forward (closer to the beam pipe) region of the detector. Such events could arise in the so-called vector boson fusion (VBF) process, where each of the 2 colliding quarks emits a W or Z boson, and the two bosons coalesce to create a Higgs boson (right graph). The 2 original quarks get deflected and may be seen in a detector as two forward jets. On the other hand, in gluon fusion (left graph), which is the dominant Higgs production process at the LHC, the 2 colliding gluons "vanish" and the final state rarely contains 2 forward jets. Background processes are also less likely to produce 2 photons in association with 2 such jets. Hence, by selecting diphoton events with 2 forward jets we can probe a distinct Higgs production process, with less signal (the VBF cross section is 10 times smaller than the gluon fusion one), but also with less background.

Now, in the VBF class CMS finds 7 diphoton events in a 1-GeV bin at the invariant mass 124 GeV, where only about 2 events would be expected from non-Higgs background. By itself it would be nothing, but together with the rest of the events in the diphoton and 4-lepton channels it provides further support for the existence of the Higgs boson in the mass range 124-126 GeV. All in all, the local significance of the excess near 125 GeV in the combined CMS analysis is now over 3 sigma, very similar to that of ATLAS. While many small improvements have been made, my feeling is that the significance was pumped up mostly by these additional VBF events.
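
To get a rough feel for the "7 observed versus about 2 expected" comparison, one can compute a naive Poisson tail probability. This is only a back-of-the-envelope sketch, not the statistical procedure CMS uses: it assumes the background mean is exactly 2, ignores its uncertainty, and ignores the fact that the bin was one of many that could have fluctuated. `poisson_tail` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_tail(n_obs, mean_bkg):
    """P(N >= n_obs) for a Poisson-distributed count with mean mean_bkg."""
    below = sum(mean_bkg**k / factorial(k) for k in range(n_obs))
    return 1.0 - exp(-mean_bkg) * below

# Chance of 7 or more events in a single bin from a background mean of 2.
p_value = poisson_tail(7, 2.0)
print(f"P(N >= 7 | background mean 2) = {p_value:.4f}")
```

For a single pre-chosen bin this comes out a little under half a percent. Once one accounts for the many bins searched and the uncertainty on the background, the excess in this class alone is far less striking, which is consistent with the remark that the VBF events matter mainly in combination with the other channels.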













But there is something intriguing here. Now in both ATLAS and CMS the best fit of the Higgs rate in the H → γγ channel alone is about twice the Standard Model rate, with the standard rate being over 1 sigma away in both cases. Actually, with the present amount of data CMS would not expect to see any VBF events, as the rate predicted by the Standard Model is too small (see below that in this channel their fit is almost 4 times the standard rate, although with a large error). Could it be that we're seeing a non-Standard-Model Higgs boson with an enhanced decay rate and/or enhanced VBF production cross section? Of course, it is far too early for jumping to conclusions: the errors are still large and we may easily be observing an upward fluctuation. Besides, the combination of all channels doesn't show any dramatic enhancement of the Higgs rate. In any case we're free to speculate while waiting for more data (and a word from ATLAS on the VBF events).















See also Matt for more details and more caution.

Quantum DiariesMaîtriser la complexité

Je reviens tout juste de la réunion annuelle du Forum économique mondial, à Davos. Durant ces quelques jours, je me suis attaché à faire comprendre que la science devrait occuper dans l’agenda politique et économique une place bien plus importante qu’elle ne le fait actuellement. C’est la deuxième fois seulement que je participe au Forum, mais j’ai l’impression que le message commence à être entendu. Cette année, j’ai insisté sur le fait qu’il est important d’établir des liens plus étroits entre les questions scientifiques évoquées au cours de la réunion et les discussions politiques, et je m’efforcerai de promouvoir cette idée en vue de la prochaine réunion du Forum.

La science est un sujet complexe. C’est ainsi. Mais il est essentiel que chacun l’aborde de manière constructive. C’est particulièrement vrai pour les hommes politiques et les chefs d’entreprise présents à Davos, dont les décisions en rapport avec des questions scientifiques peuvent influencer bien des choses, du bien-être de nos enfants à l’avenir de la planète. Il est fondamental que ces décisions soient prises de manière informée et rationnelle.

Le défi pour la science, c’est que nous vivons dans un monde où l’on se doit de connaître Shakespeare, Molière ou Goethe, mais où l’on peut avouer sans honte ne rien savoir de Faraday, de Pasteur ou d’Einstein. Cela n’a pas toujours été le cas et les choses pourraient être différentes. Aujourd’hui, la tendance est à l’indifférence, voire à l’hostilité envers la science. C’est une tendance dangereuse pour tous, et il est du devoir de la communauté scientifique d’y remédier.

Il n’y a encore pas si longtemps, la science faisait partie intégrante de la société. Elle faisait la une des journaux et on en parlait autant que des matches de football. Au début du XXe siècle, les découvertes d’Einstein étaient illustrées par des dessins de presse, et, dans les années 60, la science envahissait l’imaginaire populaire, en grande partie grâce au programme Apollo de la NASA. Mais, déjà, l’écart entre la science et la société se creusait, et cette tendance n’a fait que s’accentuer, laissant la société mal préparée pour prendre des décisions fondées scientifiquement.

Le changement climatique et l’énergie sont les deux grands défis auxquels la société doit aujourd’hui faire face. Ce sont là deux questions scientifiques et politiques extrêmement complexes. Le climat est en train de changer. Cela ne fait aucun doute, tout comme le fait que l’activité humaine y est pour quelque chose. Et pourtant, dans la sphère publique, la question reste débattue De la même façon, on ne peut que constater que les énergies renouvelables ne suffisent pas à l’heure actuelle pour satisfaire les besoins toujours croissants de la planète. Cela ne veut pas dire qu’elles n’ont pas leur place. Bien au contraire, et cette place prendra de l’ampleur au fil des ans. Mais il faudra du temps avant de pouvoir répondre à la demande. La société est-elle armée pour prendre les difficiles décisions qui s’imposent sur des questions d’importance planétaire comme celles-ci ? Je ne le pense pas.

At the individual level, a great many subjects leave citizens perplexed, leading them to take decisions while poorly informed; decisions that are literally of vital importance. These may concern mad cow disease, the fear of the MMR vaccine, or the safety of mobile phones, to name just a few examples.

At CERN too, we have of course experienced this phenomenon. When the LHC started up in 2008, the world was afraid of the black hole. A handful of individuals claimed that our flagship accelerator would create a black hole that would swallow our planet. The idea spread on social networks and was also widely picked up by the traditional media, many of which took the easy route, setting aside the code of journalistic ethics and preferring to exploit the grotesque side of the scenario. Unfortunately, science has neglected society for too long, and many people were unable to see how laughable all this was. It was even reported that schools had closed on the day the machine was inaugurated so that children could be with their parents, just in case. And all this on the testimony of a man who, interviewed on television, explained that, since the LHC might destroy the Universe, or might not, the probability of witnessing a disaster was one in two. One could laugh about it, if it were not so serious.

What can scientists do? In my view, a great deal. At the institutional level, changes are under way. At the brand-new Blavatnik School of Government at the University of Oxford, for example, science is an integral part of the public policy curriculum. We must use exciting scientific projects like the LHC to get people interested in science, not only through scientific articles, but also via new channels, such as the artist-in-residence programme that has just been launched at CERN. And scientists who have influence must use that influence to shape the political debate in capitals and in places like Davos.

For several years, CERN has favoured openness, taking advantage of the spotlight on the LHC to engage in more dialogue with as many people as possible (decision-makers, the local population, the general public). As a result, our activities are covered responsibly, and are once again making the headlines and being followed by the general public. Sometimes the facts are not reported exactly as we would like, but science is being talked about, and that is what matters.

When the LHC started up, the world went on existing, and at least one newspaper did not hesitate to say that the LHC would be the new Apollo and would lead a whole generation to take an interest in science. Of course, this is not to be taken literally, but this kind of comment has a positive effect. More recently, another newspaper stated that physics has that little something extra, that elusive quality that puts it in tune with the times.

Science as a whole must take advantage of this and make sure that interest in the LHC is not a mere media flash in the pan, and that exchanges with the general public continue. As scientists, we owe it to the planet. We must help people to master the complexity of their daily lives, which depends on scientific questions. In twelve months' time, that is the message I shall convey at Davos.

Rolf Heuer

David Hoggpublishing implementations

Foreman-Mackey and I got very close today to finishing a note for arXiv on his super-fast, parallel, ensemble sampler that we have been using in a range of projects (see recent papers by Lang and Bovy). We will put it up as an arXiv-only paper, which is something I love to do. But the fact that this is not a typical or normal kind of publication—for example, there is nowhere that it could appear in the peer-reviewed literature—is crazy: A great implementation of a good algorithm that enables lots of science is itself an extremely important contribution to science, just like a telescope or a camera or a spectrograph. How can we make these things count like publications? And how can we change the language we all use that separates these contributions out into categories that are always contrasted with the category "science"? Enough spouting; watch the arXiv this week for some block-busting code.

February 11, 2012

Chad OrzelLinks for 2012-02-11

  • Jeremy Lin, Landry Fields unveil nerdiest handshake in NBA history - San Jose Mercury News

    Jeremy Lin and Landry Fields of the New York Knicks may comprise the most intelligent starting backcourt in NBA history. It's certainly hard to top a duo that boasts college degrees from Harvard (Lin) and Stanford (Fields). So it's not surprising when Lin and Fields unveiled what has to be the nerdiest pre-game handshake in league history. The choreographed skit features the two skimming through an imaginary book, taking off their glasses and then placing them inside pocket protectors.

  • Confessions of a Community College Dean: "You're Assuming We Thought it Through"

    A couple of weeks ago I had the chance to discuss a proposed and relatively dramatic policy change with someone fairly high in state government. I objected to the change with some vigor, and outlined several objections that I thought added up to a compelling case. She listened politely, and then gave an answer for which I hadn't prepared. "You're assuming we thought it through." Well, yes. At least I would have hoped so.


Tommaso DorigoSticks and Stones May Break Your Bones, But Words Will Really Put You In Trouble

That is what Hamza Kashgari, a 23-year-old reporter and poet from Saudi Arabia, is realizing the hard way. He used Twitter to write a poetic "dialogue" with prophet Muhammad, and this was enough to get him condemned to death by the salafi sheikhs. Hamza tried to escape, but was arrested in Malaysia. He now risks beheading for his words.


Terence Tao254B, Notes 4: The Bourgain-Gamburd expansion machine

We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.
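
As a small numerical illustration of the flattening estimate (1), and not as part of the argument, one can run the random walk on a tiny group and watch {\| \mu^{(n)} \|_{\ell^2(G)}} decrease towards {|G|^{-1/2}}. The sketch below assumes {G = SL_2(F_5)} and takes {S} to be the two elementary unipotent matrices together with their inverses; both choices, and all function names, are hypothetical and made only for illustration.

```python
from itertools import product
from math import sqrt

P = 5  # small prime, chosen only to keep the example fast

def mat_mul(a, b):
    """Multiply 2x2 matrices over F_P, stored as tuples (a11, a12, a21, a22)."""
    return (
        (a[0] * b[0] + a[1] * b[2]) % P,
        (a[0] * b[1] + a[1] * b[3]) % P,
        (a[2] * b[0] + a[3] * b[2]) % P,
        (a[2] * b[1] + a[3] * b[3]) % P,
    )

# All elements of SL_2(F_P): matrices of determinant 1.
GROUP = [m for m in product(range(P), repeat=4)
         if (m[0] * m[3] - m[1] * m[2]) % P == 1]

# Illustrative symmetric generating set: two elementary matrices and their inverses.
S = [(1, 1, 0, 1), (1, P - 1, 0, 1), (1, 0, 1, 1), (1, 0, P - 1, 1)]

def walk_l2_norms(n_steps):
    """Return [||mu^(1)||_2, ..., ||mu^(n_steps)||_2] for mu uniform on S."""
    dist = {(1, 0, 0, 1): 1.0}  # delta mass at the identity
    norms = []
    for _ in range(n_steps):
        new = {}
        for g, w in dist.items():
            for s in S:
                h = mat_mul(s, g)
                new[h] = new.get(h, 0.0) + w / len(S)
        dist = new
        norms.append(sqrt(sum(w * w for w in dist.values())))
    return norms

norms = walk_l2_norms(30)
flat = 1 / sqrt(len(GROUP))  # l^2 norm of the uniform distribution on G
print(f"|G| = {len(GROUP)}, ||mu^(30)||_2 = {norms[-1]:.4f}, |G|^(-1/2) = {flat:.4f}")
```

The norms are non-increasing in {n} (by Young's inequality) and can never drop below {|G|^{-1/2}}; the content of (1) is that this floor is nearly reached after only {O(\log |G|)} steps.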

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we see that if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.
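
The inequality {\mu^{(2n)}(H) \geq \mu^{(n)}(xH)^2} can be checked by brute force on a small example. The following is a minimal sketch, assuming {G = S_4} (permutations stored as tuples), {H} the stabiliser of a point, and an arbitrarily chosen symmetric set {S}; none of these choices come from the notes. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """Group law on S_4: (p*q)(i) = p[q[i]], permutations stored as tuples."""
    return tuple(p[i] for i in q)

def convolve(f, g):
    """Convolution (f*g)(z) = sum over x*y = z of f(x) g(y)."""
    out = {}
    for x, fx in f.items():
        for y, gy in g.items():
            z = compose(x, y)
            out[z] = out.get(z, Fraction(0)) + fx * gy
    return out

def walk(S, n):
    """Return mu^(n), the n-fold convolution power of the uniform measure on S."""
    mu = {s: Fraction(1, len(S)) for s in S}
    dist = dict(mu)
    for _ in range(n - 1):
        dist = convolve(dist, mu)
    return dist

def mass(dist, subset):
    return sum(dist.get(g, Fraction(0)) for g in subset)

G = list(permutations(range(4)))
S = [(1, 0, 2, 3), (1, 2, 3, 0), (3, 0, 1, 2)]  # a transposition, a 4-cycle and its inverse
H = [p for p in G if p[3] == 3]                 # stabiliser of the point 3

def coset_bound_holds(n):
    """Check mu^(2n)(H) >= mu^(n)(xH)^2 for every left coset xH."""
    mu_n, mu_2n = walk(S, n), walk(S, 2 * n)
    return all(mass(mu_2n, H) >= mass(mu_n, [compose(x, h) for h in H]) ** 2
               for x in G)

print([coset_bound_holds(n) for n in (1, 2, 3, 4)])
```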

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\nu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements holds:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\nu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.
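
To get a feel for the two alternatives in Lemma 1, one can compute the ratio {\| \nu * \nu \|_{\ell^2(G)}^2 / \| \nu \|_{\ell^2(G)}^2} in two extreme cases. The sketch below assumes the additive group {{\bf Z}/1024{\bf Z}} (so the group law is written as addition), and compares the uniform measure on a subgroup of order 16 with the uniform measure on the symmetric set {\{\pm 2^i: 0 \leq i < 8\}}; these test measures are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

def convolve_mod(f, g, n):
    """Convolution of two measures on Z/nZ, stored as dicts point -> mass."""
    out = {}
    for x, fx in f.items():
        for y, gy in g.items():
            z = (x + y) % n
            out[z] = out.get(z, 0) + fx * gy
    return out

def uniform(support):
    support = set(support)
    return {x: Fraction(1, len(support)) for x in support}

def flattening_ratio_sq(nu, n):
    """Return ||nu * nu||_2^2 / ||nu||_2^2 as an exact fraction."""
    conv = convolve_mod(nu, nu, n)
    return sum(w * w for w in conv.values()) / sum(w * w for w in nu.values())

N = 1024
subgroup = uniform(range(0, N, 64))                                  # subgroup of order 16
spread = uniform((s * 2**i) % N for i in range(8) for s in (1, -1))  # {+-2^i}, 16 points

print(flattening_ratio_sq(subgroup, N), float(flattening_ratio_sq(spread, N)))
```

For the subgroup one has {\nu * \nu = \nu}, so no flattening occurs at all and alternative (ii) holds with {H} equal to the subgroup itself. For the second measure the ratio is well below {1}, which is the behaviour described by alternative (i) for small {K}.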

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

— 1. The Balog-Szemerédi-Gowers lemma —

The Balog-Szemerédi-Gowers lemma (Lemma 1) is ostensibly a statement about group structure, but the main tool in its proof is a remarkable graph-theoretic lemma (also known as the Balog-Szemerédi-Gowers lemma) that allows one to upgrade a “statistical” structure (a structure which is only valid a small fraction of the time, say 1% of the time) to a “complete” structure (one which is valid 100% of the time), by shrinking the size of the structure slightly (and in particular, with losses of polynomial type, as opposed to exponential or worse). This is in contrast to other structure-improving results (such as Ramsey’s theorem, Szemerédi’s theorem, or Freiman’s theorem), which are qualitatively similar in spirit, but have much worse quantitative bounds (though there is some hope in the case of Freiman’s theorem to only lose polynomial bounds with some improvement of existing arguments).

As we shall see later, the property of {\|\nu*\nu\|_{\ell^2(G)}} being large is a statistical assertion about {\nu} (it asserts that {\nu*\nu} collides with itself somewhat often), whereas approximate groups {H} represent a more complete sort of structure (all products of {H \cdot H} are trapped in a small set, whereas only many of the products in {\nu * \nu} are so constrained). The graph-theoretic Balog-Szemerédi lemma is the key to moving from the former type of structure to the latter with only polynomial losses.

We need some notation. Define a bipartite graph {G = G(A,B,E)} to be a graph whose vertex set {V := A \cup B} is partitioned into two non-empty sets {A, B}, and the edge set {E} consists only of edges between {A} and {B}. If a finite bipartite graph {G = G(A,B,E)} is dense in the sense that its edge density {|E|/|A||B|} is large, then for many vertices {a \in A} and {b \in B}, {a} and {b} are connected by a path of length one (i.e. an edge). It is thus intuitive that many pairs of vertices {a \in A} and {a' \in A} will be connected by many paths of length two. Perhaps surprisingly, one can upgrade “many pairs” here to “almost all pairs”, provided that one is willing to shrink the set {A} slightly. More precisely, one has

Lemma 3 (Balog-Szemerédi-Gowers lemma: paths of length two) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq |A| |B| / K}. Let {\epsilon > 0}. Then there exists a subset {A'} of {A} with {|A'| \geq \frac{|A|}{\sqrt{2} K}} such that at least {(1-\epsilon)|A'|^2} of the pairs {(a,a') \in A' \times A'} are such that {a,a'} are connected by at least {\frac{\epsilon}{2K^2} |B|} paths of length two (i.e. there exist at least {\frac{\epsilon}{2K^2} |B|} vertices {b \in B} such that {\{a,b\}, \{a',b\}} both lie in {E}).

Remark 1 It is not possible to remove the {\epsilon} entirely from this lemma; see Exercise 6.4.2 of my book with Van Vu for a counterexample (involving Hamming balls).

Proof: The idea here is to use a probabilistic construction, picking {A'} to be a neighbourhood of a randomly selected element {b} of {B}. The rationale here is that if a pair {a,a'} of vertices in {A} are not connected by many paths of length two, then they are unlikely to lie in the same neighbourhood, and so are unlikely to “wreck” the construction.

We turn to the details. Let {b \in B} be chosen uniformly at random, and let {A' := \{ a \in A: (a,b) \in E \}} be the neighbourhood of {b}. Observe that the expected size of {A'} is

\displaystyle {\bf E} |A'| = \frac{1}{|B|} |E| \geq \frac{|A|}{K}.

By Cauchy-Schwarz, we conclude in particular that

\displaystyle  {\bf E} |A'|^2 \geq \frac{|A|^2}{K^2}. \ \ \ \ \ (2)

Now, call a pair {(a,a')} bad if it is connected by fewer than {\frac{\epsilon |B|}{2K^2}} paths of length two, and let {N} be the number of bad pairs {(a,a')} in {A' \times A'}. We consider the quantity {{\bf E} N}. Observe that if {(a,a')} is a bad pair in {A \times A}, then there are at most {\frac{\epsilon |B|}{2K^2}} values of {b} for which {a} and {a'} will both lie in {A'}, and so this bad pair contributes at most {\frac{\epsilon}{2K^2}} to the expectation. Since there are at most {|A|^2} bad pairs, we conclude that

\displaystyle  {\bf E} N \leq \frac{\epsilon |A|^2}{2K^2}.

Combining this with (2), we see that

\displaystyle  {\bf E} \left( |A'|^2 - \frac{N}{\epsilon} - \frac{|A|^2}{2K^2} \right) \geq 0.

In particular, there exists a choice of {b} for which the expression inside the expectation is non-negative. This implies that

\displaystyle  N \leq \epsilon |A'|^2

and

\displaystyle  |A'|^2 \geq \frac{|A|^2}{2K^2}

and the claim follows. \Box
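
The proof is constructive enough to be run on small examples: since the expectation over {b} is non-negative, simply trying every {b \in B} must succeed. Below is a minimal sketch of that search, assuming the graph is given as a dictionary `adj` from vertices of {A} to their sets of neighbours in {B}; the function name, the data layout and the test graph are all hypothetical. Here {K} is taken to be exactly {|A||B|/|E|}, and pairs {(a,a')} are ordered and may have {a = a'}, as in the proof.

```python
from fractions import Fraction

def find_good_neighbourhood(adj, B, eps):
    """adj maps each a in A to its set of neighbours in B (at least one edge assumed).

    Tries every b in B and returns (b, A_prime, bad_pairs, K) for the first b with
    |A'|^2 - N/eps - |A|^2/(2K^2) >= 0, where A' is the neighbourhood of b and
    N is the number of bad ordered pairs in A' x A'.
    """
    A = list(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values())
    K = Fraction(len(A) * len(B), num_edges)   # so that |E| = |A||B|/K exactly
    threshold = eps * len(B) / (2 * K * K)     # fewer common neighbours means "bad"
    for b in B:
        A_prime = [a for a in A if b in adj[a]]
        bad_pairs = sum(1 for a1 in A_prime for a2 in A_prime
                        if len(adj[a1] & adj[a2]) < threshold)
        if len(A_prime) ** 2 - bad_pairs / eps - len(A) ** 2 / (2 * K * K) >= 0:
            return b, A_prime, bad_pairs, K
    return None  # the averaging argument in the proof shows this is never reached

# Small deterministic test graph (an arbitrary choice for illustration).
B = list(range(8))
adj = {a: {b for b in B if (a * b) % 3 != 1} for a in range(6)}
eps = Fraction(1, 4)
b, A_prime, bad_pairs, K = find_good_neighbourhood(adj, B, eps)
print(b, A_prime, bad_pairs, K)
```

Exact fractions are used so that the borderline case of equality in the selection criterion is handled correctly.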

Given that almost all pairs {a,a'} in {A'} are joined by many paths of length two, it is then plausible that almost all pairs {a \in A'}, {b \in B'} are joined by many paths of length three, for some large subset {B'} of {B}. Remarkably, one can now upgrade “almost all” pairs here to all pairs:

Lemma 4 (Balog-Szemerédi-Gowers lemma: paths of length three) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq |A| |B| / K}. Then there exist subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|}, such that for every {a \in A'} and {b \in B'}, {a} and {b} are joined by {\gg K^{-O(1)} |A| |B|} paths of length three.

Remark 2 A lemma similar to this was first established by Balog and Szemerédi, as a consequence of the Szemerédi regularity lemma. However, as a consequence of using that lemma, the polynomial bounds {K^{-O(1)}} in the above lemma had to be replaced by much worse bounds (of tower-exponential type in {K}), which turns out to be far too weak for the purposes of establishing expansion.

Proof: The idea is to first prune a few “unpopular” vertices from {A} and {B} and then apply the preceding lemma.

Let {A_1} be the vertices in {A} of degree at least {|B|/2K}, and let {E_1} be the edges connecting {A_1} and {B}. Note that the vertices in {A \backslash A_1} are connected to a total of at most {|A| |B|/2K} edges, and so {|E_1| \geq |A| |B|/2K \geq |A_1| |B|/2K}. Since {|E_1| \leq |A_1| |B|}, we conclude in particular that {|A_1| \ge |A|/2K}.

Let {\epsilon > 0} be a sufficiently small quantity (depending on {K}) to be chosen later. Applying Lemma 3 (with {K} replaced by {2K}), one can find a subset {A_2} of {A_1} of cardinality {|A_2| \gg |A_1|/K \gg |A|/K^2} such that at most {\epsilon |A_2|^2} of the pairs {(a,a') \in A_2 \times A_2} are bad in the sense that they are connected by fewer than {\frac{\epsilon}{8K^2} |B|} paths of length two.

Let {A'} be those vertices {a} in {A_2} for which there are at most {\sqrt{\epsilon} |A_2|} elements {a'} of {A_2} for which {(a,a')} is bad. By Markov’s inequality, {A'} consists of all but at most {\sqrt{\epsilon} |A_2|} elements of {A_2}.

Let {E_2} be the edges connecting {A_2} with {B}. Since each vertex in {A_2} has degree at least {|B|/2K}, one has

\displaystyle  |E_2| \geq |A_2| |B| / 2K \gg |A| |B| / K^3.

We may thus find a subset {B'} of {B} of cardinality {|B'| \gg |B|/K^3} such that each {b \in B'} is adjacent to {\gg |A|/K^3} elements of {A_2}.

Now let {a \in A'} and {b \in B'}. We know that {b} is adjacent to {\gg |A|/K^3} elements {a'} of {A_2}, and that at most {\sqrt{\epsilon} |A_2|} of these elements are such that {(a,a')} is bad. If we choose {\epsilon} to be a sufficiently small multiple of {1/K^6}, we conclude that there are {\gg |A|/K^3} elements {a'} which are adjacent to {b} and for which {(a,a')} is not bad. One thus has {\gg (|A|/K^3) (\epsilon/K^2) |B| \gg |A| |B| / K^{11}} paths of length three connecting {a} to {b}, and the claim follows. \Box

The exponents in {K} here can be improved slightly, but we will not attempt to obtain the optimal numerology here.
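
The counting step at the end of the proof uses the fact that a path of length three from {a} to {b} is a path of length two from {a} to some {a'}, followed by an edge from {a'} to {b}. A minimal sketch of this count, assuming the same hypothetical `adj` layout (vertices of {A} mapped to sets of neighbours in {B}) and allowing repeated vertices, as the notes do:

```python
def paths_of_length_three(adj, a, b):
    """Count pairs (b', a') with {a,b'}, {a',b'}, {a',b} all edges.

    For each a' adjacent to b, the number of admissible b' is the number of
    common neighbours of a and a'.
    """
    return sum(len(adj[a] & adj[a2]) for a2 in adj if b in adj[a2])

# Complete bipartite graph with |A| = 3 and |B| = 4: every pair (a, b) is
# joined by |A| |B| = 12 paths of length three.
complete = {a: set(range(4)) for a in "xyz"}
print(paths_of_length_three(complete, "x", 0))

# Two disjoint edges: no path of length three between the components.
disjoint = {0: {0}, 1: {1}}
print(paths_of_length_three(disjoint, 0, 1))
```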

Remark 3 The above results are analogous to a phenomenon in additive combinatorics, namely that a “1%-structured” set (such as a small density subset of a group) can often be upgraded to a “99%-structured” set (such as the complement of a small density subset of a group) by applying a single “convolution” or “sumset” operation, and then upgraded further to a “100%-structured” set (such as a genuine group) by applying a further convolution or sumset operation. (This is basically why, for instance, it is known that almost all even natural numbers are the sum of two primes, and all but finitely many odd natural numbers are the sum of three primes; but it is not known whether all but finitely many even natural numbers are the sum of two primes.)

Exercise 1 (Weighted Balog-Szemerédi-Gowers theorem) Let {(X,\mu)} and {(Y,\nu)} be probability spaces, and let {E \subset X \times Y} have measure {\mu \times \nu(E) \geq 1/K} for some {K \geq 1}.

  • (i) Show that for any {\epsilon > 0}, there exists a subset {X'} of {X} of measure {\mu(X') \geq \frac{1}{\sqrt{2}K}} such that

    \displaystyle  \mu \times \mu( \{ (x,x') \in X' \times X':

    \displaystyle  \int_Y 1_E(x,y) 1_E(x',y)\ d\nu(y) < \frac{\epsilon}{2K^2} \} ) \leq \epsilon \mu(X')^2.

  • (ii) Show that there exist subsets {X', Y'} of {X,Y} of measure {\mu(X') \gg K^{-O(1)}} and {\nu(Y') \gg K^{-O(1)}} such that

    \displaystyle  \int_X \int_Y 1_E(x,y') 1_E(x',y') 1_E(x',y)\ d\mu(x') d\nu(y') \gg K^{-O(1)}

    for all {x \in X'} and {y \in Y'}.

Exercise 2 (99% Balog-Szemerédi theorem) Let {G(A,B,E)} be a finite bipartite graph with {|E| \geq (1-\epsilon) |A| |B|}.

  • (i) Show that there exists a subset {A'} of {A} of size {|A'| \geq (1-O(\sqrt{\epsilon})) |A|} such that for every {a,a' \in A'}, {a} and {a'} are connected by at least {(1-O(\sqrt{\epsilon})) |B|} paths of length {2}. (Hint: select {A'} to be those vertices in {A} that are connected to “almost all” the vertices in {B}.)
  • (ii) Show that there also exists a subset {B'} of {B} of size {|B'| \geq (1-O(\sqrt{\epsilon})) |B|} such that for every {a \in A'} and {b \in B'}, {a} and {b} are connected by at least {(1-O(\sqrt{\epsilon})) |A| |B|} paths of length {3}.

We now apply the graph-theoretic lemma to the group context. The main idea here is to show that various sets (e.g. product sets {A \cdot B}) are small by showing that they are in the high-multiplicity region of some convolution (e.g. {1_{A_1} * \ldots * 1_{A_k}}), or equivalently that elements {g} of such sets have many representations as a product {g = a_1 \ldots a_k} with {a_1 \in A_1, \ldots, a_k \in A_k}. One can then use Markov’s inequality and the trivial identity {\| 1_{A_1} * \ldots * 1_{A_k} \|_{\ell^1(G)} = |A_1| \ldots |A_k|} to get usable size bounds on such sets.

Corollary 5 (Balog-Szemerédi lemma, product set form) Let {A, B} be finite non-empty subsets of a group {G = (G,\cdot)}, and suppose that

\displaystyle  \|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/4} |B|^{3/4}/K

for some {K \geq 1}. (This hypothesis should be compared with the upper bound

\displaystyle  \|1_A * 1_B \|_{\ell^2(G)} \leq \|1_A\|_{\ell^{4/3}(G)} \|1_B\|_{\ell^{4/3}(G)} = |A|^{3/4} |B|^{3/4}

arising from Young’s inequality.) Then there exist subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|} with {|A' \cdot B'| \ll K^{O(1)} |A|^{1/2} |B|^{1/2}} and {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|}.

The quantity {\|1_A *1_B\|_{\ell^2(G)}^2} (or equivalently, the number of solutions to the equation {ab=a'b'} with {a,a' \in A} and {b,b' \in B}) is also known as the multiplicative energy of {A} and {B}, and is sometimes denoted {E(A,B)} in the literature.
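
The two descriptions of the multiplicative energy (the squared {\ell^2} norm of {1_A * 1_B}, and the number of solutions to {ab = a'b'}) can be compared directly on a small non-abelian example. The sketch below assumes {G = S_4} with permutations stored as tuples; the sets {A}, {B} and the function names are arbitrary illustrative choices.

```python
from itertools import permutations, product

def compose(p, q):
    """Group law on S_4: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def representation_counts(A, B):
    """Return the function 1_A * 1_B as a dict g -> #{(a, b) in A x B: ab = g}."""
    counts = {}
    for a in A:
        for b in B:
            g = compose(a, b)
            counts[g] = counts.get(g, 0) + 1
    return counts

def energy(A, B):
    """Multiplicative energy E(A,B) = ||1_A * 1_B||_2^2."""
    return sum(c * c for c in representation_counts(A, B).values())

def energy_by_counting(A, B):
    """Number of quadruples (a, b, a', b') with ab = a'b'."""
    return sum(1 for a, b, a2, b2 in product(A, B, A, B)
               if compose(a, b) == compose(a2, b2))

G = list(permutations(range(4)))
A, B = G[:7], G[5:14]
H = [p for p in G if p[3] == 3]  # a subgroup of order 6

print(energy(A, B), energy_by_counting(A, B), energy(H, H))
```

For a subgroup {H} the energy {E(H,H)} equals {|H|^3}, the largest value permitted by the Young bound {E(A,B) \leq |A|^{3/2} |B|^{3/2}} quoted in Corollary 5.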

Proof: By hypothesis, we have

\displaystyle  \sum_{(a,b) \in A \times B} 1_A * 1_B(ab) = \|1_A * 1_B \|_{\ell^2(G)}^2 \geq |A|^{3/2} |B|^{3/2} / K^2.

Since

\displaystyle  \sum_{(a,b) \in A \times B: 1_A * 1_B(ab) \leq |A|^{1/2} |B|^{1/2}/2K^2} 1_A * 1_B(ab) \leq |A|^{3/2} |B|^{3/2} / 2K^2,

we conclude that

\displaystyle  \sum_{(a,b) \in A \times B: 1_A * 1_B(ab) > |A|^{1/2} |B|^{1/2}/2K^2} 1_A * 1_B(ab) \geq |A|^{3/2} |B|^{3/2} / 2K^2.

Since, by Cauchy-Schwarz (or Young’s inequality), we have {1_A*1_B(ab) \leq |A|^{1/2} |B|^{1/2}}, we conclude that there is a set {E \subset A \times B} with {|E| \geq |A| |B|/2K^2} such that

\displaystyle  1_A * 1_B(ab) > |A|^{1/2} |B|^{1/2}/2K^2

for all {(a,b) \in E}.

By slight abuse of notation (arising from the fact that {A}, {B} are not necessarily disjoint, and that {E} is a set of ordered pairs rather than unordered pairs), we can view the triplet {(A,B,E)} as a bipartite graph. Applying Lemma 4, we can find subsets {A', B'} of {A, B} respectively with {|A'| \gg K^{-O(1)} |A|} and {|B'| \gg K^{-O(1)} |B|} such that for all {a \in A'} and {b \in B'}, one can find {\gg K^{-O(1)} |A| |B|} elements {a' \in A, b' \in B} such that {(a,b'), (a',b'), (a',b) \in E}. In particular, we see that

\displaystyle  \sum_{a' \in G} \sum_{b' \in G} 1_A * 1_B(ab') 1_A * 1_B(a'b') 1_A*1_B(a'b) \gg K^{-O(1)} |A|^{5/2} |B|^{5/2}. \ \ \ \ \ (3)

Observe that {1_A * 1_B(a'b') = 1_{B^{-1}}*1_{A^{-1}} ((b')^{-1} (a')^{-1} )}. Using the identity {(ab') ((b')^{-1} (a')^{-1}) (a' b) = ab}, we note that triples {(ab', (b')^{-1} (a')^{-1}, a'b)} for {a',b' \in G} are precisely those triples {(g_1,g_2,g_3) \in G \times G \times G} with {g_1g_2g_3 = ab}. Thus the left-hand side of (3) is equal to {F(ab)}, where

\displaystyle  F := 1_A * 1_B * 1_{B^{-1}} * 1_{A^{-1}} * 1_A * 1_B.

But since

\displaystyle  \|F\|_{\ell^1} = |A| |B| |B^{-1}| |A^{-1}| |A| |B| = |A|^3 |B|^3,

we see from Markov’s inequality that there are at most {O(K^{O(1)} |A|^{1/2} |B|^{1/2})} possible values for {ab}, which gives the bound {|A' \cdot B'| \ll K^{O(1)} |A|^{1/2} |B|^{1/2}}.

The second bound {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|} can be proven similarly to the first (noting that any {a,a' \in A'} are connected by {\gg K^{-O(1)} |A|^2 |B|^2} paths of length six), but can also be deduced from the former bound as follows. Observe that any element {a (a')^{-1} \in A' \cdot (A')^{-1}} has at least {|B'|} representations of the form {a(a')^{-1} = (ab) (a'b)^{-1}} with {b \in B'}, and hence {ab,a'b \in A' \cdot B'}, thus

\displaystyle  1_{A'B'} * 1_{(A'B')^{-1}} \geq |B'| \gg K^{-O(1)} |B|

on {A' (A')^{-1}}. On the other hand, the left-hand side has an {\ell^1(G)} norm of {|A'B'| |(A'B')^{-1}| \ll K^{O(1)} |A| |B|}, and the bound {|A' \cdot (A')^{-1}| \ll K^{O(1)} |A|} then follows from Markov’s inequality. \Box

Exercise 3 In the converse direction, show that if {A, B} are non-empty finite subsets of {G} with {|AB| \leq K |A|^{1/2} |B|^{1/2}}, then {\|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/4} |B|^{3/4} / K^{1/2}}.

Exercise 4 If {A, B, C} are three non-empty finite subsets of {G}, establish the Ruzsa triangle inequality {|A \cdot C^{-1}| \leq \frac{|A \cdot B^{-1}| |B \cdot C^{-1}|}{|B|}}. (Hint: mimic the final part of the proof of Corollary 5.)

We now give a variant of this corollary involving approximate groups.

Lemma 6 (Balog-Szemerédi lemma, approximate group form) Let {A} be a finite symmetric subset of a group {G = (G,\cdot)}, and suppose that

\displaystyle  \|1_A * 1_A \|_{\ell^2(G)} \geq |A|^{3/2}/K

for some {K \geq 1}. Then there exists a {K^{O(1)}}-approximate group {H} with {|H| \ll K^{O(1)} |A|} such that {|A \cap gH| \gg K^{-O(1)} |A|} for some {g \in G}.

Proof: By Corollary 5, we may find a subset {A' \subset A} with {|A'| \gg K^{-O(1)} |A|} such that

\displaystyle  |A' (A')^{-1}| \ll K^{O(1)} |A|. \ \ \ \ \ (4)

By Exercise 3, this implies that

\displaystyle  \| 1_{A'} * 1_{(A')^{-1}} \|_{\ell^2(G)}^2 \gg K^{-O(1)} |A|^3.

Observe that the left-hand side is equal to

\displaystyle  1_{A'} * 1_{(A')^{-1}} * 1_{A'} * 1_{(A')^{-1}} (1)

\displaystyle  = 1_{(A')^{-1}} * 1_{A'} * 1_{(A')^{-1}} * 1_{A'}(1)

\displaystyle  = \| 1_{(A')^{-1}} * 1_{A'} \|_{\ell^2(G)}^2.

We conclude that

\displaystyle  \sum_{s \in G} (1_{(A')^{-1}} * 1_{A'}(s))^2 \gg K^{-O(1)} |A|^3.

On the other hand, we have

\displaystyle  \sum_{s \in G} 1_{(A')^{-1}} * 1_{A'}(s) = |A'| |A'| \leq |A|^2.

As a consequence, we see that if we set

\displaystyle  S := \{ s \in G: 1_{(A')^{-1}} * 1_{A'}(s) \geq C^{-1} K^{-C} |A| \}

for some sufficiently large absolute constant {C}, then

\displaystyle  \sum_{s \in G \backslash S} (1_{(A')^{-1}} * 1_{A'}(s))^2 \leq C^{-1} K^{-C} |A|^3,

and thus (for {C} large enough)

\displaystyle  \sum_{s \in S} (1_{(A')^{-1}} * 1_{A'}(s))^2 \gg K^{-O(1)} |A|^3.

Since {1_{(A')^{-1}} * 1_{A'}(s) \leq|A'| \leq |A|}, we conclude that

\displaystyle  |S| \gg K^{-O(1)} |A|.

Also, {S} is clearly symmetric and contains the origin.

Now let us consider an element {g = a_0 s_1 \ldots s_5 b_6^{-1}} of the product {(A') S^5 (A')^{-1}}. By construction of {S}, we can write each {s_i} as a product {b_i^{-1} a_i} with {a_i,b_i \in A'} in at least {C^{-1} K^{-C} |A|} ways. Doing so for each {i=1,\ldots,5} gives rise to a factorisation

\displaystyle  g = g_1 \ldots g_6

where {g_i := a_{i-1} b_i^{-1} \in A' (A')^{-1}}; as the {g_1,\ldots,g_6} uniquely determine the {a_i,b_i} (for fixed {a_0,s_1,\ldots,s_5,b_6}), we conclude that each element {g} of {(A') S^5 (A')^{-1}} has at least {\gg K^{-O(1)} |A|^5} such factorisations. But by (4), there are at most {O(K^{O(1)}|A|^6)} such tuples {g_1,\ldots,g_6}, and so there are at most {O(K^{O(1)} |A|)} possible values for {g}, thus

\displaystyle  |(A') S^5 (A')^{-1}| \ll K^{O(1)} |A|. \ \ \ \ \ (5)

In particular,

\displaystyle  |S^5| \ll K^{O(1)} |S|.

By the Ruzsa covering lemma (see the exercise below), this implies that {S^4} is covered by {O(K^{O(1)})} left-translates of {S^2}, and so {H := S^2} is a {K^{O(1)}}-approximate group. Finally, from (5) one has

\displaystyle  |A' H| \ll K^{O(1)} |A|

and thus by Exercise 3

\displaystyle  \| 1_{A'} * 1_H \|_{\ell^2(G)} \gg K^{-O(1)} |A|^{3/2}.

In particular, since the support of {1_{A'} * 1_H} has size {O(K^{O(1)} |A|)}, one has

\displaystyle  1_{A'} * 1_H(g) \gg K^{-O(1)} |A|

for some {g \in G}, or equivalently that

\displaystyle  |A' \cap Hg| \gg K^{-O(1)} |A|.

Increasing {A'} to {A} and taking inverses, we conclude that {|gH \cap A| \gg K^{-O(1)} |A|}, and the claim follows. \Box

Exercise 5 (Ruzsa covering lemma) Let {A, B} be finite non-empty subsets of a group {G}. Show that {A} can be covered by at most {\frac{|AB|}{|B|}} left-translates of {BB^{-1}}. (Hint: consider a maximal disjoint collection of translates {aB} of {B} with {a \in A}.)
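
The greedy construction suggested by the hint is easy to try out. The following sketch, again assuming {G = S_4} with arbitrarily chosen sets {A, B} (all names are hypothetical), selects a maximal family of centres {x \in A} with pairwise disjoint translates {xB}, and then checks both the covering property and the bound on the number of translates. It is meant as an experiment to accompany the exercise rather than as a proof.

```python
from itertools import permutations

def compose(p, q):
    """Group law on S_4: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def ruzsa_centres(A, B):
    """Greedily pick centres x in A whose left-translates xB are pairwise disjoint."""
    centres, union = [], set()
    for a in A:
        translate = {compose(a, b) for b in B}
        if union.isdisjoint(translate):
            centres.append(a)
            union |= translate
    return centres

G = list(permutations(range(4)))
A, B = G[:12], G[3:8]
X = ruzsa_centres(A, B)

AB = {compose(a, b) for a in A for b in B}
BBinv = {compose(b1, inverse(b2)) for b1 in B for b2 in B}
# a lies in x B B^{-1} exactly when x^{-1} a lies in B B^{-1}.
covered = all(any(compose(inverse(x), a) in BBinv for x in X) for a in A)

print(len(X), len(AB) / len(B), covered)
```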

Exercise 6 (Converse to Balog-Szemerédi-Gowers) Let {A} be a finite symmetric subset of a group {G = (G,\cdot)}, and suppose there exists a {K}-approximate group {H} with {|H| \leq K |A|} such that {|A \cap gH| \geq |A|/K} for some {g \in G}. Show that

\displaystyle  \|1_A * 1_A\|_{\ell^2(G)} \geq K^{-3} |A|^{3/2}.

Exercise 7 Let {A, B} be finite non-empty subsets of a group {G}, and suppose that {\|1_A * 1_B \|_{\ell^2(G)} \geq |A|^{3/4} |B|^{3/4} / K}. Show that there exists a {O(K^{O(1)})}-approximate group {H} with {|H| \leq K^{O(1)} |A|^{1/2} |B|^{1/2}} and elements {g, h \in G} such that {|A \cap gH| \gg K^{-O(1)} |H|} and {|B \cap Hh| \gg K^{-O(1)} |H|}.

Finally, we can prove Lemma 1. Fix {G, \nu, K}. We may assume that

\displaystyle  \| \nu * \nu \|_{\ell^2(G)} > \frac{1}{K} \|\nu\|_{\ell^2(G)} \ \ \ \ \ (6)

and we need to use this to locate an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

Let us write {M := 1/\|\nu\|_{\ell^2(G)}^2}. Intuitively, {M} represents the “width” of the probability measure {\nu}, as can be seen by considering the model example {\nu = \frac{1}{M} 1_A} where {A} is a symmetric set of cardinality {M} (i.e. {\nu} is the uniform probability measure on {A}). If we were actually in this model case, we could apply Lemma 6 immediately and be done. Of course, in general, {\nu} need not be a uniform measure on a set of size {M}. However, it turns out that one can use (6) to conclude that the “bulk” of {\nu} is basically of this form.

More precisely, let us split {\nu = \nu_{<}+\nu_{>}+\nu_=}, where

\displaystyle  \nu_{<} := \nu 1_{\nu \leq \frac{1}{100K^2M}}

\displaystyle  \nu_{>} := \nu 1_{\nu \geq \frac{10K}{M}}

\displaystyle  \nu_= := \nu - \nu_{<} - \nu_{>}.

Observe that

\displaystyle  \| \nu_{<}\|_{\ell^2(G)}^2 \leq \frac{1}{100K^2M} \| \nu \|_{\ell^1(G)} = \frac{1}{100K^2M}

and so by Young’s inequality

\displaystyle  \| \nu_{<} * \nu \|_{\ell^2(G)}, \| \nu * \nu_{<} \|_{\ell^2(G)} \leq \frac{1}{10KM^{1/2}}.

In a similar vein, we have

\displaystyle  \| \nu_{>} \|_{\ell^1(G)} \leq \frac{M}{10K} \| \nu \|_{\ell^2(G)}^2 = \frac{1}{10K}

and thus by Young’s inequality (and the normalisation {\|\nu\|_{\ell^2(G)} = 1/M^{1/2}})

\displaystyle  \| \nu_{>} * \nu \|_{\ell^2(G)}, \| \nu * \nu_{>} \|_{\ell^2(G)} \leq \frac{1}{10KM^{1/2}}.

Finally, from (6) one has

\displaystyle  \| \nu * \nu \|_{\ell^2(G)} \geq \frac{1}{K M^{1/2}}.

Subtracting using the triangle inequality (ignoring some slight double-counting), we conclude that

\displaystyle  \| \nu_= * \nu_= \|_{\ell^2(G)} \gg \frac{1}{KM^{1/2}}.

If we then set {A := \{ g \in G: \nu(g) > \frac{1}{100K^2 M} \}}, we conclude in particular that

\displaystyle  \| 1_A * 1_A \|_{\ell^2(G)} \gg K^{-O(1)} M^{3/2}.

On the other hand, from Markov’s inequality one has {|A| \ll K^2 M}. Applying Lemma 6, we conclude the existence of a {O(K^{O(1)})}-approximate group {H} with {|H| \ll K^{O(1)} M} such that {|A \cap gH| \gg K^{-O(1)} M} for some {g \in G}, which by definition of {A} implies that {\nu(gH) \gg K^{-O(1)}}, and the claim follows.
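The splitting of {\nu} into a low, a high and a middle part can be made concrete. The sketch below is illustrative only: the function `split_measure`, the test measure on {{\bf Z}/60{\bf Z}} and the choice {K=2} are assumptions of ours. It uses exact rational arithmetic so that the threshold comparisons are not affected by rounding.

```python
from fractions import Fraction

def split_measure(nu, K):
    """Split a probability measure nu (dict: point -> mass) at the two thresholds
    1/(100 K^2 M) and 10 K / M, where M = 1 / ||nu||_2^2 is the width of nu."""
    M = 1 / sum(v * v for v in nu.values())
    low_cut = 1 / (100 * K * K * M)
    high_cut = 10 * K / M
    low = {g: v for g, v in nu.items() if v <= low_cut}            # nu_<
    high = {g: v for g, v in nu.items() if v >= high_cut}          # nu_>
    mid = {g: v for g, v in nu.items() if low_cut < v < high_cut}  # nu_=
    return M, low, high, mid

# Hypothetical example: mass proportional to (i+1)^3 on Z/60Z.
weights = [(i + 1) ** 3 for i in range(60)]
total = sum(weights)
nu = {i: Fraction(w, total) for i, w in enumerate(weights)}
K = 2
M, low, high, mid = split_measure(nu, K)

low_l2_squared = sum(v * v for v in low.values())   # should be at most 1/(100 K^2 M)
high_l1 = sum(high.values())                        # should be at most 1/(10 K)
support_of_A = len(mid) + len(high)                 # |A|, at most 100 K^2 M by Markov
```

The three quantities computed at the end correspond to the bounds on {\|\nu_<\|_{\ell^2(G)}^2}, on {\|\nu_>\|_{\ell^1(G)}}, and on {|A|} used in the argument above; they hold for any probability measure, not just this example.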

— 2. The Bourgain-Gamburd expansion machine —

We can now prove Theorem 2. We can assume that {|G|} is sufficiently large depending on the parameters {k,\kappa,\Lambda,\delta'}, since the claim is trivial for bounded {G} (note that as {S} generates {G}, the Cayley graph {Cay(G,S)} will be an {\epsilon}-expander for some {\epsilon>0}). Henceforth we allow all implied constants in the asymptotic notation to depend on {k,\kappa,\Lambda,\delta'}.

To show expansion, by the quasirandomness hypothesis (and Proposition 4 from the preceding notes) it will suffice to show that

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-1/2+\kappa/2} \ \ \ \ \ (7)

for some {n = O(\log |G|)}.

From Young’s inequality, {\| \mu^{(n)}\|_{\ell^2(G)}} is decreasing in {n}, and is initially equal to {1} when {n=0}. We need to “flatten” the {\ell^2(G)} norm of {\mu^{(n)}} as {n} increases. We first use the non-concentration hypothesis to obtain an initial amount of flattening:
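To see this flattening in a toy case, one can run the walk exactly on a small group. The following sketch assumes {G = SL_2(F_5)} and takes {S} to be the two elementary unipotent matrices and their inverses; these choices, and the helper names `mul` and `step`, are ours and are made only for illustration.

```python
from fractions import Fraction

p = 5  # assumption: a small prime, so that the whole group fits in memory

def mul(x, y):
    """Multiply 2x2 matrices over F_p, stored as tuples (a, b, c, d)."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

identity = (1, 0, 0, 1)
# Symmetric generating set: elementary unipotent matrices and their inverses.
S = [(1, 1, 0, 1), (1, p - 1, 0, 1), (1, 0, 1, 1), (1, 0, p - 1, 1)]

# Enumerate the group generated by S by breadth-first search.
G, frontier = {identity}, [identity]
while frontier:
    new_frontier = []
    for g in frontier:
        for s in S:
            h = mul(g, s)
            if h not in G:
                G.add(h)
                new_frontier.append(h)
    frontier = new_frontier

def step(dist):
    """One step of the walk: convolve dist with mu on the right."""
    new = {g: Fraction(0) for g in G}
    for g, mass in dist.items():
        for s in S:
            new[mul(g, s)] += mass / len(S)
    return new

dist = {g: Fraction(0) for g in G}
dist[identity] = Fraction(1)       # mu^{(0)} is the Dirac mass at the identity
sq_norms = []                      # squared l^2 norms of mu^{(0)}, mu^{(1)}, ...
for n in range(25):
    sq_norms.append(sum(v * v for v in dist.values()))
    dist = step(dist)
```

The list `sq_norms` starts at {1}, is non-increasing (this is Young's inequality), and can never drop below {1/|G|}, the squared norm of the uniform distribution; how quickly it approaches that floor is exactly what the rest of the argument quantifies.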

Proposition 7 For any {n \geq \frac{1}{2} \Lambda \log |G|}, one has

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-\kappa/4}. \ \ \ \ \ (8)

Furthermore, we have

\displaystyle  \mu^{(n)}(gH) \leq |G|^{-\kappa/2} \ \ \ \ \ (9)

for all proper subgroups {H} of {G} and all {g \in G}.

Proof: By the non-concentration hypothesis, we can find {n_0 \leq \frac{1}{2} \Lambda \log |G|} such that

\displaystyle  \mu^{(2n_0)}(H) \leq |G|^{-\kappa}

for all proper subgroups {H} of {G}. If we write {\mu^{(2n_0)}(H)} as {\mu^{(n_0)}*\mu^{(n_0)}( Hg g^{-1} H)}, we see that

\displaystyle  \mu^{(2n_0)}(H) \geq \mu^{(n_0)}(Hg) \mu^{(n_0)}(g^{-1} H)

for all {g \in G}. By symmetry, {\mu^{(n_0)}(g^{-1} H) = \mu^{(n_0)}(Hg)}, and thus

\displaystyle  \sup_{g \in G} \mu^{(n_0)}(gH) \leq |G|^{-\kappa/2}.

If {n \geq \frac{1}{2} \Lambda \log |G|}, then we may write {\mu^{(n)}} as the convolution of a probability measure {\mu^{(n-n_0)}} and {\mu^{(n_0)}}. From this, we see that

\displaystyle  \mu^{(n)}(g' H) \leq \sup_{g \in G} \mu^{(n_0)}(gH) \leq |G|^{-\kappa/2}

for all {g' \in G}, giving the claim (9). Specialising this to the case when {H} is the trivial group, one has

\displaystyle  \| \mu^{(n)} \|_{\ell^\infty(G)} \leq |G|^{-\kappa/2}.

Since we also have

\displaystyle  \| \mu^{(n)} \|_{\ell^1(G)} = 1,

the claim (8) then follows from Hölder’s inequality. \Box

Now we obtain additional flattening using the product theorem hypothesis:

Lemma 8 (Flattening lemma) Suppose {n \geq \frac{1}{2} \Lambda \log |G|} is such that

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \geq |G|^{-1/2+\kappa/2}. \ \ \ \ \ (10)

Then one has

\displaystyle  \| \mu^{(n)} * \mu^{(n)} \|_{\ell^2(G)} \leq |G|^{-\epsilon} \| \mu^{(n)} \|_{\ell^2(G)}

for some {\epsilon>0} depending only on {\kappa} and {\delta'}.

Proof: Suppose the claim fails for some {\epsilon} to be chosen later, thus

\displaystyle  \| \mu^{(n)} * \mu^{(n)} \|_{\ell^2(G)} > |G|^{-\epsilon} \| \mu^{(n)} \|_{\ell^2(G)}.

Applying Lemma 1, we may thus find a {O(|G|^{O(\epsilon)})}-approximate group {H} with

\displaystyle  |H| \ll |G|^{O(\epsilon)} / \| \mu^{(n)} \|_{\ell^2(G)}^2

and {g \in G} such that

\displaystyle  \mu^{(n)}(gH) \gg |G|^{-O(\epsilon)}.

Since {\|\mu^{(n)}\|_{\ell^\infty(G)} \leq |G|^{-\kappa/2}} by (9), we see that

\displaystyle  |H| \gg |G|^{\kappa/2-O(\epsilon)}.

Meanwhile, from (10) one has

\displaystyle  |H| \ll |G|^{1-\kappa + O(\epsilon)}.

Applying the product hypothesis (assuming {\epsilon} sufficiently small depending on {\kappa} and {\delta}), we conclude that {H} generates a proper subgroup {K} of {G}, and thus

\displaystyle  \mu^{(n)}(gK) \gg |G|^{-O(\epsilon)}.

But this contradicts (9) (again if {\epsilon} is sufficiently small). \Box

Iterating the above lemma {O(1)} times we obtain (7) for some {n = O(\log |G|)}, as desired.
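The bookkeeping behind "{O(1)} times" can be spelled out in terms of exponents of {|G|}. The sketch below is a simplified model, not part of the notes: it assumes that we start from the bound (8), that each application of Lemma 8 replaces {n} by {2n} and lowers the exponent by exactly {\epsilon}, and that we stop as soon as (7) holds. The function name `flattening_steps` is hypothetical.

```python
from fractions import Fraction
from math import ceil

def flattening_steps(kappa, eps):
    """Number of applications of the flattening lemma needed to pass from the
    exponent -kappa/4 of (8) to the exponent -1/2 + kappa/2 of (7), when each
    application lowers the exponent by eps."""
    start = -kappa / 4                      # exponent of |G| in (8)
    target = Fraction(-1, 2) + kappa / 2    # exponent of |G| in (7)
    if start <= target:
        return 0
    return ceil((start - target) / eps)

# Illustrative values (assumptions): kappa = 1/5 and eps = 1/20.
steps = flattening_steps(Fraction(1, 5), Fraction(1, 20))
growth = 2 ** steps   # factor by which n has grown after the iteration
```

Since the number of steps depends only on {\kappa} and {\epsilon}, the final {n} is still a bounded multiple of {\log |G|}.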

Remark 4 Roughly speaking, the three hypotheses in Theorem 2 govern three separate stages of the life cycle of the random walk and its distributions {\mu^{(n)}}. In the early stage {n = o(\log |G|)}, the non-concentration hypothesis creates some initial spreading of this random walk, in particular ensuring that the walk “escapes” from cosets of proper subgroups. In the middle stage {n \sim \log |G|}, the product theorem steadily flattens the distribution of the random walk, until it is very roughly comparable to the uniform distribution. Finally, in the late stage {n \gg \log |G|}, the quasirandomness property can smooth out the random walk almost completely to obtain the mixing necessary for expansion.



Terence Tao254B, Notes 5: Product theorems, pivot arguments, and the Larsen-Pink non-concentration inequality

In the previous set of notes, we saw that one could derive expansion of Cayley graphs from three ingredients: non-concentration, product theorems, and quasirandomness. Quasirandomness was discussed in Notes 3. In the current set of notes, we discuss product theorems. Roughly speaking, these theorems assert that in certain circumstances, a finite subset {A} of a group {G} either exhibits expansion (in the sense that {A^3}, say, is significantly larger than {A}), or is somehow “close to” or “trapped” by a genuine group.

Theorem 1 (Product theorem in {SL_d(k)}) Let {d \geq 2}, let {k} be a finite field, and let {A} be a finite subset of {G := SL_d(k)}. Let {\epsilon >0} be sufficiently small depending on {d}. Then at least one of the following statements holds:

  • (Expansion) One has {|A^3| \geq |A|^{1+\epsilon}}.
  • (Close to {G}) One has {|A| \geq |G|^{1-O_d(\epsilon)}}.
  • (Trapping) {A} is contained in a proper subgroup of {G}.

We will prove this theorem (which was proven first in the {d=2,3} cases for fields {F} of prime order by Helfgott, and then for {d=2} and general {F} by Dinai, and finally for general {d} and {F} independently by Pyber-Szabo and by Breuillard-Green-Tao) later in these notes. A more qualitative version of this proposition was also previously obtained by Hrushovski. There are also generalisations of the product theorem of importance to number theory, in which the field {k} is replaced by a cyclic ring {{\bf Z}/q{\bf Z}} (with {q} not necessarily prime); this was achieved first for {d=2} and {q} square-free by Bourgain, Gamburd, and Sarnak, by Varju for general {d} and {q} square-free, and finally by this paper of Bourgain and Varju for arbitrary {d} and {q}.

Exercise 1 (Diameter bound) Assuming Theorem 1, show that whenever {S} is a symmetric set of generators of {SL_d(k)} for some finite field {k} and some {d\geq 2}, then any element of {SL_d(k)} can be expressed as the product of {O_d( \log^{O_d(1)} |k| )} elements from {S}. (Equivalently, if we add the identity element to {S}, then {S^m = SL_d(k)} for some {m = O_d( \log^{O_d(1)} |k| )}.) This is a special case of a conjecture of Babai and Seress, who conjectured that the bound should hold uniformly for all finite simple groups (in particular, the implied constants here should not actually depend on {d}). The methods used to handle the {SL_d} case can handle other finite groups of Lie type of bounded rank, but at present we do not have bounds that are independent of the rank. On the other hand, a recent paper of Helfgott and Seress has almost resolved the conjecture for the permutation groups {A_n}.

A key tool to establish product theorems is an argument which is sometimes referred to as the pivot argument. To illustrate this argument, let us first discuss a much simpler (and older) theorem, which has a much weaker conclusion but is valid in any group {G}:

Theorem 2 (Baby product theorem) Let {G} be a group, and let {A} be a finite non-empty subset of {G}. Then one of the following statements holds:

  • (Expansion) One has {|A^{-1} A| \geq \frac{3}{2} |A|}.
  • (Close to a subgroup) {A} is contained in a left-coset of a group {H} with {|H| < \frac{3}{2} |A|}.

To prove this theorem, we suppose that the first conclusion does not hold, thus {|A^{-1} A| <\frac{3}{2} |A|}. Our task is then to place {A} inside the left-coset of a fairly small group {H}.

To do this, we take a group element {g \in G}, and consider the intersection {A\cap gA}. A priori, the size of this set could range anywhere from {0} to {|A|}. However, we can use the hypothesis {|A^{-1} A| < \frac{3}{2} |A|} to obtain an important dichotomy, reminiscent of the classical fact that two cosets {gH, hH} of a subgroup {H} of {G} are either identical or disjoint:

Proposition 3 (Dichotomy) If {g \in G}, then exactly one of the following occurs:

  • (Non-involved case) {A \cap gA} is empty.
  • (Involved case) {|A \cap gA| > \frac{|A|}{2}}.

Proof: Suppose we are not in the non-involved case, so that {A \cap gA} is non-empty. Let {a} be an element of {A \cap gA}; then {a} and {g^{-1} a} both lie in {A}. The sets {A^{-1} a} and {A^{-1} g^{-1} a} then both lie in {A^{-1} A}. As these sets have cardinality {|A|} and lie in {A^{-1}A}, which has cardinality less than {\frac{3}{2}|A|}, we conclude from the inclusion-exclusion formula that

\displaystyle |A^{-1} a \cap A^{-1} g^{-1} a| > \frac{|A|}{2}.

But the left-hand side is equal to {|A \cap gA|}, and the claim follows. \Box

The above proposition provides a clear separation between two types of elements {g \in G}: the “non-involved” elements, which have nothing to do with {A} (in the sense that {A \cap gA = \emptyset}), and the “involved” elements, which have a lot to do with {A} (in the sense that {|A \cap gA| > |A|/2}). The key point is that there is a significant “gap” between the non-involved and involved elements; there are no elements that are only “slightly involved”, in that {A} and {gA} intersect a little but not a lot. It is this gap that will allow us to upgrade approximate structure to exact structure. Namely,

Proposition 4 The set {H} of involved elements is a finite group, and is equal to {A A^{-1}}.

Proof: It is clear that the identity element {1} is involved, and that if {g} is involved then so is {g^{-1}} (since {A \cap g^{-1} A = g^{-1}(A \cap gA)}). Now suppose that {g, h} are both involved. Then {A \cap gA} and {A\cap hA} have cardinality greater than {|A|/2} and are both subsets of {A}, and so have non-empty intersection. In particular, {gA \cap hA} is non-empty, and so {A \cap g^{-1} hA} is non-empty. By Proposition 3, this makes {g^{-1} h} involved. It is then clear that {H} is a group.

If {g \in A A^{-1}}, then {A \cap gA} is non-empty, and so from Proposition 3 {g} is involved. Conversely, if {g} is involved, then {g \in A A^{-1}}. Thus we have {H = A A^{-1}} as claimed. In particular, {H} is finite. \Box

Now we can quickly wrap up the proof of Theorem 2. By construction, {|A \cap gA| > |A|/2} for all {g \in H}, which by double counting shows that {|H| < 2|A|}. As {H = A A^{-1}}, we see that {A} is contained in a right coset {Hg} of {H} (for any {g \in A}); setting {H' := g^{-1} H g}, we conclude that {A} is contained in a left coset {gH'} of {H'}. {H'} is a conjugate of {H}, and so {|H'| < 2|A|}. If {h \in H'}, then {A} and {Ah} both lie in {gH'} and have cardinality {|A|}, so must overlap; and so {h \in A^{-1} A}. Thus {A^{-1} A = H'}, and so {|H'| < \frac{3}{2} |A|}, and Theorem 2 follows.
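The gap between involved and non-involved elements is easy to observe in a small example. The sketch below is only an illustration under assumptions of ours: we work in the cyclic group {{\bf Z}/12{\bf Z}} written additively (so {A^{-1}A} becomes the difference set {A-A} and {gA} becomes {g+A}), with the test set {A = \{0,2,4,6,8\}}; the helper `involved_elements` is hypothetical.

```python
def involved_elements(A, n):
    """Return {g: |A ∩ (g + A)|} over all g in Z/nZ for which A meets g + A."""
    A = set(A)
    sizes = {}
    for g in range(n):
        overlap = len(A & {(g + a) % n for a in A})
        if overlap:
            sizes[g] = overlap
    return sizes

n = 12
A = {0, 2, 4, 6, 8}
diff = {(a - b) % n for a in A for b in A}   # additive analogue of A^{-1} A
sizes = involved_elements(A, n)
H = set(sizes)                               # the set of involved elements
```

Here {|A-A| = 6 < \frac{3}{2}|A|}, so the first option of Theorem 2 fails; correspondingly, every overlap recorded in `sizes` exceeds {|A|/2}, and the involved elements form the subgroup of even residues, which equals {A-A} and contains {A}.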

Exercise 2 Show that the constant {3/2} in Theorem 2 cannot be replaced by any larger constant.

Exercise 3 Let {A \subset G} be a finite non-empty set such that {|A^2| < 2|A|}. Show that {AA^{-1}=A^{-1} A}. (Hint: If {ab^{-1} \in A A^{-1}}, show that {ab^{-1} = c^{-1} d} for some {c,d \in A}.)

Exercise 4 Let {A \subset G} be a finite non-empty set such that {|A^2| < \frac{3}{2} |A|}. Show that there is a finite group {H} with {|H| < \frac{3}{2} |A|} and a group element {g \in G} such that {A \subset Hg \cap gH} and {H = A A^{-1}}.

Below the fold, we give further examples of the pivot argument in other group-like situations, including Theorem 1 and also the “sum-product theorem” of Bourgain-Katz-Tao and Bourgain-Glibichuk-Konyagin.

— 1. The sum-product theorem —

Consider a finite non-empty subset {A} of a field {k}. Then we may form the sumset

\displaystyle  A+A := \{a+b: a,b \in A \}

and the product set

\displaystyle  A \cdot A := \{ab: a,b \in A \}.

The minimal sizes of such sets are well understood:

Exercise 5 Let {A} be a finite non-empty subset of a field {k}.

  • (i) Show that {|A+A| \geq |A|}, with equality occurring if and only if {A} is an additive coset {A = x+H} of a finite additive subgroup {H} of {k} with some {x \in k}.
  • (ii) Show that {|A\cdot A|\geq |A|}, with equality occurring if and only if {A} is either equal to a multiplicative coset {A = gH} of a finite multiplicative subgroup {H} of {k^\times := k \backslash \{0\}} with some {g \in k^\times}, or the set {\{0\}}, or the set {\{0\} \cup gH} where {gH} is a multiplicative coset.
  • (iii) Show that {\max(|A+A|, |A\cdot A|)\geq |A|}, with equality occurring if and only if {A} is either equal to a multiplicative dilate {A = cF} of a finite subfield {F} of {k} with {c \in k^\times}, a singleton set, or an additive subgroup of order {2}.

The sum-product phenomenon is a robust version of the above observation, asserting that one of {A+A} or {A\cdot A} must be significantly larger than {A} if {A} is not somehow “close” to a genuine subfield of {k}. Here is one formulation of this phenomenon:

Theorem 5 (Sum-product theorem) Let {\epsilon>0} be a sufficiently small number. Then for any field {k} and any finite non-empty subset {A}, one of the following statements holds:

  • (Expansion) {\max(|A+A|, |A\cdot A|) \geq |A|^{1+\epsilon}}.
  • (Close to a subfield) There is a dilate {cF} of a subfield {F} of {k} with {|F| \ll |A|^{1+O(\epsilon)}} and {c\neq 0} which contains all but {O(|A|^{O(\epsilon)})} elements of {A}.
  • (Smallness) {A} is an additive subgroup of order {2}.

If {k} has characteristic zero, then the second option here cannot occur, and we conclude that {\max(|A+A|,|A \cdot A|) \geq |A|^{1+\epsilon}} for some absolute constant {\epsilon>0} as soon as {A} contains at least two non-zero elements, a claim first established in {{\bf R}} by Erdos and Szemeredi. When {k} is a finite field of prime order, the second option can only occur when {F=k}, and we conclude that {\max(|A+A|,|A \cdot A|) \geq |A|^{1+\epsilon}} as soon as {|A| \leq |k|^{1-C\epsilon}} whenever {\epsilon} is sufficiently small, {C} is an absolute constant, and {A} has at least two non-zero elements. A preliminary version of this result (which required more size assumptions on {A}, in particular a bound of the shape {|A| \geq |k|^\delta}) was obtained by Bourgain, Katz, and Tao, with the version stated above first obtained by Bourgain, Glibichuk, and Konyagin. The proof given here is drawn from my book with Van, and was originally inspired by this paper of Bourgain and Konyagin.

Remark 1 There has been a substantial amount of literature on trying to optimise the exponent {\epsilon} in the sum-product theorem. A relatively recent survey of this literature can be found in this paper of mine (and in the references to the other papers cited in this remark). In {{\bf R}}, the best result currently in this direction is by Solymosi, who established that one can take {\epsilon} arbitrarily close to {1/3}; for {{\bf C}}, the best result currently is by Rudnev, who shows that one can take {\epsilon} arbitrarily close to {19/69}. For fields of prime order, one can take {\epsilon} arbitrarily close to {1/11}, a result of Rudnev; an extension to arbitrary finite fields was then obtained by Li and Roche-Newton.

We now start proving Theorem 5. As with Theorem 2, the engine of the proof is a dichotomy similar to that of Proposition 3. Whilst the former proposition was modeled on the basic group-theoretic assertion that cosets {gH} of a subgroup were either identical or disjoint, this proposition is modeled on the basic linear algebra fact that if {F} is a subfield of {k} and {\xi \in k}, then {F+\xi F} is either of size {|F|^2}, or of size {|F|}.

Lemma 6 (Dichotomy) Let {k} be a field, let {A} be a finite non-empty subset of {k}, and let {\xi \in k}. Then at least one of the following statements holds:

  • (Non-involved case) {|A + \xi A| = |A|^2}.
  • (Involved case) {|A + \xi A| \leq |(A-A) A +(A-A) A|}.

Proof: Suppose that we are not in the non-involved case, thus {|A + \xi A| \neq |A|^2}. Then the map {(a,b) \mapsto a+\xi b} from {A\times A} to {k} is not injective, and so there exist {a,b,c,d \in A} with {(a,b) \neq (c,d)} and

\displaystyle  a+\xi b = c+\xi d.

In particular, {b \neq d}. We then have {\xi = (a-c)/(d-b)} and so

\displaystyle  |A + \xi A| = |(d-b) A + (a-c) A| \leq |(A-A) A +(A-A) A|.

\Box
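The dichotomy is sharpest when {A} is close to a subfield. As an illustration (not taken from the notes), the sketch below assumes {k = F_{49}}, realised as {F_7[i]} with {i^2 = -1}, and takes {A} to be the non-zero elements of the prime subfield {F_7}; the helper names `add`, `sub`, `mul`, `sumset` and `prodset` are ours.

```python
from itertools import product

p = 7  # assumption: a prime congruent to 3 mod 4, so that x^2 = -1 has no root in F_p

# Elements of F_{p^2} = F_p[i] are pairs (x, y) standing for x + y i.
def add(u, v): return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)
def sub(u, v): return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)
def mul(u, v): return ((u[0]*v[0] - u[1]*v[1]) % p, (u[0]*v[1] + u[1]*v[0]) % p)

def sumset(X, Y): return {add(x, y) for x in X for y in Y}
def prodset(X, Y): return {mul(x, y) for x in X for y in Y}

field = list(product(range(p), repeat=2))
A = {(x, 0) for x in range(1, p)}            # non-zero elements of the subfield F_p

D = {sub(a, b) for a in A for b in A}        # A - A
DA = prodset(D, A)                           # (A - A) A
bound = len(sumset(DA, DA))                  # |(A-A)A + (A-A)A|

# Size of A + xi A for every slope xi in the field.
size = {xi: len(sumset(A, {mul(xi, a) for a in A})) for xi in field}
involved = {xi for xi in field if size[xi] != len(A) ** 2}
```

In this example `bound` equals {7}, far below {|A|^2 = 36}, and the slopes {\xi} that fail to be non-involved are exactly the elements of the subfield {F_7}, each satisfying {|A + \xi A| \leq 7}.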

Remark 2 One can view {A+\xi A} as measuring the extent to which the dilate {\xi A} of {A} is “transverse” to {A}. As the “slope” {\xi} varies, {\xi A} “pivots” around the origin, encountering both the (relatively rare) involved slopes, and the (generic) non-involved slopes. It is this geometric picture which led to the term “pivot argument”, as used in particular by Helfgott (who labeled the non-involved slopes as “pivots”).

This dichotomy becomes useful if there is a significant gap between {|(A-A) A +(A-A) A|} and {|A|^2}. Let’s see how. To prove Theorem 5, we may assume that {|A|} is larger than some large absolute constant {C}, as the claim follows from Exercise 5 otherwise (making {\epsilon} small enough depending on {C}). By deleting {0} from {A}, and tweaking {\epsilon}, we may then assume that {A} does not contain {0}. We suppose that

\displaystyle  |A+A|, |A\cdot A|\leq K|A|

for some {K \leq |A|^{\epsilon_0}} and some sufficiently small absolute constant {\epsilon_0}. In particular we see that {|A|} will exceed any quantity of the form {O(K^{O(1)})} if we make {\epsilon_0} small enough and {C} large enough.

We would like to boost this control of sums and products to more complex combinations of {A}. We will need some basic tools from additive combinatorics.

Lemma 7 (Ruzsa triangle inequality) If {A,B,C} are non-empty finite subsets of {k}, then {|A-C| \leq \frac{|A-B||B-C|}{|B|}}.

Proof: This is the additive version of Exercise 4 from Notes 4. \Box
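For readers who prefer to see the inequality in action, here is a sketch of the standard injection argument behind it, under the assumption that the sets are finite sets of integers. The function names `diff` and `ruzsa_injection` and the three test sets are ours. For each {x \in A-C} we fix one representation {x = a - c}, and send {(x,b)} to {(a-b, b-c) \in (A-B) \times (B-C)}.

```python
def diff(X, Y):
    """The difference set X - Y."""
    return {x - y for x in X for y in Y}

def ruzsa_injection(A, B, C):
    """Map (x, b) in (A - C) x B to (a - b, b - c), where x = a - c is a fixed
    representation of x. The map is injective: u + v recovers x, hence (a, c),
    and then b = a - u."""
    rep = {}
    for a in A:
        for c in C:
            rep.setdefault(a - c, (a, c))    # keep one representation per difference
    return {(x, b): (a - b, b - c) for x, (a, c) in rep.items() for b in B}

# Illustrative test sets (an assumption).
A = {0, 1, 3, 7}
B = {2, 5, 6}
C = {1, 4, 9, 10}
images = ruzsa_injection(A, B, C)
```

Because the map is injective with domain of size {|A-C| |B|} and range inside {(A-B) \times (B-C)}, the inequality of Lemma 7 follows by counting.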

Lemma 8 (Ruzsa covering lemma) If {A, B} are non-empty finite subsets of {k}, then {A} can be covered by at most {\frac{|A+B|}{|B|}} translates of {B-B}.

Proof: This is the additive version of Exercise 5 from Notes 4. \Box

Exercise 6 (Sum set estimates) If {A, B} are non-empty finite subsets of {k} such that {|A+B|\leq K |A|^{1/2}|B|^{1/2}}, show that {A} and {B} can both be covered by {O(K^{O(1)})} translates of the same {O(K^{O(1)})}-approximate group {H}, with {|H| \ll K^{O(1)} |A|}. Conclude that

\displaystyle  |n_1 A - n_2 A + n_3 B - n_4 B| \ll_{n_1,n_2,n_3,n_4} K^{O(|n_1|+|n_2|+|n_3|+|n_4|)} |A|

for any natural numbers {n_1,n_2,n_3,n_4}, where {nA := A+\ldots+A} denotes the sum set of {n} copies of {A}. (Hint: use the additive form of Exercise 7 from Notes 4, and the preceding lemmas.)

These lemmas allow us to improve the sum-product properties of {A} by passing to a large subset {B} (cf. the Balog-Szemeredi-Gowers lemma from Notes 4):

Lemma 9 (Katz-Tao lemma) Let {A} be as above. Then there is a subset {B} of {A} with {|B| \geq |A|/2K} such that {|B^2-B^2| \ll K^{O(1)} |B|}.

Proof: The dilates {aA} of {A} with {a \in A} all lie in a set {A^2} of cardinality at most {K|A|}. Intuitively, this should force a lot of collision between the {aA}, which we will exploit using the sum set estimates. More precisely, observe that

\displaystyle  \| \sum_{a \in A} 1_{aA}\|_{\ell^1}= |A|^2

and hence by Cauchy-Schwarz

\displaystyle  \| \sum_{a \in A} 1_{aA}\|_{\ell^2}^2 \geq |A|^3/K.

The left-hand side can be written as

\displaystyle  \sum_{b \in A} \sum_{a\in A}|aA\cap bA|

and so by the pigeonhole principle we can find {b_0 \in A} such that

\displaystyle  \sum_{a\in A}|aA\cap b_0 A| \geq |A|^2/K.

We apply a dilation to set {b_0=1} (recall that {A} does not contain {0}). If we set {B := \{a \in A: |aA\cap A| \geq |A|/2K\}}, we conclude that

\displaystyle  \sum_{a \in B}|aA\cap A| \geq |A|^2/2K

which implies in particular that

\displaystyle  |B|\geq |A|/2K

If {a \in B}, then

\displaystyle  |aA \cap A| \geq |A|/2K;

since {|aA + aA| \leq K|A|} we also have

\displaystyle  |aA + (aA \cap A)| \leq K|A|

and similarly

\displaystyle  |A + (aA \cap A)| \leq K|A|

and thus by the Ruzsa triangle inequality

\displaystyle  |aA -A| \ll K^{O(1)} |A|

whenever {a \in B}. Informally, let us call a non-zero element {a} of {k} good if {|aA - A| \ll K^{O(1)} |A|} (but note that this notion of “good” is a bit fuzzy, as it depends on the choice of implied constants in the {O()} notation). Observe that if {a,a'} are good, then

\displaystyle  |aa' A - a A|, |a A - A| \ll K^{O(1)} |A|

and thus by the Ruzsa triangle inequality

\displaystyle  |aa' A - A| \ll K^{O(1)} |A|,

thus the product of two good elements is good (with somewhat worse implied constants). Similarly, from the Ruzsa covering lemma we see that {aA} and {a' A} are both covered by {O(K^{O(1)})} translates of {A-A}, and from this and sum set estimates we see that

\displaystyle  |(a+a')A-A| \ll K^{O(1)} |A|

and so the sum of two good elements is again good. Similarly the difference of good elements is good. Applying all these facts, we conclude that all the elements of {B^2-B^2} are good, thus {|gA-A| \ll K^{O(1)} |A|} for all {g \in B^2-B^2}. In particular, since {|A|} exceeds {K^{O(1)}}, we see from the Cauchy-Schwarz inequality that for each {g \in B^2-B^2}, there are {\gg |A|^3/K^{O(1)}} solutions to the equation {ga_1-a_2=ga_3-a_4} with {a_1 \neq a_3} and {a_1,a_2,a_3,a_4 \in A}. However, there are only {|A|^4} possible choices for {a_1,a_2,a_3,a_4}, and each such choice uniquely determines {g}, so there are at most {O(K^{O(1)} |A|)} possible choices for {g}, and the claim follows. \Box
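The opening steps of the above proof (the collision count, the pigeonholing of {b_0}, and the selection of {B}) can be traced on a small example. The sketch below makes several assumptions of ours: the field is {F_{101}}, {A} is the geometric progression {\{2^i: 0 \leq i < 10\}}, {K} is taken to be exactly {|A \cdot A|/|A|}, and {B} is defined relative to {b_0} directly rather than after dilating to {b_0 = 1}.

```python
from fractions import Fraction

p = 101
A = sorted({pow(2, i, p) for i in range(10)})    # a geometric progression in F_p
AA = {(a * b) % p for a in A for b in A}         # the product set A . A
K = Fraction(len(AA), len(A))                    # so that |A . A| = K |A|

def dilate(a):
    """The dilate aA."""
    return {(a * x) % p for x in A}

# |aA ∩ bA| for all pairs; summing gives the squared l^2 norm of sum_a 1_{aA}.
overlap = {(a, b): len(dilate(a) & dilate(b)) for a in A for b in A}
energy = sum(overlap.values())

# Pigeonhole: pick b0 maximising the total overlap with the other dilates.
b0 = max(A, key=lambda b: sum(overlap[a, b] for a in A))
row = sum(overlap[a, b0] for a in A)

# Elements whose dilate overlaps b0 A in at least |A| / 2K points.
B = [a for a in A if overlap[a, b0] * 2 * K >= len(A)]
```

The quantities `energy`, `row` and `len(B)` then satisfy the lower bounds {|A|^3/K}, {|A|^2/K} and {|A|/2K} respectively, as in the displayed inequalities of the proof.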

Note that one could replace {B^2-B^2} in the above lemma by any other homogeneous polynomial combination of {B}.

By applying a dilation, we may assume that {B} contains {1}. Applying Lemma 6 to this set {B} (and using sum set estimates), we arrive at the following dichotomy: every field element {\xi \in k} is either “non-involved” in the sense that {|B+\xi B| = |B|^2}, or is “involved” in the sense that {|B+\xi B| \leq C_1 K^{C_1} |B|} for some fixed absolute constant {C_1}. By sum set estimates we have {|B+BB| \ll K^{O(1)} |B|}; as we can assume that {|A|}, and hence {|B|}, is larger than any quantity of the form {O(K^{O(1)})}, this forces all elements of {B} to be involved.

To exploit this, observe (by repeating the proof of Lemma 6) that if {\xi_1,\xi_2} are involved, then the quantities {\xi = \xi_1\xi_2, \xi_1+\xi_2, \xi_1-\xi_2} are somewhat involved in the sense that

\displaystyle  |B + \xi B|\ll K^{O(1)} |B|

for those choices of {\xi} (where the implied constants depend on {C_1}). But as we can assume that {|A|}, and hence {|B|}, is larger than any quantity of the form {O(K^{O(1)})}, we see from Lemma 6 that this forces {\xi} to be involved as well (this is the crucial step at which approximate structure is improved to exact structure). We thus see that the set {F} of all involved elements is closed under multiplication, addition, and subtraction; as it also contains {0}, it is a subring of {k}. Arguing as in the proof of Lemma 9, we have that {F} is finite with {|F|\ll K^{O(1)} |A|}; in particular, {F} must now be a finite subfield of {k}.

Now we enter the “endgame”, in which we use this {F} to control {A}. By previous discussion, {F} contains {B}, and thus {|A \cap F| \gg K^{-O(1)} |A|}. By the Ruzsa triangle inequality applied to {A, A \cap F, F}, this implies that {|A+F| \ll K^{O(1)} |F|}, and so {A} can be covered by {O(K^{O(1)})} translates of {F}. A similar argument applied multiplicatively shows that {A} can be covered by {O(K^{O(1)})} dilates of {F}. Since a non-trivial translate of {F} and a non-trivial dilate of {F} intersect in at most one point, we conclude that {A} has at most {O(K^{O(1)})} elements outside of {F}, and the claim follows.

Remark 3 One can abstract this argument by replacing the multiplicative structure here by an abelian group action; see this paper of Helfgott for details. The argument can also extend to non-commutative settings, such as division algebras or more generally to arbitrary rings (though in the latter case, the presence of non-trivial zero-divisors becomes a very significant issue); see this paper for details.

— 2. Finite subgroups of {SL_2} —

We will shortly establish Theorem 1, which can be viewed as a way to describe approximate subgroups of {SL_d(k)}. Before we do so, let us first warm up and digress slightly by studying genuine finite subgroups {A} of {SL_d(k)}, in the model case {d=2}, for which ad hoc explicit calculations are available. In order to make the algebraic geometry of the situation cleaner, it is convenient to embed the field {k} in its algebraic closure {\overline{k}}, and similarly embed {SL_2(k)} in {SL_2(\overline{k})}. This is a group which is also an algebraic variety (identifying the space of {2 \times 2} matrices with coefficients in {\overline{k}} with {\overline{k}^4}), whose group operations are algebraic (in fact, polynomial) maps; in other words, {SL_2(\overline{k})} is an algebraic group. We now consider the question of what finite subgroups of {SL_2(\overline{k})} can look like. This is a classical question, with a complete classification obtained by Dickson in 1901. The precise classification is somewhat complicated; to give just a taste of this complexity, we observe that the symmetry group of the icosahedron is a finite subgroup of {SO_3({\bf R})}, which can be lifted to the spin group {Spin_3({\bf R})} (giving what is known as the binary icosahedral group, a group of order {120}), which is a subgroup of {Spin_3({\bf C})}, which can be identified with {SL_2({\bf C})}. Because of this, it is possible for some choices of finite field {k} to embed the binary icosahedral group into {SL_2(\overline{k})} or {SL_2(k)}. Similar considerations apply to the symmetry groups of the other Platonic solids. However, if one is willing to settle for a “rough” classification, in which one ignores groups of bounded size (and more generally, is willing just to describe a bounded index subgroup of the group {A}), the situation becomes much simpler.

In the characteristic zero case {k={\bf C}}, for instance, we have Jordan’s theorem, which asserts that given a finite subgroup {A} of {SL_d({\bf C})} for some {d=O(1)}, a bounded index subgroup of {A} is abelian. (Jordan’s theorem was discussed further in last quarter’s course.) The finite characteristic case is inherently more complicated though (due in large part to the proliferation of finite subfields), with a satisfactory rough classification only becoming available for general {d} with the work of Larsen and Pink (published in 2011, but which first appeared as a preprint in 1998). However, the {d=2} case is significantly simpler and can be treated by somewhat ad hoc methods, as we shall now do. The discussion here is loosely based on this paper of Kowalski.

We pause to recall some basic structural facts about {SL_2(\overline{k})}. Elements of this group are {2 \times 2} matrices with determinant one, and thus have two (algebraic, possibly repeated) eigenvalues {t, t^{-1}} for some {t\in \overline{k}} (note here that we are using the algebraically closed nature of {\overline{k}}). This allows us to classify elements of {SL_2(\overline{k})} into three classes:

  • The central elements {\pm 1};
  • The regular unipotent elements and their negations, which are non-central elements with a double eigenvalue at {+1} (or a double eigenvalue at {-1}); and
  • The regular semisimple elements, which have two distinct eigenvalues.

We collectively refer to regular unipotent elements and their negations as regular projectively unipotent elements.

Remark 4 The presence of the non-identity central element {-1} leads to some slight technical annoyances (for instance, it means that {SL_2} is merely an almost simple algebraic group rather than a simple one, in the sense that the only proper normal algebraic subgroups are finite). One can eliminate this element by working instead with the projective special linear group {PSL_2:= SL_2/\{\pm 1\}}, but we will not do so here. We remark that if one works in {SL_d} for {d>2} then the classification of elements becomes significantly more complicated; for instance, there exist elements which are semisimple (i.e. diagonalisable) but neither regular nor central, because some but not all of the eigenvalues may be repeated.

One can distinguish the unipotent elements from the semisimple ones using the trace: unipotent elements have trace {+2}, their negations have trace {-2}, and the semisimple elements have traces distinct from {\pm 2}. The ability to classify elements purely from the trace is a very special fact concerning {SL_2} which breaks down completely for higher rank matrix groups, but we will not hesitate to take advantage of this fact here.
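The trace criterion lends itself to a brute-force census. The sketch below is an illustration under the assumption that we restrict to the finite group {SL_2(F_p)} for a small odd prime {p} (rather than {SL_2(\overline{k})}); the function name `classify` is ours.

```python
from itertools import product

def classify(p):
    """Count the elements of SL_2(F_p), p an odd prime, in each of the three
    classes, using only the trace (and a separate check for scalar matrices)."""
    counts = {"central": 0, "projectively unipotent": 0, "regular semisimple": 0}
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 1:
            continue                                  # not in SL_2(F_p)
        if b == 0 and c == 0 and a == d:
            counts["central"] += 1                    # scalar with a^2 = 1, i.e. +1 or -1
        elif (a + d) % p in (2, p - 2):
            counts["projectively unipotent"] += 1     # trace +2 or -2, not central
        else:
            counts["regular semisimple"] += 1         # trace distinct from +-2
    return counts

census = classify(5)
```

For {p=5} this accounts for all {120} elements of {SL_2(F_5)}: the two central elements, {2(p^2-1) = 48} regular projectively unipotent elements, and {70} regular semisimple ones.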

Associated to the above classification are some natural algebraic subgroups of {SL_2(\overline{k})}, including the standard maximal torus

\displaystyle  T(\overline{k}) := \{ \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix}: t \in \overline{k}^\times \},

the one-dimensional standard unipotent group

\displaystyle  U(\overline{k}) := \{ \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}: x \in \overline{k} \},

and the two-dimensional standard Borel subgroup

\displaystyle  B(\overline{k}) := \{ \begin{pmatrix} t & x \\ 0 & t^{-1} \end{pmatrix}: x \in \overline{k}; t \in \overline{k}^\times \}.

More generally, we define a maximal torus of {SL_2(\overline{k})} to be a conjugate (in {SL_2(\overline{k})}) of the standard maximal torus, a unipotent group to be a conjugate of the standard unipotent group, and a Borel subgroup to be a conjugate of the standard Borel subgroup. (This is not really the “right” way to define these groups, for the purpose of generalisation to other algebraic groups, but will suffice as long as we are only working with {SL_2}.) Note that one can also think of a Borel subgroup as the stabiliser of a one-dimensional subspace of {\overline{k}^2} (using the obvious action of {SL_2(\overline{k})} on {\overline{k}^2}). Using the Jordan normal form (again taking advantage of the algebraically closed nature of {\overline{k}}), we can see how these groups interact with group elements:

  • The central elements lie in every maximal torus and every Borel subgroup. The identity {+1} lies in every unipotent group, but {-1} lies in none of them.
  • Every regular unipotent element lies in exactly one unipotent group, which in turn lies in exactly one Borel subgroup (the normaliser of the unipotent group). Conversely, a unipotent group consists entirely of regular unipotent elements and the identity {+1}.
  • Every regular semisimple element lies in exactly one maximal torus, which in turn lies in exactly two Borel subgroups (the stabiliser of one of the eigenspaces of a regular semisimple element in the torus). Conversely, a maximal torus consists entirely of regular semisimple elements and the central elements {\pm 1}.

Remark 5 If one was working in a non-algebraically closed field {F} instead of in {\overline{k}}, one could subdivide the regular semisimple elements into two classes, the split case when the elements can be diagonalised inside {F}, and the non-split case when they can only be diagonalised in a quadratic extension of {F}. This similarly subdivides maximal tori into two families, the split tori and the non-split tori. In the case when one is working over the field {{\bf R}}, the unipotent, split semisimple, and non-split semisimple elements are referred to as parabolic, hyperbolic, and elliptic elements of {SL_2({\bf R})} respectively. Fortunately, in our applications we can work in algebraically closed fields and avoid these sorts of finer distinctions.

Ignoring the exceptional small examples of subgroups of {SL_2(\overline{k})}, such as the binary icosahedral group mentioned earlier, there are two obvious ways to generate subgroups of {SL_2(\overline{k})}. One is to pass from {\overline{k}} to a subfield {F}, creating “arithmetic” subgroups of the form {SL_2(F)} (or conjugates thereof). The other is to replace {SL_2} with an algebraic subgroup of the three-dimensional group {SL_2}, such as the maximal tori, unipotent groups, and Borel subgroups mentioned earlier. (Actually, these are the only (connected) proper algebraic subgroups of {SL_2}, as can be seen by consideration of the associated Lie algebras.)

Observe that if {A = SL_2(F)} is an arithmetic subgroup, then its intersections {T(F) := A \cap T(\overline{k})}, {U(F) := A \cap U(\overline{k})}, {B(F) := A \cap B(\overline{k})} capture a portion of {A} proportionate to the dimensions involved, or more precisely that

\displaystyle  |A \cap T(\overline{k})| \ll |A|^{1/3}; |A \cap U(\overline{k})| \ll |A|^{1/3}; \quad |A \cap B(\overline{k})| \ll |A|^{2/3}.

Indeed, it is easy to see that {|A| = |SL_2(F)| \sim |F|^3}, {|A \cap T(\overline{k})| = |T(F)| \sim |F|}, and so forth. An important and general observation of Larsen and Pink is that this sort of behaviour is shared by all other finite subgroups of algebraic groups such as {SL_2(\overline{k})}, as long as these groups are not (mostly) trapped in a proper algebraic subgroup. We first illustrate this phenomenon for the torus groups:
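The orders quoted above are easy to confirm by enumeration in a small case. The sketch below is illustrative only: it takes {F = F_7} (an arbitrary choice), represents a matrix as a tuple `(a, b, c, d)`, and uses a hypothetical helper `sl2`. It lists the sizes of {T(F)}, {U(F)}, {B(F)} next to {|A|^{\dim/3}}.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

p = 7  # illustrative prime
A = sl2(p)
T = [g for g in A if g[1] == 0 and g[2] == 0]                # diagonal matrices
U = [g for g in A if g[0] == 1 and g[2] == 0 and g[3] == 1]  # upper unitriangular
B = [g for g in A if g[2] == 0]                              # upper triangular

# Compare each intersection with |A|^(dim/3), where dim is the dimension
# of the corresponding algebraic subgroup.
for name, subset, dim in (("T", T, 1), ("U", U, 1), ("B", B, 2)):
    print(name, len(subset), round(len(A) ** (dim / 3), 2))
```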

Proposition 10 (Larsen-Pink inequality, special case) Let {A} be a finite subgroup of {SL_2(\overline{k})}. Then one of the following statements holds:

  • (Non-concentration) For any maximal torus {T}, one has {|A \cap T| \ll |A|^{1/3}}.
  • (Trapping) There is a Borel subgroup {B} such that {|A \cap B| \gg |A|}.

Proof: Suppose that the trapping hypothesis fails, thus {|A \cap B| = o(|A|)} for all Borel subgroups {B}, where we interpret {o(|A|)} here to mean “less than {\epsilon |A|} for an arbitrarily small constant {\epsilon>0} which we are at liberty to choose”. (If one is uncomfortable with this type of definition, one can instead consider a sequence of potential counterexamples {A = A_n} to the above proposition in various groups {SL_2(\overline{k_n})}, in which {\sup_B |A \cap B| = o_{n \rightarrow \infty}(|A|)}. Alternatively, one can also rephrase this argument if desired in the language of nonstandard analysis.) In particular, we see that any coset of {B} occupies a fraction {o(1)} at most of {A}. Thus, for instance, if we select an element {\begin{pmatrix} a & b \\ c & d \end{pmatrix}} from {A} uniformly at random, then with probability {1-o(1)}, {b} is non-zero, and similarly for {a,c,d}. To put it more informally, the matrix entries of an element of {A} are “generically” non-zero. Similarly if we first conjugate {A} by a fixed group element.

We need to show that {|A \cap T| \ll |A|^{1/3}} for any maximal torus {T}. By conjugation we may take {T} to be the standard maximal torus {T = T(\overline{k})}. Set {A' := A \cap T(\overline{k})}, then {A'} is a subgroup of {A} of the form

\displaystyle  A' := \{ \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix}: t\in H \}

for some finite multiplicative subgroup {H} of {\overline{k}^\times}. We may assume that {|H|} is larger than any given absolute constant, as the claim is trivial otherwise. Our task is to show that {|H|^3 \ll |A|}.

Let {g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}} be a typical element of {A}. By the preceding discussion, we may assume that {a,b,c,d} are all non-zero. Since {A'} is a subgroup of {A}, we have {A' g A' g A' \subset A}, thus

\displaystyle  \begin{pmatrix} t_1 & 0 \\ 0 & t_1^{-1} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} t_2 & 0 \\ 0 & t_2^{-1} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} t_3 & 0 \\ 0 & t_3^{-1} \end{pmatrix} \in A

for all {t_1,t_2,t_3 \in H}. We evaluate the inner matrix products to obtain that

\displaystyle  \begin{pmatrix} t_1 & 0 \\ 0 & t_1^{-1} \end{pmatrix} \begin{pmatrix} a^2 t_2 + bc t_2^{-1} & ab t_2 + bd t_2^{-1} \\ ac t_2 + cd t_2^{-1} & bc t_2 + d^2 t_2^{-1} \end{pmatrix} \begin{pmatrix} t_3 & 0 \\ 0 & t_3^{-1} \end{pmatrix} \in A

for {t_1,t_2,t_3 \in H}.

Because {a,b,c,d} are non-zero, we see that for all but {O(1)} values of {t_2}, all four entries of the middle matrix here are non-zero. As a consequence, if one fixes {t_2} and lets {t_1,t_3} vary, all of the triple products given above are distinct. Note that if one takes the above triple product and multiplies the diagonal entries together, the {t_1, t_3} terms cancel and one obtains {(a^2 t_2 + bc t_2^{-1}) (bc t_2 + d^2 t_2^{-1})}. This rational map (as a function of {t_2}) is at most four-to-one; each value of this map is associated to at most four values of {t_2}. Putting all this together, we conclude that there are {\gg |H|^3} different triple products one can form here as {t_1,t_2,t_3 \in H} vary, and the claim follows. \Box
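The cancellation of {t_1, t_3} in the product of the diagonal entries can be checked mechanically. The sketch below uses exact rational arithmetic as a stand-in for {\overline{k}}; the matrix {g}, the sample values in `H` (which do not form a group), and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction as Fr

def mul(g, h):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = g
    (e, f), (x, y) = h
    return ((a * e + b * x, a * f + b * y), (c * e + d * x, c * f + d * y))

def diag(t):
    """The torus element diag(t, 1/t)."""
    return ((t, Fr(0)), (Fr(0), 1 / t))

g = ((Fr(2), Fr(1)), (Fr(3), Fr(2)))  # determinant 1, all four entries non-zero

def triple_product(t1, t2, t3):
    """diag(t1) * g * diag(t2) * g * diag(t3)."""
    out = diag(t1)
    for factor in (g, diag(t2), g, diag(t3)):
        out = mul(out, factor)
    return out

def diagonal_invariant(t2):
    """(a^2 t2 + bc/t2)(bc t2 + d^2/t2), which should not depend on t1, t3."""
    (a, b), (c, d) = g
    return (a * a * t2 + b * c / t2) * (b * c * t2 + d * d / t2)

H = [Fr(n) for n in (1, 2, 3, 5)] + [Fr(1, n) for n in (2, 3, 5)]  # sample values
checks = []
for t1 in H:
    for t2 in H:
        for t3 in H:
            m = triple_product(t1, t2, t3)
            checks.append(m[0][0] * m[1][1] == diagonal_invariant(t2))

print(all(checks))
```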

Exercise 7 Establish a variant of Proposition 10 in which the maximal tori are replaced by unipotent groups.

Given a group element {g \in SL_2(\overline{k})}, let {Conj(g) := \{ hgh^{-1}: h \in SL_2(\overline{k}) \}} be the conjugacy class of {g}. The behaviour of this class depends on the nature of {g}:

Exercise 8 Let {g} be an element of {SL_2(\overline{k})}.

  • (i) If {g} is central, show that {Conj(g) = \{g\}}.
  • (ii) If {g} is regular unipotent, show that {Conj(g)} is the space of all regular unipotent elements.
  • (iii) If {g} is negative of a regular unipotent element, show that {Conj(g)} is the space of all negatives of regular unipotent elements.
  • (iv) If {g} is regular semisimple, show that {Conj(g) = \{ g'\in SL_2(\overline{k}): \hbox{tr}(g)= \hbox{tr}(g')\}}.

We can “dualise” the upper bound on maximal tori in Proposition 10 into a lower bound on conjugacy classes:

Proposition 11 (Large conjugacy classes) Let {A} be a finite subgroup of {SL_2(\overline{k})}. Then one of the following statements holds:

  • (Large conjugacy classes) For any regular semisimple or regular projectively unipotent {g \in A}, one has {|A \cap Conj(g)| \gg |A|^{2/3}}.
  • (Trapping) There is a Borel subgroup {B} such that {|A \cap B| \gg |A|}.

Proof: As before we may assume that {|A \cap B| =o(|A|)} for all Borel subgroups {B}. Let {g \in A} be regular semisimple or regular projectively unipotent, and consider the map {\phi: h \mapsto hgh^{-1}} from {A} to {A \cap Conj(g)}. For each {g' \in A \cap Conj(g)}, the preimage of {g'} by {\phi} is contained in a coset of the centraliser {C(g') := \{ h \in SL_2(\overline{k}): hg'=g'h\}} of {g'}. As {g} (and hence {g'}) is regular semisimple or regular projectively unipotent, this centraliser is a maximal torus or (two copies of) a unipotent group (this can be seen by placing {g} in Jordan normal form). By Proposition 10 or Exercise 7, we conclude that each preimage of {\phi} has cardinality {O(|A|^{1/3})}, which forces the range to have cardinality {\gg |A|^{2/3}} as claimed. \Box

We remark that this gives a dichotomy analogous to Lemma 3 or Lemma 6 in the case {|A\cap B| =o(|A|)}. Namely, for any {g \in SL_2({\overline{k}})}, either {A \cap Conj(g)} is empty, or {|A \cap Conj(g)| \gg |A|^{2/3}}. We will take advantage of a dichotomy similar to this (but for tori instead of conjugacy classes) in the next section.

We can match the lower bound in Proposition 11 with an upper bound:

Proposition 12 (Larsen-Pink inequality, another special case) Let {A} be a finite subgroup of {SL_2(\overline{k})}. Then one of the following statements holds:

  • (Non-concentration) For any regular semisimple {g \in SL_2(\overline{k})}, one has {|A \cap Conj(g)| \ll |A|^{2/3}}.
  • (Trapping) There is a Borel subgroup {B} such that {|A \cap B| \gg |A|}.

Proof: Again, we may assume that {|A \cap B| = o(|A|)} for all Borel subgroups {B}. In particular, we may take {|A|} larger than any given absolute constant. Let {g} be regular semisimple, and let {S:= A \cap Conj(g) =\{ s \in A: \hbox{tr}(s) = \hbox{tr}(g)\}}; our task is to show that {|S| \ll |A|^{2/3}}. Note from Exercise 8 that {g} is conjugate to {g^{-1}}, and so {S} is symmetric: {S = S^{-1}}. Also, {Sa=aS} for all {a \in A}.

Observe that whenever {a, b \in A} and {s \in S \cap a^{-1} S \cap b^{-1} S}, then the triple {(s,as, bs)} lies in {S^3}; conversely, every triple in {S^3} arises in this manner. Thus we have the identity

\displaystyle  |S|^3 = \sum_{a,b \in A} |S \cap a^{-1} S \cap b^{-1} S|.

We will show that

\displaystyle  \sum_{a,b \in A} |S \cap a^{-1} S \cap b^{-1} S| \ll |A|^2 + |A|^{4/3} |S|, \ \ \ \ \ (1)

which will give {|S| \ll |A|^{2/3}} as required.
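The counting identity {|S|^3 = \sum_{a,b \in A} |S \cap a^{-1} S \cap b^{-1} S|} can be tested directly on a small example. The sketch below is only an illustration under assumed choices: it takes {A = SL_2(F_3)} and {S} to be the elements of trace {0} (which are regular semisimple, as {0 \neq \pm 2} in {F_3}); the helpers `sl2` and `mul` are hypothetical names.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    """Matrix product modulo p."""
    a, b, c, d = g
    e, f, x, y = h
    return ((a * e + b * x) % p, (a * f + b * y) % p,
            (c * e + d * x) % p, (c * f + d * y) % p)

p = 3
A = sl2(p)
S = {s for s in A if (s[0] + s[3]) % p == 0}  # the conjugacy class of trace 0

# s lies in S, a^{-1}S and b^{-1}S exactly when s, as and bs all lie in S.
total = 0
for a in A:
    for b in A:
        total += sum(1 for s in S if mul(a, s, p) in S and mul(b, s, p) in S)

print(total == len(S) ** 3)
```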

We now establish (1). We divide into several contributions. First suppose that {a = \pm 1}. Then we bound the summand by {|S|}; there are {O(|A|)} summands here, leading to a total contribution of {O(|A| |S|)}, which is acceptable. Similarly if {b=\pm 1}, or {a = \pm b}, so we may restrict to the remaining cases when {\pm 1}, {\pm a}, {\pm b} are distinct. In particular, {a, b} are now either regular unipotent or regular semisimple.

We now consider the case in which {1,a,b} are linearly dependent (in the space {M_2(\overline{k})} of {2\times 2} matrices). For fixed {a}, this constrains {b} to either a maximal torus or a unipotent group (depending on whether {a} is regular semisimple or regular projectively unipotent); this is easiest to see by placing {a} in Jordan canonical form. By the preceding results, we see that there are {O(|A|^{1/3})} choices of {b} for each {a}, leading to a contribution of {O( |A|^{4/3} |S| )} in this case, which is acceptable. So we may now take {1,a,b} to be linearly independent.

The set {S \cap a^{-1} S \cap b^{-1} S} is the intersection of {A} with the affine line

\displaystyle  \ell := \{ s \in M_2(\overline{k}); \hbox{tr}(s) = \hbox{tr}(as) = \hbox{tr}(bs) = \hbox{tr}(g) \};

this is indeed a line when {1,a,b} are linearly independent. In most cases, this line {\ell} will intersect {SL_2(\overline{k})} (which we can view as a quadric surface in {M_2(\overline{k})}) in at most two points, leading to a contribution of {O(|A|^2)} for this case, which is acceptable. The only cases left to treat are when the line {\ell} is incident to (that is, contained in) {SL_2(\overline{k})}. This only occurs when the line {\ell} takes the form {hU} for some {h \in SL_2(\overline{k})} and unipotent group {U}; this is easiest to see by multiplying {\ell} on the left so that it contains the identity, and then placing another element of the line in Jordan normal form. In that case, we have

\displaystyle  \hbox{tr}(hu) = \hbox{tr}(ahu) = \hbox{tr}(bhu) = \hbox{tr}(g)

for all {u \in U}. This forces {h, ah, bh} to all lie in the Borel subgroup {B} associated to {U} (this is easiest to see by first conjugating {U} into the standard unipotent group {U(\overline{k})}). In particular, {a, b} both lie in {B}. Furthermore, if we write {\hbox{tr}(g) = t + t^{-1}}, then the diagonal entries of {h,ah,bh} are {t,t^{-1}} or {t^{-1},t}, and so the diagonal entries of {a,b} are either {1,1} or {t^{-2},t^2} or {t^2,t^{-2}}. In particular, {U} is the stabiliser of one of the eigenvectors of {a} – so for fixed {a}, there are at most two choices for {U} (recall that {a} was regular). Furthermore, for fixed {a} and {U}, {b} is constrained to lie in at most three cosets of {U}. As such, there are only {O(|A|^{1/3})} choices of {b} here for each {a}, giving another contribution of {O(|A|^{4/3}|S|)}, and the claim follows. \Box

Exercise 9 Let {A} be a finite subgroup of {SL_2(\overline{k})}, such that {|A \cap B| = o(|A|)} for all Borel subgroups {B}. Show that at most {O(|A|^{2/3})} of the elements of {A} are unipotent.

We can use the upper bound on conjugacy classes to obtain a lower bound on tori:

Proposition 13 (Large tori) Let {A} be a finite subgroup of {SL_2(\overline{k})}. Then one of the following statements holds:

  • (Large torus) For any regular semisimple {g \in A}, one has {|A \cap T| \gg |A|^{1/3}}, where {T} is the unique maximal torus containing {g}.
  • (Trapping) There is a Borel subgroup {B} such that {|A \cap B| \gg |A|}.

Proof: We can of course assume that the trapping case does not occur. We consider the map {\phi: a\mapsto aga^{-1}} from {A} to {A \cap Conj(g)}. By Proposition 12, the range of {\phi} has cardinality {O(|A|^{2/3})}, so by the pigeonhole principle, some element of {A \cap Conj(g)} has a preimage under {\phi} of cardinality {\gg |A|^{1/3}}. But all non-empty preimages have the same cardinality (they are left cosets of the preimage of {g}), so the preimage of {g} has cardinality {\gg |A|^{1/3}}. But this preimage is the intersection of {A} with the centraliser of {g}, which is {T} (as {g} is regular semisimple), and so {|A \cap T| \gg |A|^{1/3}} as required. \Box
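The fibre structure of the map {a \mapsto aga^{-1}} used in this proof can be illustrated numerically. The sketch below is a toy check under assumed choices: {A = SL_2(F_5)}, {g} the diagonal matrix with entries {2, 3}, and hypothetical helper names `sl2`, `mul`, `inv`. It groups the elements of {A} by their image and compares the number of fibres with the size of the centraliser.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    """Matrix product modulo p."""
    a, b, c, d = g
    e, f, x, y = h
    return ((a * e + b * x) % p, (a * f + b * y) % p,
            (c * e + d * x) % p, (c * f + d * y) % p)

def inv(g, p):
    """Inverse of a determinant-one matrix modulo p."""
    a, b, c, d = g
    return (d % p, (-b) % p, (-c) % p, a % p)

p = 5
A = sl2(p)
g = (2, 0, 0, 3)  # diag(2, 3); regular semisimple since 2 != 3 and 2 * 3 = 1 mod 5

# Group the elements of A according to the conjugate a g a^{-1} they produce.
fibres = {}
for a in A:
    image = mul(mul(a, g, p), inv(a, p), p)
    fibres.setdefault(image, []).append(a)

centraliser = fibres[g]  # the preimage of g itself
print(len(A), len(fibres), len(centraliser))
print({len(fibre) for fibre in fibres.values()})
```

In this example the class has size comparable to {|A|^{2/3}} and the centraliser has size comparable to {|A|^{1/3}}, in line with Propositions 11 to 13.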

Exercise 10 Establish a variant of Proposition 13 in which {g} is regular unipotent instead of regular semisimple, and {T} is replaced with the unique unipotent group containing {g}.

This gives a second (and particularly useful) dichotomy: assuming {A} is not trapped by a Borel subgroup, for a maximal torus {T}, {|A\cap T|} is either zero or comparable to {|A|^{1/3}}.

To exploit this, we use the following counting argument of Larsen and Pink (which is also reminiscent of an old argument of Jordan, used to prove his theorem mentioned previously), followed by some ad hoc arguments specific to {SL_2}. We continue to assume that {A} is not trapped by a Borel subgroup. Let {Z := A \cap\{+1,-1\}} denote the central elements of {A}, thus {|Z|} is either {1} or {2}. Observe that every element in {A \backslash Z} is either regular projectively unipotent or regular semisimple; in the latter case, the element lies in a unique maximal torus, which also contains {Z}. We conclude that

\displaystyle |A|-|Z| = u|Z| + \sum_T (|A \cap T| - |Z|)

where {T} ranges over all the maximal tori that intersect {A}, and {u} is the number of regular unipotent elements of {A} (so that {u|Z|} is the number of regular projectively unipotent elements of {A}).

If we conjugate a maximal torus {T} by an element of {A}, we get another maximal torus, or the same maximal torus if the element used to conjugate {T} was in the normaliser {N_A(T) := \{ a \in A: a T=Ta\}} of {T}. Thus, by the orbit-stabiliser theorem, there are exactly {|A|/|N_A(T)|} tori conjugate to {T} by elements of {A}. We thus see that

\displaystyle |A|-|Z| = u|Z| + \sum_{T \in {\mathcal T}} \frac{|A|}{|N_A(T)|} (|A \cap T| - |Z|)

where {{\mathcal T}} is a collection of representatives of the {A}-conjugacy classes of maximal tori intersecting {A} in a regular semisimple element. We rearrange this as

\displaystyle  1 = \frac{u|Z|+|Z|}{|A|} + \sum_{T \in {\mathcal T}} \frac{1}{[N_A(T):A \cap T]} (1 - \frac{|Z|}{|A \cap T|}).

Note that if {T} is a maximal torus, then {T} has index {2} in its normaliser in {SL_2(\overline{k})}. As such, {A\cap T} has index at most two in {N_A(T)}, and so {\frac{1}{[N_A(T):A \cap T]}} is either equal to {1} or {1/2} for each {T}. From the preceding bounds on tori and unipotent elements, we also have {\frac{|Z|}{|A \cap T|} \sim |A|^{-1/3}} and {\frac{u|Z|+|Z|}{|A|} = O( |A|^{-1/3} )}. As we are assuming {|A|} to be large, the above equation is only consistent when {{\mathcal T}} has cardinality {1} or {2}, and {\frac{u|Z|+|Z|}{|A|}} is comparable to {|A|^{-1/3}}, or equivalently that {u} is comparable to {|A|^{2/3}}. Thus, {A} has plenty of regular projective unipotents (matching the upper bound from Exercise 9); in particular, there is at least one regular unipotent.
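The rearranged class equation can be verified exactly in small cases. The following sketch is an illustration under stated assumptions rather than part of the argument: it takes {A = SL_2(F_p)} for a small prime {p}, takes {u} to be the number of regular unipotent elements of {A}, identifies each torus with its intersection with {A} (computed as the centraliser in {A} of a regular semisimple element), and uses hypothetical helper names throughout.

```python
from fractions import Fraction
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    a, b, c, d = g
    e, f, x, y = h
    return ((a * e + b * x) % p, (a * f + b * y) % p,
            (c * e + d * x) % p, (c * f + d * y) % p)

def inv(g, p):
    a, b, c, d = g
    return (d % p, (-b) % p, (-c) % p, a % p)

def class_equation(p):
    """Right-hand side of the rearranged class equation for A = SL_2(F_p)."""
    A = sl2(p)
    one = (1, 0, 0, 1)
    Z = {one, (p - 1, 0, 0, p - 1)}  # a set, so |Z| = 1 when p = 2
    u = sum(1 for g in A if g != one and (g[0] + g[3]) % p == 2 % p)
    regular_semisimple = [g for g in A
                          if (g[0] + g[3]) % p not in (2 % p, (-2) % p)]
    # A ∩ T for the torus T through g is the centraliser of g in A.
    tori = {frozenset(h for h in A if mul(g, h, p) == mul(h, g, p))
            for g in regular_semisimple}
    total = Fraction(u * len(Z) + len(Z), len(A))
    remaining = set(tori)
    while remaining:
        T = next(iter(remaining))  # representative of one conjugacy class of tori
        orbit = {frozenset(mul(mul(a, t, p), inv(a, p), p) for t in T) for a in A}
        normaliser_size = len(A) // len(orbit)  # orbit-stabiliser theorem
        index = normaliser_size // len(T)       # [N_A(T) : A ∩ T]
        total += Fraction(1, index) * (1 - Fraction(len(Z), len(T)))
        remaining -= orbit
    return total

for p in (2, 3, 5):
    print(p, class_equation(p))
```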

Applying a conjugation, we may assume that {A} contains {e := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}}, thus

\displaystyle  A \cap U(\overline{k}) = \{\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}: t \in E\}

for some additive group {E \subset \overline{k}} containing {1}. By Exercise 7, {|E| \ll |A|^{1/3}}; by Exercise 10, we have {|E|\gg |A|^{1/3}} also.

The map {a \mapsto a(A \cap U(\overline{k}))a^{-1}} maps {A} to unipotent groups that intersect {A} in {\sim |A|^{1/3}} regular unipotents. As there are {\sim |A|^{2/3}} regular unipotent elements in {A}, we see that there are only {O(|A|^{1/3})} such unipotent groups available. From the pigeonhole principle and conjugation, we conclude that the preimage of {U(\overline{k})} in this map has cardinality {\gg |A|^{2/3}}. But this preimage is simply {A \cap B(\overline{k})}. In particular, the quotient {(A\cap B(\overline{k}))/(A \cap U(\overline{k}))} has cardinality {\gg |A|^{1/3}}. Observe that each element of this quotient acts on {A \cap U(\overline{k})}, and hence on {E}, by multiplication. As such, if we set

\displaystyle  F := \{ \xi \in \overline{k}: \xi E \subset E\}

to be the “multiplicative symmetry set” of {E}, then we have {|F|\gg |A|^{1/3}}. As {E} is a finite additive group, {F} is a field of size at most {|E| \ll|A|^{1/3}}, thus {F} is a finite field of cardinality {|F| \sim |A|^{1/3}}. Also, {E} is a vector space over {F}; as {E} contains {1} and has cardinality {O(|A|^{1/3})}, we see that {E=F}. Thus we have

\displaystyle  A \cap U(\overline{k}) = \begin{pmatrix} 1 & F \\ 0 & 1 \end{pmatrix}. \ \ \ \ \ (2)

Also, as {(A\cap B(\overline{k}))/(A \cap U(\overline{k}))} has to stabilise {F}, we see that all elements of {A \cap B(\overline{k})} have diagonal elements in {F}. Combining this with (2), we see that {A \cap B(\overline{k})} takes the form

\displaystyle  A \cap B(\overline{k}) = \{ \begin{pmatrix} t & f(t) + x \\ 0 & t^{-1} \end{pmatrix}: t \in H, x\in F\} \ \ \ \ \ (3)

for some multiplicative subgroup {H} of {F^\times}, and some function {f: H \rightarrow \overline{k}}; since {A \cap B(\overline{k})} has cardinality {\gg |A|^{2/3} \sim |F|^2}, {H} must have cardinality {\sim |F|}. By taking the commutators of two matrices in (3), we see that

\displaystyle  f(t) (s-s^{-1}) - f(s) (t-t^{-1}) \in F \ \ \ \ \ (4)

for all {s,t \in H}.
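The commutator computation behind (4) can be checked with exact arithmetic. In the sketch below (an illustration over the rationals, with hypothetical helper names), the commutator of two upper triangular matrices with diagonals {t, t^{-1}} and {s, s^{-1}} and upper right entries {x, y} is computed directly and compared with the closed form {ts( y(t-t^{-1}) - x(s-s^{-1}))}. Taking {x = f(t)} and {y = f(s)} modulo {F}, this entry must lie in {F} by (2), and dividing by {-ts \in F^\times} gives (4).

```python
from fractions import Fraction as Fr

def mul(g, h):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = g
    (e, f), (x, y) = h
    return ((a * e + b * x, a * f + b * y), (c * e + d * x, c * f + d * y))

def inv(g):
    """Inverse of a determinant-one 2x2 matrix."""
    (a, b), (c, d) = g
    return ((d, -b), (-c, a))

def borel(t, x):
    """Upper triangular matrix with diagonal (t, 1/t) and upper right entry x."""
    return ((t, x), (Fr(0), 1 / t))

def commutator(g, h):
    """g h g^{-1} h^{-1}."""
    return mul(mul(g, h), mul(inv(g), inv(h)))

def predicted_entry(t, x, s, y):
    """Closed form for the upper right entry of the commutator."""
    return t * s * (y * (t - 1 / t) - x * (s - 1 / s))

t, x, s, y = Fr(2), Fr(5), Fr(3), Fr(7)  # arbitrary sample values
c = commutator(borel(t, x), borel(s, y))

print(c[0][0] == 1 and c[1][0] == 0 and c[1][1] == 1)  # commutator is unipotent
print(c[0][1] == predicted_entry(t, x, s, y))
```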

If we select {t_0 \in H} such that {t_0-t_0^{-1}} is non-zero, then by conjugating {A} by a suitable element of {U(\overline{k})} (which does not affect any of the previous control established on {A}) we may normalise {f(t_0)} to be zero. From (4) this makes {f(t) \in F} for all {t \in H}. In particular,

\displaystyle  A \cap B(\overline{k}) \subset B(F). \ \ \ \ \ (5)

Now for any {g \in A}, the subgroups {A \cap B(\overline{k})} and {g^{-1} (A \cap B(\overline{k})) g} of {A} have index {O(|A|^{1/3})}, so their intersection must have cardinality {\gg |A|^{1/3} \gg |F|}, thus

\displaystyle  |A \cap B(F) \cap g^{-1} B(F) g| \gg |F|.

In particular, there must exist either a regular unipotent or a regular semisimple element {h} of {A \cap B(F)} such that {ghg^{-1}} also lies in {B(F)}. If {h} is regular semisimple in {B(F)}, it has an eigenbasis in {F^2}, and so {g} must map such an eigenbasis to another eigenbasis, and thus lies in {SL_2(F)}. If instead {h} is regular unipotent, it has the line {\overline{k} \times \{0\}} as the unique (geometric) eigenspace; {g} must preserve this eigenspace and thus lies in {B(\overline{k})}, and thus in {B(F)} and therefore in {SL_2(F)} by (5). Combining the cases, we conclude that {A \subset SL_2(F)}. We may therefore summarise our discussion as follows:

Theorem 14 (Rough description of finite subgroups of {SL_2}) Let {A} be a finite subgroup of {SL_2(\overline{k})}. Then one of the following statements holds:

  • (Arithmetic subgroup) There is a finite subfield {F} of {\overline{k}} with {|F| \sim |A|^{1/3}} such that {A} is contained in a conjugate of {SL_2(F)} (and is thus a subgroup of that conjugate of index {O(1)}).
  • (Trapping) There is a Borel subgroup {B} such that {|A \cap B| \gg |A|}.

In principle, the trapping case can be analysed further (using manipulations similar to those used to reach (5)) but we will not pursue this here. We remark that while these computations were somewhat lengthy (and less elementary and precise than the more classical results of Dickson), they can extend to more complicated algebraic groups, such as {SL_d(\overline{k})}, or more generally to any algebraic group of bounded rank; see this paper of Larsen and Pink for details. In particular, Larsen and Pink were able to use these methods to establish an important subcase of the famous classification of finite simple groups, namely by verifying this classification for sufficiently large subgroups of a linear group of bounded rank over a field of arbitrary characteristic. It is conceivable that these methods may be extended in the future to give an alternate proof of the full classification (for sufficiently large groups, at least).

— 3. The product theorem in {SL_2(k)}

In this section we prove the {d=2} case of Theorem 1. This result was first established (for fields of prime order) by Helfgott and then in the general case by Dinai; we will present a variant of Helfgott’s argument which was developed by Breuillard, Green, and Tao and independently by Pyber and Szabo. It is convenient to rephrase the theorem as follows:

Theorem 15 (Product theorem in {SL_2(k)}, alternate form) Let {k} be a finite field, and let {A} be a {K}-approximate group in {G := SL_2(k)} that generates {G} for some {K \geq 2}. Then one of the following holds:

  • (Close to trivial) One has {|A| \ll K^{O(1)}}.
  • (Close to {G}) One has {|A| \geq K^{-O(1)} |G|}.

Exercise 11 Show that Theorem 1 follows from Theorem 15. (Hint: if {|A^3| \leq |A|^{1+\epsilon}}, use the multiplicative form of the Ruzsa triangle and covering lemmas to show that {(A \cup \{1\} \cup A^{-1})^2} is a {O(|A|^{O(\epsilon)})}-approximate group.)

The problem now concerns the behaviour of finite approximate subgroups {A} of {SL_2(k)}. The first step will be to establish analogues of the Larsen-Pink non-concentration inequalities of the preceding section, but for approximate subgroups rather than genuine subgroups. (The observation that these inequalities could be usefully extended to the approximate group setting is due to Hrushovski.) We begin by eliminating concentration in linear subgroups.

Lemma 16 (Escape from subspaces) Let {A}, {K} be as in Theorem 15, and let {C>0}. Then one of the following holds:

  • (Close to trivial) One has {|A| \ll_C K^{O_C(1)}}.
  • (Escape) For any {d=0,1,2,3} and any {d}-dimensional subspace {V} of {\overline{k}^4}, such that {V\cap SL_2(\overline{k})} is a subgroup of {SL_2(\overline{k})}, one has {|A^2 \cap V| \leq K^{-C} |A|}.

In practice, we will only apply the escape conclusion for Borel subgroups of {SL_2(\overline{k})}, which are intersections of {SL_2(\overline{k})} with three-dimensional subspaces; however, we need to work with the more general escape construction in the proof of the lemma, for inductive purposes. The claim can in fact be established for any {d}-dimensional subspace {V}, or more generally for bounded complexity {d}-dimensional algebraic varieties; this will be discussed in the next section.

Proof: We induct on {d}. For {d=0}, the claim is trivial, since {|A^2 \cap V|=1} in that case. Now suppose that {d=1,2,3}, and the claim has already been proven for smaller values of {d}.

Let {V} be a {d}-dimensional subspace of {\overline{k}^4} with {V \cap SL_2(\overline{k})} a group, and suppose for contradiction that {|A^2 \cap V|>K^{-C}|A|}. As {A^2} can be covered by {K} left translates of {A}, we can find {a\in A} such that

\displaystyle  |aA\cap V| > K^{-C-1} |A|. \ \ \ \ \ (6)

Suppose that there exists an element {b} of {A} such that {bVb^{-1} \neq V}, so that {bVb^{-1} \cap V} has dimension strictly less than {V}. From (6) we have

\displaystyle  |A^4 \cap bVb^{-1}| \geq |baAb^{-1} \cap bVb^{-1}| > K^{-C-1}|A|.

Since {A^4} can be covered by {K^3} left translates of {A}, we can find {g \in A^5} such that

\displaystyle  |gA \cap bVb^{-1}| > K^{-C-4}|A|.

Let {A_1 :=aA \cap V} and {A_2 := gA \cap bVb^{-1}}. Then {A_1 A_2^{-1}} is contained in {A^7}, and so {1_{A_1} * 1_{A_2^{-1}}} is supported on a set of cardinality at most {K^6 |A|}. Since

\displaystyle  \| 1_{A_1} * 1_{A_2^{-1}} \|_{\ell^1} = |A_1||A_2| \geq K^{-2C-5} |A|^2

we thus see from the pigeonhole principle that there exists an {x} such that

\displaystyle  |1_{A_1} * 1_{A_2^{-1}}(x)| \geq K^{-2C-11} |A|.

The left-hand side is {|A_1 \cap x A_2|}, and thus

\displaystyle  |(A_1\cap x A_2)^{-1} \cdot (A_1 \cap x A_2)| \geq K^{-2C-11}|A|.

The set in the left-hand side is contained in both {A^2} and in {V\cap (bVb^{-1})} (here we use the group nature of {V \cap SL_2(\overline{k})}), and so

\displaystyle  |A^2 \cap V\cap (bVb^{-1})| \geq K^{-2C-11} |A|.

Applying the induction hypothesis, we conclude that {|A| \leq K^{O_C(1)}}, and the claim follows.
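The convolution identity {1_{A_1} * 1_{A_2^{-1}}(x) = |A_1 \cap x A_2|} and the pigeonhole step used above can be illustrated on a toy example. The sketch below is not tied to the sets in the proof: it takes two arbitrary subsets of {SL_2(F_3)} (slices of the enumeration, chosen only for illustration) and hypothetical helpers `sl2`, `mul`, `inv`.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    a, b, c, d = g
    e, f, x, y = h
    return ((a * e + b * x) % p, (a * f + b * y) % p,
            (c * e + d * x) % p, (c * f + d * y) % p)

def inv(g, p):
    a, b, c, d = g
    return (d % p, (-b) % p, (-c) % p, a % p)

p = 3
G = sl2(p)
A1 = set(G[:10])    # arbitrary sample subsets, not approximate groups
A2 = set(G[5:17])

def convolution_at(x):
    """Number of pairs (y, z) in A1 x A2 with y z^{-1} = x."""
    return sum(1 for y in A1 for z in A2 if mul(y, inv(z, p), p) == x)

def overlap(x):
    """|A1 ∩ x A2|."""
    return len(A1 & {mul(x, z, p) for z in A2})

values = {x: convolution_at(x) for x in G}
support = [x for x in G if values[x] > 0]

print(all(values[x] == overlap(x) for x in G))           # the identity
print(sum(values.values()) == len(A1) * len(A2))         # the l^1 norm
print(max(values.values()) * len(support) >= len(A1) * len(A2))  # pigeonhole
```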

The only remaining case is when {bVb^{-1}=V} for all {b \in A}. As {A} generates {SL_2(k)}, this implies that {V} is normalised by {SL_2(k)}. But this is impossible if {V} has dimension {1,2,3}; see Exercise 12 below. \Box

Exercise 12 (Almost simplicity of {SL_2(k)}) Let {V} be a subspace of {\overline{k}^4} of dimension {1}, {2}, or {3}. Show that the group {\{ g \in SL_2(k): gVg^{-1} = V\}} does not contain all of {SL_2(k)}.

Now we can obtain an approximate version of Proposition 10:

Proposition 17 (Larsen-Pink inequality, special case) Let {A}, {K} be as in Theorem 15. Then for any maximal torus {T}, one has {|A^2 \cap T| \ll K^{O(1)} |A|^{1/3}}.

Proof: We may assume that {|A| \geq K^C} for any given constant {C}, as the claim is trivial otherwise. Similarly, by Lemma 16, we may assume that {|A^2 \cap B|\leq K^{-C}|A|} for all Borel subgroups {B}.

We need to show that {|A^2 \cap T| \ll |A|^{1/3}} for any maximal torus {T}. By conjugation we may take {T} to be the standard maximal torus {T = T(\overline{k})}. (This may make {A} generate a conjugate of {SL_2(k)}, rather than {SL_2(k)} itself, but this will not impact our argument). Set {A' := A^2 \cap T(\overline{k})}, then

\displaystyle  A' := \{ \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix}: t\in H \}

for some finite subset {H} of {\overline{k}^\times}. We may assume that {|H| \geq K^C}, as the claim is trivial otherwise. Our task is to show that {|H|^3 \ll K^{O(1)} |A|}.

As in the proof of Proposition 10, we may find an element {g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}} of {A^2} with {a,b,c,d} all non-zero. Since {A' g A' g A' \subset A^{10}}, we have

\displaystyle  \begin{pmatrix} t_1 & 0 \\ 0 & t_1^{-1} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} t_2 & 0 \\ 0 & t_2^{-1} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} t_3 & 0 \\ 0 & t_3^{-1} \end{pmatrix} \in A^{10}

for all {t_1,t_2,t_3 \in H}. Arguing as in Proposition 10, we have

\displaystyle  |H|^3\ll |A^{10}|,

and the claim follows. \Box

Exercise 13 Show that if the non-concentration conclusion in Proposition 17 holds, then for every maximal torus {T} and every {m \geq 1}, one has {|A^m \cap T| \ll_m K^{O_m(1)} |A|^{1/3}}.

We can now establish variants of the other Larsen-Pink inequalities from the preceding section:

Exercise 14 Establish a variant of Proposition 17 in which the maximal tori are replaced by unipotent groups.

Exercise 15 (Large conjugacy classes) Let {A}, {K} be as in Theorem 15. Show that for any regular semisimple or regular projectively unipotent {g \in A}, one has {|A^3 \cap Conj(g)| \gg K^{-O(1)} |A|^{2/3}}.

Exercise 16 (Larsen-Pink inequality, another special case) Let {A}, {K} be as in Theorem 15. Show that for any regular semisimple {g \in SL_2(\overline{k})} and any {m \geq 1}, one has {|A^m \cap Conj(g)| \ll_m K^{O_m(1)} |A|^{2/3}}.

Exercise 17 (Unipotent bound) Let {A}, {K} be as in Theorem 15. Show that at most {O(K^{O(1)} |A|^{2/3})} of the elements of {A} are unipotent.

Exercise 18 (Large tori) Let {A}, {K} be as in Theorem 15. Show that for any regular semisimple {g \in A^2}, one has {|A^4 \cap T| \gg K^{-O(1)} |A|^{1/3}}, where {T} is the unique maximal torus containing {g}. In fact one has {|A^2 \cap T| \gg K^{-O(1)} |A|^{1/3}}. (For the latter claim, cover {A^4} by left translates of {A}.)

We now have a dichotomy: given a maximal torus {T}, either {A^2\cap T} has no regular semisimple elements (and thus contains only central elements), or else has cardinality {\gg K^{-O(1)} |A|^{1/3}}. We exploit this dichotomy as follows. Call a maximal torus {T} involved if {A^2 \cap T} contains a regular semisimple element.

Lemma 18 (Key lemma) Let {A}, {K} be as in Theorem 15. Then one of the following statements holds:

  • (Invariance) If {T} is an involved torus and {a \in A}, then {aTa^{-1}} is an involved torus.
  • (Close to trivial) One has {|A| \ll K^{O(1)}}.

Proof: Let {T} be an involved torus, then by the preceding exercise we have {|A^2 \cap T|\gg K^{-O(1)}|A|^{1/3}}, and thus {|A^4 \cap aTa^{-1}| \gg K^{-O(1)}|A|^{1/3}}. Thus, one has {|gA \cap aTa^{-1}| \gg K^{-O(1)} |A|^{1/3}} for some {g \in G}, which implies that {|A^2 \cap aTa^{-1}| \gg K^{-O(1)} |A|^{1/3}}. In particular, if {A} is not close to trivial, {A^2 \cap aTa^{-1}} contains a regular semisimple element and so {aTa^{-1}} is involved, as desired. \Box

We can now finish the proof of Theorem 15. Suppose {A} is not close to trivial. As there are at most {O(K^{O(1)}|A|^{2/3})} unipotent elements and {O(1)} central elements in {A}, {A} contains at least one regular semisimple element, and so there is at least one involved torus. By the above lemma, and the fact that {A} generates {G}, we see that the set of involved tori is invariant under conjugation by {G}. As {G} has cardinality {\gg |k|^3}, and its intersection with the stabiliser of a single torus has cardinality {O(|k|)}, we conclude that there are {\gg |k|^2 \gg |G|^{2/3}} involved tori. By Exercise 18, each of these tori contains {\gg K^{-O(1)} |A|^{1/3}} regular semisimple elements of {A^2}. Since each regular semisimple element belongs to a unique maximal torus, we conclude that

\displaystyle |A^2| \gg |G|^{2/3} K^{-O(1)} |A|^{1/3};

as {|A^2| \leq K|A|}, we conclude that {|A|\gg K^{-O(1)} |G|}, as claimed.
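The count of {\gg |k|^2} tori in a single conjugation-invariant family can be observed by brute force in small cases. The sketch below is a toy illustration under assumed choices: it takes {k = F_p} and counts the maximal tori that meet {G = SL_2(F_p)} in a regular semisimple element (playing the role of the involved tori, with {G} itself standing in for {A^2}); each torus is identified with its intersection with {G}, and the helper names are hypothetical.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) = [[a, b], [c, d]] over F_p with determinant 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(g, h, p):
    a, b, c, d = g
    e, f, x, y = h
    return ((a * e + b * x) % p, (a * f + b * y) % p,
            (c * e + d * x) % p, (c * f + d * y) % p)

def tori_meeting(p):
    """Distinct sets G ∩ T, where T is the torus through a regular semisimple g in G."""
    G = sl2(p)
    regular_semisimple = [g for g in G
                          if (g[0] + g[3]) % p not in (2 % p, (-2) % p)]
    # The centraliser in G of a regular semisimple g is G ∩ T.
    return {frozenset(h for h in G if mul(g, h, p) == mul(h, g, p))
            for g in regular_semisimple}

for p in (5, 7):
    print(p, len(tori_meeting(p)), p * p)
```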

Exercise 19 Let {A} be a finite {K}-approximate subgroup of {SL_2(\overline{k})} for some algebraically closed field {\overline{k}}. Show that one of the following statements holds:

  • (Close to group) {A} generates a finite subgroup {G} of {SL_2(\overline{k})} with {|G| \ll K^{O(1)} |A|}.
  • (Concentrated in Borel) There is a Borel subgroup {B} of {SL_2(\overline{k})} with {|A \cap B| \gg K^{-O(1)} |A|}.

(Hint: this does not follow directly from Theorem 15, but can be established by a modification of the proof of that theorem.)

Note that the above exercise can be combined with Theorem 14 to give a more detailed description of {A}. The Borel group {B} is solvable, and by using tools from additive combinatorics, such as Freiman’s theorem in solvable groups (or the Helfgott-Lindenstrauss conjecture, discussed in the previous quarter’s notes), one can give even more precise descriptions of {A} (at the cost of losing polynomial dependence of the bounds on {K}), but we will not discuss these topics here.

Exercise 20 Use Exercise 19 to give an alternate proof of Theorem 5. (Hint: there are a number of ways to embed the sum-product problem in a field {k} into a product problem in {SL_2(k)} (or {SL_2(\overline{k})}). For instance, one can consider the tripling properties of sets of the form {\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix}: a,b,c,d \in A \}} in terms of sets such as {A^2+A^2} or {A^3+A^3+A^3+A^3}, and then project this set onto {SL_2(k)} (or {PSL_2(k)}), and combine this with the Katz-Tao lemma to obtain Theorem 5. More details of this connection can be found in Section 8 of this paper.) This is of course a much more complicated and inefficient way to establish the sum-product theorem, but it does illustrate the link between the two results (beyond the fact that both proofs exploit a dichotomy). Note also that the original proof of the product theorem in {SL_2(F_p)} by Helfgott actually used the sum-product theorem in {F_p} as a key tool.

— 4. The product theorem in {SL_d(k)} —

We now discuss the extension of the {SL_2(k)} product theory to the more general groups {SL_d(k)}. Actually, the arguments here will be valid in any almost simple connected algebraic group of bounded rank, but for sake of concreteness we will work with {SL_d(k)}. (This also has the (very) minor advantage that {SL_d} is an affine variety rather than a projective one, so we can work entirely in affine spaces such as {{\bf A}^{d^2}(\overline{k}) :=\overline{k}^{d^2}}; related to this, the only regular maps we need to consider will be polynomial in nature.) There is also some recent work on product theorems in other algebraic groups than the almost simple ones; see for instance the papers of Pyber-Szabo, Gill-Helfgott, and Breuillard-Green-Tao for some examples of this. It is conceivable that a satisfactory understanding of approximate subgroups of arbitrary algebraic groups of bounded dimension will be available in the near future.

The treatment of the {d=2} case relied on a number of ad hoc computations which were only valid in {SL_2}, and also on the pleasant fact that the only non-regular elements of {SL_2} were the central elements {\pm 1}, which is certainly false for higher values of {d}. In this paper, Helfgott was able to push his original {d=2} arguments to the {d=3} case, but again the arguments were somewhat ad hoc in nature and did not seem to extend to the general setting. However, the arguments based on the Larsen-Pink concentration estimates have proven to be quite general, and in particular can handle the situation {SL_d(k)}. The one catch is that instead of working with very concrete and explicit subsets of {SL_2}, such as Borel subgroups or other intersections {SL_2 \cap V} with linear spaces {V}, one has to work with more general algebraic subvarieties of {SL_d}. As such, a certain amount of basic algebraic geometry becomes necessary. Also, because we are seeking results with quantitative bounds, we will need to keep some track of the “complexity” of the varieties that one encounters in the course of the argument.

We now very quickly review some algebraic geometry notions, though for reasons of space we will not attempt to develop the full theory of algebraic geometry here, referring instead to standard texts such as Harris, Mumford, or Griffiths-Harris. As usual, algebraic geometry is cleanest when working over an algebraically closed field, so we will work primarily over {\overline{k}}.

Definition 19 (Variety) Let {M \geq d \geq 0} be integers, and let {\overline{k}} be an algebraically closed field. We write {{\bf A}^d(\overline{k})} for the affine space {\overline{k}^d}.

  • An (affine) variety {V = V(\overline{k}) \subset {\bf A}^d(\overline{k})} of complexity at most {M} is a set of the form

    \displaystyle  V = \{ x \in \overline{k}^d: P_1(x) =\ldots = P_m(x) = 0 \},

    where {0 \leq m \leq M} and {P_1,\ldots,P_m: {\bf A}^d(\overline{k}) \rightarrow\overline{k}} are polynomials of degree at most {M}. (Thus the complexity parameter {M} controls the dimension, degree, and number of polynomials needed to cut out the variety. Note that we do not assume our varieties to be irreducible, and as such what we call a variety corresponds to what is sometimes known as an algebraic set in the literature.) Note that the union or intersection of two varieties of complexity at most {M} is another variety of complexity at most {O_M(1)}.

  • A variety is irreducible if it cannot be expressed as the union of two proper (i.e. strict) subvarieties.

Thus, for instance, {SL_d(\overline{k})} is a variety of complexity {O_d(1)} in {{\bf A}^{d^2}(\overline{k})} (after identifying this latter affine space with the space of {d \times d} matrices over {\overline{k}}).

It is known that any variety can be expressed as the union of a finite number of irreducible components, and this decomposition is unique if we require that no component is contained in any other. Furthermore, to each irreducible variety {V} one can assign a dimension {\hbox{dim}(V)}, defined as the maximal integer {D} for which there exists a chain

\displaystyle  \emptyset \neq V_0 \subsetneq \ldots \subsetneq V_D = V

of irreducible varieties; this will be an integer between {0} and {d}. For instance, it can be shown that {{\bf A}^d(\overline{k})} has dimension {d} (as expected). We define the dimension of a non-irreducible variety {V} to be the least integer {D} such that {V} can be covered by finitely many irreducible varieties of dimension {D}. If a (non-empty) variety {V} can be cut out from an irreducible variety {W} by setting {m} polynomials to zero, then one has {\hbox{dim}(W) - m \leq \hbox{dim}(V) \leq \hbox{dim}(W)}. Since {SL_d(\overline{k})} can be cut out from {{\bf A}^{d^2}(\overline{k})} by a single polynomial, and is not equal to all of {{\bf A}^{d^2}(\overline{k})}, we conclude in particular that {SL_d(\overline{k})} has dimension {d^2-1}.

One can show that the image of a {D}-dimensional variety by a polynomial map {P: {\bf A}^{d_1}\rightarrow {\bf A}^{d_2}} is contained in a variety of dimension at most {D}. One can thus produce upper bounds on the dimension of a variety by covering it with polynomial images of varieties whose dimension is already known to obey the same bound.

An algebraic subgroup of {SL_d(\overline{k})} is a subvariety of {SL_d(\overline{k})} which is also a subgroup of {SL_d(\overline{k})}. For instance, the standard maximal torus {T(\overline{k})}, consisting of all the diagonal elements of {SL_d(\overline{k})}, is an algebraic subgroup; more generally, any maximal torus, by which we mean a conjugate of the standard maximal torus, is an algebraic subgroup.

Exercise 21 Show that every maximal torus has dimension {d-1} and complexity {O_d(1)}.

Dual to the maximal tori are the conjugacy classes {Conj(g) := \{ hgh^{-1}: h \in SL_d(\overline{k})\}} of regular semisimple elements. We call an element {g} of {SL_d(\overline{k})} regular semisimple if it has {d} distinct eigenvalues, and is thus diagonalisable. Observe that each regular semisimple element lies in precisely one maximal torus.
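
To get a feel for how rare the non-regular elements are, here is a small Python sketch for the case {d=2} over {F_p}, which is an assumption made purely for illustration. An element of {SL_2} has characteristic polynomial {x^2 - tx + 1} with {t} the trace, so it has two distinct eigenvalues exactly when {t^2 \neq 4}. The function name `count_non_regular` is hypothetical.

```python
from itertools import product

def count_non_regular(p):
    """Return (|SL_2(F_p)|, number of elements that are not regular semisimple)."""
    total = 0
    non_regular = 0
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 1:
            continue
        total += 1
        trace = (a + d) % p
        # Repeated eigenvalue <=> the discriminant trace^2 - 4 of x^2 - trace*x + 1 vanishes.
        if (trace * trace - 4) % p == 0:
            non_regular += 1
    return total, non_regular

for p in (3, 5, 7):
    print(p, count_non_regular(p))
```

In these small cases the second number equals {2p^2}, compared with {|SL_2(F_p)| = p^3 - p}, so the non-regular elements occupy a set one dimension smaller than the whole group.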

Exercise 22 Show that every conjugacy class of a regular semisimple element has dimension {d^2-d} and complexity {O_d(1)}.

If {F} is a finite subfield of {\overline{k}}, then {SL_d(F)} is a finite subgroup of {SL_d(\overline{k})}, and is thus technically a {0}-dimensional algebraic subgroup of {SL_d(\overline{k})}. However, the complexity of this algebraic group is huge (comparable to the cardinality of {SL_d(F)}). It turns out that {SL_d(F)} is “effectively Zariski-dense” in the sense that it cannot be captured in a low complexity algebraic variety:

Lemma 20 (Schwartz-Zippel lemma for {SL_d(F)}) Let {V} be a proper subvariety of {SL_d(\overline{k})} of complexity at most {M}. Let {F} be a finite subfield of {\overline{k}}. Then

\displaystyle  |SL_d(F) \cap V| \ll_{M,d} |F|^{d^2-2}.

Proof: {SL_d(\overline{k})} is the hypersurface in {{\bf A}^{d^2}(\overline{k})} cut out by the determinant polynomial. As {V} is a proper subvariety, we can find a polynomial {P: \overline{k}^{d^2} \rightarrow \overline{k}} which is not a multiple of the determinant polynomial, but which vanishes on {V}; by the complexity hypothesis we may take {P} to have degree {O_{M,d}(1)}. Our task is then to show that

\displaystyle  |\{ x \in SL_d(F): P(x)=0 \}| \ll_{M,d} |F|^{d^2-2}.

Let us write the {d^2} coordinates of {{\bf A}^{d^2}(\overline{k})} arbitrarily as {x_1,\ldots,x_{d^2}}. In a given element of {SL_d(F)}, not all of the {x_i} can be zero; thus by symmetry and relabeling if necessary it suffices to show that

\displaystyle  |\{ x \in SL_d(F): P(x)=0; x_{d^2} \neq 0 \}| \ll_{M,d} |F|^{d^2-2}. \ \ \ \ \ (7)

But then one can express {x_{d^2}} as a rational function of the other {d^2-1} coordinates, and the left-hand side of (7) is contained in a set of the form {\{ x \in F^{d^2-1}: Q(x)=0\}} for some polynomial {Q} of degree {O_{M,d}(1)} that is not identically zero. The claim then follows from the Schwartz-Zippel lemma, which we give as an exercise below. \Box
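
The bound of Lemma 20 can be tested numerically in the smallest case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes {d=2}, {F = F_p}, and the Borel subgroup of upper-triangular matrices as the proper subvariety {V}, cut out by the single linear polynomial {c}. The names `count_on_subvariety` and `borel` are hypothetical.

```python
from itertools import product

def count_on_subvariety(p, P):
    """Return (|SL_2(F_p)|, |SL_2(F_p) ∩ {P = 0}|) for a polynomial P(a, b, c, d)."""
    total = 0
    on_V = 0
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p != 1:
            continue
        total += 1
        if P(a, b, c, d) % p == 0:
            on_V += 1
    return total, on_V

def borel(a, b, c, d):
    """Upper-triangular matrices: the lower-left entry c vanishes."""
    return c

for p in (3, 5, 7):
    total, on_V = count_on_subvariety(p, borel)
    # on_V equals p(p-1), which is below |F|^{d^2-2} = p^2.
    print(p, total, on_V, p ** 2)
```

Any other polynomial in the four matrix entries can be passed in place of `borel` to experiment with other subvarieties.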

Exercise 23 (Schwartz-Zippel lemma) Let {F} be a finite field, and let {Q:F^d \rightarrow F} be a polynomial of degree {D} that is not identically zero. Show that

\displaystyle  |\{ x \in F^d: Q(x) = 0 \}| \ll_d D |F|^{d-1}.

For an additional challenge, obtain the sharper bound

\displaystyle  |\{ x \in F^d: Q(x) = 0 \}| \leq D |F|^{d-1}.
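
As a quick sanity check of the sharper bound, one can count zeros exhaustively over a small prime field. This is a minimal Python sketch assuming {F = F_p} with {p=5}. The two test polynomials and the name `count_zeros` are illustrative choices, not taken from the notes.

```python
from itertools import product

def count_zeros(Q, p, d):
    """Number of x in F_p^d with Q(x) = 0, by exhaustive search."""
    return sum(1 for x in product(range(p), repeat=d) if Q(*x) % p == 0)

p = 5
# Degree D = 2 in d = 3 variables: the cone x*y = z^2.
cone = count_zeros(lambda x, y, z: x * y - z * z, p, 3)
# Degree D = 3 in d = 2 variables: the three parallel lines x = 0, 1, 2.
lines = count_zeros(lambda x, y: x * (x - 1) * (x - 2), p, 2)

print(cone, 2 * p ** 2)   # 25 50
print(lines, 3 * p ** 1)  # 15 15
```

The second example attains {D|F|^{d-1}} exactly, so the sharper bound cannot be improved in general.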

We contrast this with the size of {SL_d(F)} itself:

Exercise 24 Let {F} be a finite field. Show that {|F|^{d^2-1} \ll_d |SL_d(F)|\ll_d |F|^{d^2-1}}.
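
For numerical experiments with Exercise 24 one can use the standard order formula {|SL_d(F_q)| = \frac{1}{q-1} \prod_{i=0}^{d-1} (q^d - q^i)}, which comes from counting ordered bases of {F_q^d}. The sketch below evaluates this formula and compares it with {q^{d^2-1}}. It is a check, not a solution to the exercise, and the name `sl_order` is hypothetical.

```python
def sl_order(d, q):
    """|SL_d(F_q)| = (1/(q-1)) * prod_{i=0}^{d-1} (q^d - q^i)."""
    gl_order = 1
    for i in range(d):
        gl_order *= q ** d - q ** i
    return gl_order // (q - 1)

for d, q in [(2, 5), (3, 2), (3, 7), (4, 3)]:
    order = sl_order(d, q)
    # The ratio to q^(d^2-1) stays bounded away from 0 and below 1.
    print(d, q, order, order / q ** (d * d - 1))
```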

The key non-concentration inequality we will need is the following.

Proposition 21 (Larsen-Pink inequality) Let {A} be a {K}-approximate subgroup of {SL_d(\overline{k})} for some {K \geq 2}, and let {V} be a subvariety of {SL_d(\overline{k})} of complexity at most {M}. Let {m \geq 1}. Then one of the following is true:

  • (Non-concentration) One has

    \displaystyle  |A^m \cap V| \ll_{M,m,d} K^{O_{M,m,d}(1)} |A|^{\hbox{dim}(V)/\hbox{dim}(SL_d)}. \ \ \ \ \ (9)

  • (Trapping) {A} is contained in a proper algebraic subgroup of {SL_d(\overline{k})} of complexity {O_{M,m,d}(1)}.

This inequality subsumes results such as Proposition 17, Exercise 14, and Exercise 16. Note from Lemma 20 (and Exercise 24) that the trapping option of the above proposition cannot occur if {A} generates {SL_d(F)} and {|F|} is sufficiently large depending on {M, d, m}, while the non-concentration claim is trivial when {|F| = O_{M,d,m}(1)}; thus in this case we have (9) unconditionally.

The proof of Proposition 21 is somewhat complicated and is deferred to the next section. We record some particular consequences of this inequality.

Exercise 25 (Consequences of the non-concentration inequality) Let {A} be a {K}-approximate subgroup of {SL_d(\overline{k})} for some {K \geq 2}, which generates {SL_d(F)} for some finite field {F}.

  • (i) If {T} is a maximal torus (and thus of dimension {d-1}), show that {|A^{10} \cap T| \ll_d K^{O_d(1)} |A|^{\frac{1}{d+1}}}.
  • (ii) If {T_0} denotes the elements of {T} which are not regular semisimple, show that {|A^{10} \cap T_0| \ll_d K^{O_d(1)} |A|^{\frac{d-2}{d^2-1}}}.
  • (iii) If {g \in SL_d(\overline{k})} is regular semisimple, show that {|A^{10} \cap Conj(g)|\ll_d K^{O_d(1)} |A|^{\frac{d}{d+1}}}.
  • (iv) Show that at most {O_d(K^{O_d(1)} |A|^{\frac{d^2-2}{d^2-1}})} of the elements of {A} are not regular semisimple.
  • (v) For any regular semisimple {g \in A}, show that {|A^3 \cap Conj(g)| \gg_d K^{-O_d(1)} |A|^{\frac{d}{d+1}}}.
  • (vi) For any regular semisimple {g \in A}, show that {|A^2 \cap T| \gg_d K^{-O_d(1)} |A|^{\frac{1}{d+1}}}, where {T} is the unique maximal torus containing {g}.
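
The exponents in this exercise are all of the form {\hbox{dim}(V)/\hbox{dim}(SL_d)} with {\hbox{dim}(SL_d) = d^2-1}. The short Python sketch below tabulates them with exact fractions, using the dimensions quoted above ({d-1} for a maximal torus, {d-2} for its non-regular part, {d^2-d} for a regular semisimple conjugacy class). The function names are hypothetical.

```python
from fractions import Fraction

def larsen_pink_exponent(dim_V, d):
    """The exponent dim(V) / dim(SL_d), with dim(SL_d) = d^2 - 1."""
    return Fraction(dim_V, d * d - 1)

def exercise_25_exponents(d):
    """Exponents for the torus, its non-regular part, and a regular semisimple conjugacy class."""
    return {
        "torus": larsen_pink_exponent(d - 1, d),
        "non-regular part of torus": larsen_pink_exponent(d - 2, d),
        "conjugacy class": larsen_pink_exponent(d * d - d, d),
    }

for d in (2, 3, 4):
    print(d, exercise_25_exponents(d))
```

The torus and conjugacy class exponents always sum to {1}, which is the numerology behind passing between the upper bound (iii) and the lower bound (vi).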

Exercise 26 By repeating the arguments of the preceding section, establish Theorem 1 for general {d}.

Remark 6 There is an analogue of Exercise 19, in which the role of the Borel subgroups is replaced by proper algebraic subgroups of bounded complexity; see Theorem 5.5 of the paper of Breuillard, Green, and Tao for a more precise statement.

— 5. Proof of the Larsen-Pink inequality (optional) —

We now prove Proposition 21. In order to escape the burden of having to keep track of the complexity of everything, we will use the tool of ultraproducts (which we will phrase in the language of nonstandard analysis). See this previous blog post for a discussion of ultraproducts and how they can be used to turn quantitative (or “hard”) analysis tasks into qualitative (or “soft”) analysis tasks. One can also use the machinery of schemes and inverse limits as a substitute for the ultraproduct formalism; this is the approach taken in the paper of Larsen and Pink. The paper of Breuillard, Green, and Tao has a slightly reduced reliance on ultraproducts, at the cost of more complexity bookkeeping, while the Pyber-Szabo paper avoids ultraproducts altogether but has perhaps the most bookkeeping of all the papers mentioned here (but, by the same token, is the only argument currently known which gives effective bounds). We will thus presume some familiarity both with ultraproducts (and nonstandard analysis) and with algebraic geometry in this section.

As in the previously mentioned blog post, we select a non-principal ultrafilter {\alpha \in \beta {\bf N} \backslash {\bf N}}, and use it to construct ultraproducts and nonstandard objects. (To ensure the existence of such an object, we shall assume the axiom of choice, as we have already been doing implicitly throughout this course.) We also use the usual nonstandard asymptotic notation, thus for instance {O(1)} denotes a nonstandard quantity bounded in magnitude by a standard number.

The quantitative Larsen-Pink inequality (Proposition 21) can then be deduced from the following nonstandard version, in which all references to complexity are now absent:

Proposition 22 (Larsen-Pink inequality) Let {d \geq 2} be standard. Let {\overline{k} = \prod_{n \rightarrow\alpha} \overline{k_n}} be a nonstandard algebraically closed field (i.e. an ultraproduct of standard algebraically closed fields). Let {K = \lim_{n \rightarrow\alpha} K_n \geq 2} be a nonstandard natural number, and let {A} be a nonstandard {K}-approximate subgroup of {SL_d(\overline{k})} (i.e. an ultraproduct {A = \prod_{n \rightarrow \alpha} A_n} of standard {K_n}-approximate subgroups of {SL_d(\overline{k_n})}), and let {V} be a subvariety of {SL_d(\overline{k})}. Then one of the following is true:

  • (Non-concentration) For every standard {m \geq 1}, one has {|A^m \cap V| \ll K^{O(1)} |A|^{\hbox{dim}(V)/\hbox{dim}(SL_d)}}.
  • (Trapping) {A} is contained in a proper algebraic subgroup of {SL_d(\overline{k})}.

Let us see why Proposition 22 implies Proposition 21. Suppose for contradiction that Proposition 21 failed. Carefully negating all the quantifiers (and using the axiom of choice), we obtain a sequence {\overline{k_n}} of standard algebraically closed fields, a sequence {K_n \ge 2} of standard numbers, a sequence {A_n} of {K_n}-approximate subgroups of {SL_d(\overline{k_n})}, a standard {M \geq 1}, a sequence {V_n} of subvarieties of {SL_d(\overline{k_n})} of complexity at most {M}, and a standard {m \geq 1}, such that for each {n}, {A_n} is not contained in a proper algebraic subgroup of {SL_d(\overline{k_n})} of complexity {n} or less, and one has

\displaystyle  |A_n^m \cap V_n| \geq n K_n^n |A_n|^{\hbox{dim}(V_n)/\hbox{dim}(SL_d)}.

Now one forms the ultralimit {K :=\lim_{n \rightarrow\alpha} K_n} and the ultraproducts {\overline{k} := \prod_{n \rightarrow \alpha} \overline{k_n}}, {A := \prod_{n \rightarrow \alpha} A_n}, {V := \prod_{n \rightarrow \alpha} V_n}. Then {\overline{k}} is an algebraically closed field, {A} is a nonstandard {K}-approximate subgroup of {SL_d(\overline{k})}, and {V} is an algebraic subvariety of {SL_d(\overline{k})} (here we use the uniform complexity bound). One can also show that {\hbox{dim}(V) = \lim_{n \rightarrow \alpha}\hbox{dim}(V_n)}; see Lemma 3 of this blog post. As such, we have

\displaystyle  |A^m \cap V| \not \ll K^{O(1)} |A|^{dim(V)/dim(SL_d)},

so by Proposition 22, {A} is contained in a proper algebraic subgroup {H} of {SL_d(\overline{k})}. By unpacking the coefficients of all the polynomials over {\overline{k}} used to cut out {H}, we see that {H} is itself an ultraproduct {H = \prod_{n \rightarrow \alpha} H_n} of proper algebraic subgroups of {SL_d(\overline{k_n})}, of complexity bounded uniformly in {n}. By Los’s theorem, one has {A_n \subset H_n} for all {n} sufficiently close to {\alpha}, which gives a contradiction for {n} large enough.

It remains to establish Proposition 22. By Los’s theorem, the ultraproduct {\overline{k}} of algebraically closed fields is again algebraically closed, which allows us to use algebraic geometry in the nonstandard field {\overline{k}} without difficulty.

Let {\langle A\rangle} be the group generated by {A}, and consider the Zariski closure {\overline{\langle A \rangle}} of this group, that is to say the intersection of all the varieties containing {\langle A \rangle}. This is again an algebraic variety (here we use the Noetherian property of varieties, that there does not exist any infinite descending chain of varieties), and is also a group (exercise!), and is thus an algebraic subgroup of {SL_d(\overline{k})}. If this subgroup is proper then we have the trapping property, so we may assume that the closure is all of {SL_d(\overline{k})}. In other words, {\langle A\rangle} is Zariski dense in {SL_d(\overline{k})}.

For any dimension {D} between {0} and {\hbox{dim}(SL_d)} inclusive, and any standard real {\sigma}, let us call {\sigma} {D}-admissible if one has the bound

\displaystyle  |A^m \cap V| \ll K^{O(1)} |A|^{\sigma}

whenever {m\geq 1} is standard and {V} is a {D}-dimensional subvariety of {SL_d(\overline{k})}. Our task is to show that {D/\hbox{dim}(SL_d)} is admissible for all {0 \leq D \leq \hbox{dim}(SL_d)}. This claim is trivial at the two endpoints {D=0} and {D=\hbox{dim}(SL_d)}; the difficulty is to somehow “interpolate” between these two endpoints. We need the following combinatorial observation.

Exercise 27 (Extreme dimensions) Suppose for sake of contradiction that {D/\hbox{dim}(SL_d)} is inadmissible for some {0 < D< \hbox{dim}(SL_d)}. Show that we can find dimensions

\displaystyle  0 < D_1 \leq D_2 < \hbox{dim}(SL_d)

and a real number {\theta \geq 1/\hbox{dim}(SL_d)} such that

  • {D_1 \theta} is not {D_1}-admissible;
  • {D_2 \theta} is not {D_2}-admissible;
  • {D\theta} is {D}-admissible whenever {0 \leq D < D_1} or {D_2 < D \leq \hbox{dim}(SL_d)};
  • {(D+1)\theta} is {D}-admissible for any {0 \leq D \leq \hbox{dim}(SL_d)}.

Let {D_1,D_2,\theta} be as in the above exercise. By construction, we can then find subvarieties {V_1, V_2} of {SL_d(\overline{k})} of dimension {D_1,D_2} respectively and standard positive integers {m_1,m_2} such that

\displaystyle  |A^{m_1} \cap V_1| \not \ll K^{O(1)} |A|^{\theta D_1} \ \ \ \ \ (10)

and

\displaystyle  |A^{m_2} \cap V_2| \not \ll K^{O(1)} |A|^{\theta D_2}. \ \ \ \ \ (11)

On the other hand, we have

\displaystyle  |A^{m} \cap V| \ll K^{O(1)} |A|^{\theta (\hbox{dim}(V)+1)} \ \ \ \ \ (12)

whenever {V} is a subvariety of {SL_d(\overline{k})}, with the improvement

\displaystyle  |A^{m} \cap V| \ll K^{O(1)} |A|^{\theta \hbox{dim}(V)} \ \ \ \ \ (13)

whenever {V} has dimension strictly less than {D_1}, or strictly greater than {D_2}.

We can use (12), (13) to show that {A^{m_1} \times A^{m_2}} is “quantitatively Zariski dense” in {V_1 \times V_2}:

Lemma 23 (Quantitative Zariski density) For any proper subvariety {W} of {V_1 \times V_2}, we have

\displaystyle  |(A^{m_1} \times A^{m_2}) \cap W|\ll K^{O(1)} |A|^{\theta (D_1+D_2)}.

Proof: {W} has dimension at most {D_1+D_2-1}. By standard algebraic geometry, we see that for each {0\leq D \leq D_1}, the set of {y \in V_2} for which the slice {\{ x \in V_1: (x,y) \in W\}} has dimension {D} itself has dimension at most {D_1+D_2-D-1}. In particular, if {D < D_1}, then by (12), (13) the contribution of such {y} to {|(A^{m_1} \times A^{m_2}) \cap W|} is at most

\displaystyle  K^{O(1)} \times |A|^{\theta D} \times K^{O(1)} |A|^{\theta (D_1+D_2-D-1+1)}

while if {D = D_1}, then the contribution is at most

\displaystyle  K^{O(1)} \times |A|^{\theta (D+1)} \times K^{O(1)} |A|^{\theta (D_1+D_2-D-1)}.

(One may wonder about the question of uniformity in the {O()} notation, but in nonstandard analysis one can automatically gain such uniformity through countable saturation; see Exercise 20 of this blog post.) Summing over all {D} we obtain the claim. \Box

We will now use a counting argument (which is, unsurprisingly, related to the counting argument used to establish Proposition 17, or any of the other Larsen-Pink inequalities in preceding sections) to obtain a contradiction from these four estimates.

First, by decomposing {V_1,V_2} into irreducible components (and using (12) to eliminate all lower-dimensional components) we may assume that {V_1,V_2} are both irreducible.

The product {V_1 \cdot V_2} is not necessarily a variety, but it is still a constructible set (i.e. a finite boolean combination of varieties), and can still be assigned a dimension (by equating the dimension of a constructible set with the dimension of its Zariski closure). As it contains a translate of {V_2}, it has dimension at least {D_2}. It would be convenient if {V_1 \cdot V_2} had dimension strictly greater than {D_2}. This is not necessarily the case, but it turns out that it becomes so after a generic conjugation, thanks to the almost simplicity of {SL_d}:

Exercise 28 (Almost simplicity) Show that the only proper normal subgroups of {SL_d(\overline{k})} are those contained in the centre of {SL_d(\overline{k})}, i.e. in the identity matrix multiplied by the {d^{th}} roots of unity. (Hint: Let {G} be a normal subgroup of {SL_d(\overline{k})} that contains an element which is not a multiple of the identity. Place that element in Jordan normal form and divide it by one of its conjugates to make it fix a subspace of {\overline{k}^d}; iterate this procedure until one finds an element in {G} that is the direct sum of the identity in {SL_{d-2}(\overline{k})} and a non-central element of {SL_2(\overline{k})}. Then use this to generate all of {SL_d(\overline{k})}.)

Proposition 24 (Generic skewness) For generic {g \in SL_d(\overline{k})} (i.e. for all {g} in {SL_d(\overline{k})} outside of a lower-dimensional variety), the set {V_1 \cdot g \cdot V_2} has dimension strictly greater than {D_2}.

Proof: Let {g \in SL_d(\overline{k})}, and assume that {V_1 \cdot g \cdot V_2} has dimension exactly {D_2}. This set contains all the translates {xg V_2} with {x \in V_1}, which are each {D_2}-dimensional irreducible varieties. By splitting up {V_1\cdot g \cdot V_2} into components, we conclude that there are only finitely many distinct translates {xg V_2}. If we denote one of these translates as {W}, the set {\{ x \in SL_d(\overline{k}): xgV_2 = W \}} is easily seen to be a variety (as it is the intersection of varieties {Wy^{-1}g^{-1}} for {y \in V_2}); as a finite number of these sets cover {V_1}, at least one of them has to be all of {V_1}; thus there is a {W} such that {xgV_2=W} for all {x \in V_1}. In particular, this implies that {g^{-1} y^{-1} x g V_2 = V_2} for all {x,y \in V_1}.

Let {S := \{ h \in SL_d(\overline{k}): hV_2 = V_2\}}. Arguing as before, {S} is a variety, and is also a group; it is thus an algebraic group, and by the preceding discussion we have {g^{-1} V_1^{-1} V_1 g \subset S}.

The set {\{ g \in SL_d(\overline{k}): g^{-1} V_1^{-1} V_1 g \subset S \}} is a variety. If it has dimension strictly less than that of {SL_d}, we are done, so we may assume this set is all of {SL_d}; thus {g^{-1} V_1^{-1} V_1 g \subset S} for all {g \in SL_d(\overline{k})}. By almost simplicity, the normal subgroup generated by {V_1^{-1} V_1} is all of {SL_d(\overline{k})}; thus {S} must be all of {SL_d(\overline{k})}, thus {hV_2 = V_2} for all {h \in SL_d(\overline{k})}. But this forces {V_2 = SL_d(\overline{k})}, a contradiction since {D_2} is strictly less than {\hbox{dim}(SL_d)}. \Box

Combining this proposition with the Zariski density of {\langle A\rangle}, we see that we can find {g \in A^m} for some standard {m} such that {V_1 \cdot g \cdot V_2} has dimension {D} strictly greater than {D_2}.

Fix this {g}. Let {\phi: V_1 \times V_2 \rightarrow \overline{V_1 \cdot g \cdot V_2}} be the twisted product map {\phi(x,y) := xgy}. We have the double counting identity

\displaystyle  \sum_{z \in A^{m_1+m+m_2} \cap \overline{V_1 \cdot g \cdot V_2}} |A^{m_1}\times A^{m_2} \cap \phi^{-1}(\{z\})| = |A^{m_1} \cap V_1| |A^{m_2} \cap V_2|

and thus by (10), (11)

\displaystyle  \sum_{z \in A^{m_1+m+m_2} \cap \overline{V_1 \cdot g \cdot V_2}} |A^{m_1}\times A^{m_2} \cap \phi^{-1}(\{z\})| \not \ll K^{O(1)} |A|^{\theta(D_1+D_2)}.

Now, {\phi} is a map from an irreducible {D_1+D_2}-dimensional variety to a {D}-dimensional variety with Zariski-dense image, and is thus a dominant map. Among other things, this implies that there is a subvariety {S} of {V_1 \times V_2} of dimension at most {D_1+D_2-1} such that for all {x \in \overline{V_1 \cdot g \cdot V_2}}, the set {\phi^{-1}(\{x\}) \backslash S} has dimension {D_1+D_2-D}. By (13), we then have

\displaystyle  |A^{m_1}\times A^{m_2} \cap \phi^{-1}(\{z\}) \backslash S| \ll K^{O(1)} |A|^{\theta(D_1+D_2-D)}

for all {z \in A^{m_1+m+m_2} \cap \overline{V_1 \cdot g \cdot V_2}}; by another application of (13), we have

\displaystyle  |A^{m_1+m+m_2} \cap \overline{V_1 \cdot g \cdot V_2}| \ll K^{O(1)} |A|^{\theta D}.

Combining these estimates we see that

\displaystyle  \sum_{z \in A^{m_1+m+m_2} \cap \overline{V_1 \cdot g \cdot V_2}} |A^{m_1}\times A^{m_2} \cap \phi^{-1}(\{z\}) \cap S| \not \ll K^{O(1)} |A|^{\theta(D_1+D_2)}.

The left-hand side simplifies to {|A^{m_1} \times A^{m_2} \cap S|}. But this then contradicts Lemma 23.


Filed under: 254B - expansion in groups, math.AG, math.CO, math.GR Tagged: Larsen-Pink inequality, product theorems, special linear group, sum-product theorems, ultraproducts

Jordan Ellenberg

Would the death of the journal system be good for women in math?

I am not one of the most radical signatories to the “Cost of Knowledge” statement:  there are certainly some among us who look forward to a world without commercial journals, or even a world without journals at all.  I don’t yet see a clear path to that world.

Nonetheless, I want to add one possible item to the case against journals.

There is lots of inequity in the way mathematicians are assigned status — we all have researchers we think are underappreciated (and some people are quite willing to talk about who they think is overappreciated.)

One very simple source of inequity — but I’ll bet a pretty large one — is that authors decide what journal to submit to.  Some people “aim high” — their method is to ask “what’s the best journal where this paper would fit?”  Others “aim low,” asking something more like “what’s the median journal where papers like this appear?”  You can’t get in the Annals unless you submit to the Annals, and you won’t submit to the Annals very often if you aim low.

Women in the workplace are socialized not to ask for things.  I wouldn’t be surprised to learn that there are disproportionately many men in the “aim high, why shouldn’t my paper be in the Annals?” group.  (And of course, for those who get het up whenever I talk about women in math, this applies just as well to any group of mathematicians disinclined to push for their own work.)

Would things be different if papers in the Annals were selected from all papers, not just those whose authors decided to nominate themselves?  Then publication in a top journal would be a little more like being invited to speak at a prestigious conference.  Would that be an improvement?

 


David Hogg

electromagnetism and massive stars

Inspired in part by our meetings yesterday about Fergus's modeling of imaging data in a coronograph, I worked on a physically motivated re-factor of my physically motivated code to model electromagnetic fields (phase and amplitude) in astronomical telescopes and cameras. I am just a few dozen lines of code away from having a full model (highly approximate) of a simple coronograph.

In the afternoon, Selma de Mink (STScI) gave a nice seminar about extremely massive star evolution. Among many other things, she noted that there is a possibility that low-metallicity, rapidly rotating, massive stars could evolve to very hot temperatures and very high luminosities where no other kinds of stars can be. I think we can find these things in PHAT data on Andromeda; I need to email the team.

John Baez

Quantropy (Part 2)

In my first post in this series, we saw that filling in a well-known analogy between statistical mechanics and quantum mechanics requires a new concept: ‘quantropy’. To get some feeling for this concept, we should look at some examples. But to do that, we need to develop some tools to compute quantropy. That’s what we’ll do today.

All these tools will be borrowed from statistical mechanics. So, let me remind you how to compute the entropy of a system in thermal equilibrium if we know the energy of every state. Then we’ll copy this and get a formula for the quantropy of a system if we know the action of every history.

Computing entropy

Everything in this section is bog-standard. In case you don’t know, that’s British slang for ‘extremely, perhaps even depressingly, familiar’. Apparently it rains so much in England that bogs are not only standard, they’re the standard of what counts as standard!

Let X be a measure space: physically, the set of states of some system. In statistical mechanics we suppose the system occupies states with probabilities given by some probability distribution

p : X \to [0,\infty)

where of course

\int_X p(x) \, dx = 1

The entropy of this probability distribution is

S = - \int_X p(x) \ln(p(x)) \, dx

There’s a nice way to compute the entropy when our system is in thermal equilibrium. This idea makes sense when we have a function

H : X \to \mathbb{R}

saying the energy of each state. Our system is in thermal equilibrium when p maximizes entropy subject to a constraint on the expected value of energy:

\langle H \rangle = \int_X H(x) p(x) \, dx

A famous calculation shows that thermal equilibrium occurs precisely when p is the so-called Gibbs state:

\displaystyle{ p(x) = \frac{e^{-\beta H(x)}}{Z} }

for some real number \beta, where Z is a normalization factor called the partition function:

Z = \int_X e^{-\beta H(x)} \, dx

The number \beta is called the coolness, since physical considerations say that

\displaystyle{ \beta = \frac{1}{T} }

where T is the temperature in units where Boltzmann’s constant is 1.

There’s a famous way to compute the entropy of the Gibbs state; I don’t know who did it first, but it’s both straightforward and tremendously useful. We take the formula for entropy

S = - \int_X p(x) \ln(p(x)) \, dx

and substitute the Gibbs state

\displaystyle{ p(x) = \frac{e^{-\beta H(x)}}{Z} }

getting

\begin{array}{ccl} S &=& \int_X p(x) \left( \beta H(x) + \ln Z \right)\, dx \\   \\  &=& \beta \, \langle H \rangle + \ln Z \end{array}

Reshuffling this a little bit, we obtain:

- T \ln Z = \langle H \rangle - T S

If we define the free energy by

F = - T \ln Z

then we’ve shown that

F = \langle H \rangle - T S

This justifies the term ‘free energy’: it’s the expected energy minus the energy in the form of heat, namely T S.
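
Here is a quick numerical check of these identities, as a minimal Python sketch. It assumes X is a finite set with counting measure, so the integrals become sums, and the three energy levels are made up for illustration. It compares the entropy computed directly from the Gibbs state with \beta \langle H \rangle + \ln Z, and the free energy - T \ln Z with \langle H \rangle - T S.

```python
import math

def gibbs_state(energies, beta):
    """Gibbs probabilities and partition function for a finite set of states."""
    weights = [math.exp(-beta * E) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z

energies = [0.0, 1.0, 2.5]   # hypothetical energy levels
T = 2.0                      # temperature, in units where Boltzmann's constant is 1
beta = 1.0 / T               # the coolness

p, Z = gibbs_state(energies, beta)
S = -sum(q * math.log(q) for q in p)               # entropy, straight from the definition
mean_H = sum(E * q for E, q in zip(energies, p))   # expected energy
F = -T * math.log(Z)                               # free energy

print(S, beta * mean_H + math.log(Z))  # the two values agree up to rounding
print(F, mean_H - T * S)               # the two values agree up to rounding
```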

It’s nice that we can compute the free energy purely in terms of the partition function and the temperature, or equivalently the coolness \beta:

\displaystyle{ F = - \frac{1}{\beta} \ln Z }

Can we also do this for the entropy? Yes! First we’ll do it for the expected energy:

\begin{array}{ccl} \langle H \rangle &=& \displaystyle{ \int_X H(x) p(x) \, dx } \\   \\  &=& \displaystyle{ \frac{1}{Z} \int_X H(x) e^{-\beta H(x)} \, dx } \\   \\  &=& \displaystyle{ -\frac{1}{Z} \frac{d}{d \beta} \int_X e^{-\beta H(x)} \, dx } \\ \\  &=& \displaystyle{ -\frac{1}{Z} \frac{dZ}{d \beta} } \\ \\  &=& \displaystyle{ - \frac{d}{d \beta} \ln Z } \end{array}

This gives

\begin{array}{ccl} S &=& \beta \, \langle H \rangle + \ln Z \\ \\ &=& \displaystyle{ - \beta \, \frac{d \ln Z}{d \beta} + \ln Z }\end{array}

So, if we know the partition function of a system in thermal equilibrium as a function of the temperature, we can work out its entropy, expected energy and free energy.
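
To see this recipe in action, here is a small Python sketch that recovers the entropy from the partition function alone, using S = - \beta \, d \ln Z / d \beta + \ln Z, and compares it with the entropy computed directly from the Gibbs state. It assumes a finite set of states with counting measure, made-up energy levels, and a central finite difference in place of the exact derivative; all the function names are hypothetical.

```python
import math

def log_Z(energies, beta):
    """ln Z for a finite set of states with counting measure."""
    return math.log(sum(math.exp(-beta * E) for E in energies))

def dlogZ_dbeta(energies, beta, h=1e-6):
    """Central finite-difference approximation to d(ln Z)/d(beta)."""
    return (log_Z(energies, beta + h) - log_Z(energies, beta - h)) / (2 * h)

def entropy_from_Z(energies, beta):
    """S = -beta * d(ln Z)/d(beta) + ln Z."""
    return -beta * dlogZ_dbeta(energies, beta) + log_Z(energies, beta)

def entropy_direct(energies, beta):
    """S = -sum p ln p for the Gibbs state."""
    Z = math.exp(log_Z(energies, beta))
    probs = [math.exp(-beta * E) / Z for E in energies]
    return -sum(q * math.log(q) for q in probs)

energies = [0.0, 1.0, 2.5]   # hypothetical energy levels
beta = 0.5
print(entropy_from_Z(energies, beta), entropy_direct(energies, beta))
```

The two numbers agree up to the error of the finite difference. For two states of equal energy both give \ln 2, as they should.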

Computing quantropy

Now we’ll repeat everything for quantropy! The idea is simply to replace the energy by action and the temperature T by i \hbar where \hbar is Planck’s constant. It’s harder to get the integrals to converge in interesting examples. But we’ll worry about that next time, when we actually do an example!

It’s annoying that in physics S stands for both entropy and action, since in this article we need to think about both. People also use H to stand for entropy, but that’s no better, since that letter also stands for ‘Hamiltonian’! To avoid this let’s use A to stand for action. This letter is also used to mean ‘Helmholtz free energy’, but we’ll just have to live with that. It would be a real bummer if we failed to unify physics just because we ran out of letters.

Let X be a measure space: physically, the set of histories of some system. In quantum mechanics we suppose the system carries out histories with amplitudes given by some function

a : X \to \mathbb{C}

where perhaps surprisingly

\int_X a(x) \, dx = 1

The quantropy of this function is

Q = - \int_X a(x) \ln(a(x)) \, dx

There’s a nice way to compute the quantropy in Feynman’s path integral formalism. This formalism makes sense when we have a function

A : X \to \mathbb{R}

saying the action of each history. Feynman proclaimed that in this case we have

\displaystyle{ a(x) = \frac{e^{i A(x)/\hbar}}{Z} }

where \hbar is Planck’s constant and Z is a normalization factor called the partition function:

Z = \int_X e^{i A(x)/\hbar} \, dx

Last time I showed that we obtain Feynman’s prescription for a by demanding that it’s a stationary point for the quantropy

Q = - \int_X a(x) \, \ln (a(x)) \, dx

subject to a constraint on the expected action:

\langle A \rangle = \int_X A(x) a(x) \, dx

As I mentioned last time, the formula for quantropy is dangerous, since we’re taking the logarithm of a complex-valued function. There’s not really a ‘best’ logarithm for a complex number: if we have one choice we can add any multiple of 2 \pi i and get another. So in general, to define quantropy we need to pick a choice of \ln (a(x)) for each point x \in X. That’s a lot of ambiguity!

Luckily, the ambiguity is much less when we use Feynman’s prescription for a. Why? Because then a(x) is defined in terms of an exponential, and it’s easy to take the logarithm of an exponential! So, we can declare that

\ln (a(x)) = \displaystyle{ \ln \left( \frac{e^{iA(x)/\hbar}}{Z}\right) } = \frac{i}{\hbar} A(x) - \ln Z

Once we choose a logarithm for Z, this formula will let us define \ln (a(x)) and thus the quantropy.
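To see how this choice differs from a naive one, here is a small sketch, assuming a finite set of histories with made-up actions and units where \hbar = 1. It takes the principal branch for \ln Z (just one possible choice), checks that the declared logarithm really exponentiates back to a(x), and records by which multiple of 2 \pi i it differs from the principal logarithm of a(x).

```python
import cmath
import math

hbar = 1.0                    # units with hbar = 1 (an assumption for illustration)
actions = [0.0, 1.0, 10.0]    # hypothetical actions of three histories

Z = sum(cmath.exp(1j * A / hbar) for A in actions)
log_Z = cmath.log(Z)          # one choice of ln Z: the principal branch

winding = []
for A in actions:
    a = cmath.exp(1j * A / hbar) / Z     # Feynman's prescription for the amplitude
    declared = 1j * A / hbar - log_Z     # the logarithm we declare for a(x)
    principal = cmath.log(a)             # principal-branch logarithm of a(x)
    # The declared value is a genuine logarithm of a(x) ...
    assert abs(cmath.exp(declared) - a) < 1e-12
    # ... and differs from the principal one by an integer multiple of 2 pi i.
    k = (declared - principal) / (2j * math.pi)
    winding.append(round(k.real))

print(winding)
```

A nonzero entry in the list marks a history where taking the principal logarithm pointwise would have silently thrown away a multiple of 2 \pi i.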

So let’s do this, and say the quantropy is

\displaystyle{ Q = - \int_X a(x) \left( \frac{i}{\hbar} A(x) - \ln Z \right)\, dx }

We can simplify this a bit, since the integral of a is 1:

\displaystyle{ Q = \frac{1}{i \hbar} \langle A \rangle + \ln Z }

Reshuffling this a little bit, we obtain:

- i \hbar \ln Z = \langle A \rangle - i \hbar Q

By analogy to free energy in statistical mechanics, let’s define the free action by

F = - i \hbar \ln Z

I’m using the same letter for free energy and free action, but they play exactly analogous roles, so it’s not so bad. Indeed we now have

F = \langle A \rangle - i \hbar Q

which is the analogue of a formula we saw for free energy in thermodynamics.

It’s nice that we can compute the free action purely in terms of the partition function and Planck’s constant. Can we also do this for the quantropy? Yes!

It’ll be convenient to introduce a parameter

\displaystyle{ \beta = \frac{1}{i \hbar} }

which is analogous to ‘coolness’. We could call it ‘quantum coolness’, but a better name might be classicality, since it’s big when our system is close to classical. Whatever we call it, the main thing is that unlike ordinary coolness, it’s imaginary!

In terms of classicality, we have

\displaystyle{ a(x) = \frac{e^{- \beta A(x)}}{Z} }

Now we can compute the expected action just as we computed the expected energy in thermodynamics:

\begin{array}{ccl} \langle A \rangle &=& \displaystyle{ \int_X A(x) a(x) \, dx } \\ \\  &=& \displaystyle{ \frac{1}{Z} \int_X A(x) e^{-\beta A(x)} \, dx } \\   \\  &=& \displaystyle{ -\frac{1}{Z} \frac{d}{d \beta} \int_X e^{-\beta A(x)} \, dx } \\ \\  &=& \displaystyle{ -\frac{1}{Z} \frac{dZ}{d \beta} } \\ \\  &=& \displaystyle{ - \frac{d}{d \beta} \ln Z } \end{array}

This gives:

\begin{array}{ccl} Q &=& \beta \,\langle A \rangle + \ln Z \\ \\ &=& \displaystyle{ - \beta \,\frac{d \ln Z}{d \beta} + \ln Z } \end{array}

So, if we can compute the partition function in the path integral approach to quantum mechanics, we can also work out the quantropy, expected action and free action!
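As a sanity check, here is a minimal numerical sketch, assuming a finite set of histories (so the integrals become sums), units where \hbar = 1, and made-up values for the action. It uses the declared logarithm \ln a(x) = -\beta A(x) - \ln Z with the principal branch for \ln Z, and approximates the derivative of \ln Z by a central finite difference in \beta.

```python
import cmath

hbar = 1.0                      # units with hbar = 1 (an assumption for illustration)
beta = 1.0 / (1j * hbar)        # classicality: imaginary!
actions = [0.0, 1.0, 2.0]       # hypothetical actions of three histories

def partition(b):
    """Z as a function of classicality: sum over histories of exp(-b * A)."""
    return sum(cmath.exp(-b * A) for A in actions)

Z = partition(beta)
log_Z = cmath.log(Z)            # principal branch; fine here since Z is far from the cut
a = [cmath.exp(-beta * A) / Z for A in actions]           # amplitudes, summing to 1
mean_A = sum(ai * A for ai, A in zip(a, actions))         # expected action
# Quantropy, using the declared logarithm ln a(x) = -beta A(x) - ln Z
Q = -sum(ai * (-beta * A - log_Z) for ai, A in zip(a, actions))
F = -1j * hbar * log_Z                                    # free action

# Central finite difference for d(ln Z)/d(beta)
h = 1e-6
dlogZ = (cmath.log(partition(beta + h)) - cmath.log(partition(beta - h))) / (2 * h)

# Each of these differences should be close to zero.
print(abs(mean_A + dlogZ))                  # <A> = -d ln Z / d beta
print(abs(Q - (beta * mean_A + log_Z)))     # Q = beta <A> + ln Z
print(abs(F - (mean_A - 1j * hbar * Q)))    # F = <A> - i hbar Q
```

Note that the expected action and the quantropy come out complex here, which is no surprise given that the amplitudes are complex.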

Next time I’ll use these formulas to compute quantropy in an example: the free particle. We’ll see some strange and interesting things.

Summary

Here’s where our analogy stands now:

Statistical Mechanics | Quantum Mechanics
states: x \in X | histories: x \in X
probabilities: p: X \to [0,\infty) | amplitudes: a: X \to \mathbb{C}
energy: H: X \to \mathbb{R} | action: A: X \to \mathbb{R}
temperature: T | Planck’s constant times i: i \hbar
coolness: \beta = 1/T | classicality: \beta = 1/i \hbar
partition function: Z = \sum_{x \in X} e^{-\beta H(x)} | partition function: Z = \sum_{x \in X} e^{-\beta A(x)}
Boltzmann distribution: p(x) = e^{-\beta H(x)}/Z | Feynman sum over histories: a(x) = e^{-\beta A(x)}/Z
entropy: S = - \sum_{x \in X} p(x) \ln(p(x)) | quantropy: Q = - \sum_{x \in X} a(x) \ln(a(x))
expected energy: \langle H \rangle = \sum_{x \in X} p(x) H(x) | expected action: \langle A \rangle = \sum_{x \in X} a(x) A(x)
free energy: F = \langle H \rangle - TS | free action: F = \langle A \rangle - i \hbar Q
\langle H \rangle = - \frac{d}{d \beta} \ln Z | \langle A \rangle = - \frac{d}{d \beta} \ln Z
F = -\frac{1}{\beta} \ln Z | F = -\frac{1}{\beta} \ln Z
S = \ln Z - \beta \,\frac{d}{d \beta}\ln Z | Q = \ln Z - \beta \,\frac{d }{d \beta}\ln Z

I should also say a word about units and dimensional analysis. There’s enormous flexibility in how we do dimensional analysis. Amateurs often don’t realize this, because they’ve just learned one system, but experts take full advantage of this flexibility to pick a setup that’s convenient for what they’re doing. The fewer independent units you use, the fewer dimensionful constants like the speed of light, Planck’s constant and Boltzmann’s constant you see in your formulas. That’s often good. But here I don’t want to set Planck’s constant equal to 1 because I’m treating it as analogous to temperature—so it’s important, and I want to see it. I’m also finding dimensional analysis useful to check my formulas.

So, I’m using units where mass, length and time count as independent dimensions in the sense of dimensional analysis. On the other hand, I’m not treating temperature as an independent dimension: instead, I’m setting Boltzmann’s constant to 1 and using that to translate from temperature into energy. This is fairly common in some circles. And for me, treating temperature as an independent dimension would be analogous to treating Planck’s constant as having its own independent dimension! I don’t feel like doing that.

So, here’s how the dimensional analysis works in my setup:

Statistical Mechanics | Quantum Mechanics
probabilities: dimensionless | amplitudes: dimensionless
energy: ML^2/T^2 | action: ML^2/T
temperature: ML^2/T^2 | Planck’s constant: ML^2/T
coolness: T^2/ML^2 | classicality: T/ML^2
partition function: dimensionless | partition function: dimensionless
entropy: dimensionless | quantropy: dimensionless
expected energy: ML^2/T^2 | expected action: ML^2/T
free energy: ML^2/T^2 | free action: ML^2/T
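This kind of bookkeeping can be automated. Here is a minimal sketch, assuming we represent a dimension as a tuple of exponents of mass, length and time, and taking the standard dimensions of energy, M L^2/T^2, and of action, M L^2/T. The helper names are made up for illustration.

```python
# A dimension is a tuple of exponents (mass, length, time).
DIMENSIONLESS = (0, 0, 0)

def mul(a, b):
    """Dimension of a product: add the exponents."""
    return tuple(x + y for x, y in zip(a, b))

def inv(a):
    """Dimension of a reciprocal: negate the exponents."""
    return tuple(-x for x in a)

energy = (1, 2, -2)           # M L^2 / T^2
action = (1, 2, -1)           # M L^2 / T

temperature = energy          # Boltzmann's constant is set to 1
planck = action               # Planck's constant has dimensions of action
coolness = inv(temperature)   # beta = 1/T
classicality = inv(planck)    # beta = 1/(i hbar)

# The exponents in the partition functions must be dimensionless.
print(mul(coolness, energy) == DIMENSIONLESS)
print(mul(classicality, action) == DIMENSIONLESS)

# F = -(1/beta) ln Z has the dimensions of 1/beta, since ln Z is dimensionless.
free_energy = inv(coolness)
free_action = inv(classicality)
print(free_energy == energy, free_action == action)
```

Since \beta H and \beta A come out dimensionless, so do the partition function, the entropy and the quantropy.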

I like this setup because I often think of entropy as closely allied to information, measured in bits or nats depending on whether I’m using base 2 or base e. From this viewpoint, it should be dimensionless.

Of course, in thermodynamics it’s common to put a factor of Boltzmann’s constant in front of the formula for entropy. Then entropy has units of energy/temperature. But I’m using units where Boltzmann’s constant is 1 and temperature has the same units as energy! So for me, entropy is dimensionless.


Noncommutative GeometryA new book: Noncommutative geometry, arithmetic, and related topics

The proceedings of the JAMI 2009 meeting on “Noncommutative geometry, arithmetic, and related topics”, just published by Johns Hopkins University Press, are now available. Happy reading!

February 10, 2012

Quantum DiariesPeer Review: A Cornerstone of Science

Ah yes, peer review; one of the more misunderstood parts of the scientific method. Peer review is frequently treated as an incantation to separate the wheat from the chaff. What has been peer reviewed is good; what hasn’t is bad. But life is never so simple. In the late 1960s, Joseph Weber (1919 – 2000) published two Physical Review Letters where he claimed to have detected gravitational waves. Although there are a few holdouts who believe he did, the general consensus is that he did not, since his results have not been reproduced. Rather it is generally believed that his results were an experimental artifact. His results were peer reviewed and accepted at a “prestigious” journal but that does not guarantee that they are correct. Even the Nobel committee occasionally makes mistakes, most notably giving the award to the inventor of the lobotomy.

Conversely, consider the case of Alfred Wegener (1880 – 1930). In 1912 he proposed the idea of continental drift. To say the least, it was not enthusiastically received. It did not help that Wegener was a meteorologist, not a geologist. This theory was largely rejected by his peers in geology. For example, the University of Chicago geologist Rollin T. Chamberlin said, “If we are to believe in Wegener’s hypothesis we must forget everything which has been learned in the past 70 years and start all over again.” In 1926, the American Association of Petroleum Geologists (AAPG) held a special symposium on the hypothesis of continental drift and rejected it. After that, the hypothesis was strictly on the fringe until the late 1950s and early ‘60s when it finally became mainstream.

Thus, we see that peer review cannot definitively be relied on to give the final answer. So what use is peer review? The problem is that, as pointed out in previous posts, in science there is no one person who can serve as the ultimate authority; rather, observation is. In school, the teacher knows more than the student and can be considered the final authority. In university, the professor plays that role, sometimes with gusto. But when it comes to research, frequently it is the researcher him/herself who is the world expert. So how can research be judged and how do we make decisions about that research? And decisions do have to be made. We cannot publish everything; the useful results would get lost in the noise. We must maintain the collective wisdom that has been laboriously developed. Similarly, decisions have to be made on who gets research grants. Do we use a random number generator? OK, no snide remarks, I admit that it does occasionally look like we do. As there is no single human to serve as the final authority, we turn to the people who know the most about the topic, namely the peers of the person. If we want a decision related to sheep farming, we consult sheep farmers; if about nuclear physics, we consult nuclear physicists. Peer review is simply the idea that when we have to make a decision, we consult those people most likely to be able to make an informed decision. Is it perfect? No. Is there a better process? Perhaps, but no one seems to know what it is.

Peer review is also used as a bulwark against bull…, oops, material that is of questionable validity. The expression “that has not been peer reviewed” is used as a euphemism for “that is complete and utter crap and I am not going to waste my time dealing with it.” In this case it tends to come across as closed minded: “Not peer reviewed? It’s nonsense!” Needless to say, cranks take great exception and tend to regard peer review as a new priesthood that stifles innovation. And indeed, as noted above, sometimes peer review does get it wrong. There is always this tension between accepting nonsense and rejecting the next big thing. As the case of continental drift illustrates, it is sometimes only in retrospect, when we have more data, that we can tell what the correct answer is. However, it is better to reject or delay the acceptance of something that has a good chance of being wrong than to have the literature overrun with wrong results (think lobotomies). Moreover, contrary to popular conception, Copernicus and Wegener are the exception, not the rule. That is why Copernicus is still used as the example of the suppression of ideas half a millennium later: there are just not that many good examples. And I might add that both Copernicus and Wegener were initially rejected for good reasons and were accepted once sufficient supporting data came to light. Most people whom the peer review process deems to be cranks are indeed cranks. Never heard of Immanuel Velikovsky (1895 – 1979)? Well, there is a reason. The few who were right are remembered, but the multitudes that were wrong are, like Velikovsky, forgotten.

Peer review is one of the cornerstones of science and is an essential part of its error control process. At every level in science we use peers to check for errors. Within well-run collaborations, results are reviewed by the peers within the collaboration before submitting for publication. I will get my peers to read my papers before submission. Even the editing of these posts before being put on line can be considered peer review. Then there is the formal peer review a paper receives when it is submitted to a journal. In many ways this is the least important peer review because it is after a paper is published that it receives its most vigorous peer review. I can be quite sure there is no fundamental flaw in special relativity, not because Einstein was a genius, not because it was published in a prestigious journal, but because after it was published many very clever people tried very hard to find flaws in it and failed. Any widely read scientific paper will be subject to this thorough scrutiny by the author’s peers.  That is the reason we can have confidence in the results of science and why secrecy is the enemy of scientific progress. Given enough eyeballs, all bugs are shallow[1].

Additional posts in this series will appear most Friday afternoons at 3:30 pm Vancouver time. To receive a reminder follow me on Twitter: @musquod.

 

Tim GowersWhat’s wrong with electronic journals?

It probably sounds disingenuous of me to say this, but when I sat down to write a post about Elsevier I wasn’t really trying to start a campaign. My intention was merely to make public, and a little more rigid, a policy that I and many others had already been applying, in my case without much difficulty, for several years. The idea of setting up a website occurred to me as I was writing the post: I considered it (and still consider it) not as a petition to Elsevier to change its ways — since I don’t believe there is any realistic chance of that — but as a simple way to bring out into the open all the private boycotts and semi-boycotts that were going on, and thereby to encourage others to do the same.

By accident, the post seems to have been quite well timed. Probably it’s not an accident at all, and whatever atmosphere it was that prompted me to get round to writing the post (for example, certain discussions I had had with other mathematicians, some of them online) was the same as what made it a good moment. Anyhow, accident or no, the result is that some people have talked about “momentum”, and I’m starting to feel a responsibility, not particularly welcome (because it threatens to involve work), not to squander that momentum.

I’ve actually been ill in bed for much of the last few days, so most of the rest of this post will be reporting on some feverish thoughts, which I’ll try to organize into a more coherent form. I’ll also try not to write too much, though that may be quite difficult.

What next?

What I really mean is more like, “How much next?” Do we just let the number of signatures at Tyler Neylon’s website continue to grow at its currently healthy rate and sit back and hope that at some point there will be a phase change? That was something like my original plan — or rather non-plan. But there are reasons to suppose that provoking a phase change will take a bit more effort.

I felt I had at least to think about that when Michael Harris made a comment of which here is the beginning.

When the number of signatures reaches a certain target figure — 500, say, or 1000 — the next step is to send an open letter to the members of the editorial board of one of the Elsevier journals, explaining why they might want either to look into changing publishers or, if this is impossible for contractual reasons, to resign. Since the editors are colleagues, the tone should not be confrontational. Instead, one should make the point that their remaining on the editorial board in the face of such a massive show of rejection will naturally be interpreted as a defense of Elsevier’s business practices; and more pragmatically, it will be more difficult to maintain the quality of a journal subject to boycott.

I’m willing to draft such a letter if there is sufficient interest and if no one else volunteers, though I’m hardly the most qualified to do so. It would need at least 20 signatures from a broad sampling of mathematical specialties.

My initial impulse on reading this was to think that maybe that was moving a bit fast. I also latched on eagerly to the words “the tone should not be confrontational” and started mentally drafting letters full of assurances that they were not in any sense a criticism etc. etc. Meanwhile, it soon became clear that the 1000-signatures mark would be quickly passed, as it now has been. (However, the proportion of mathematicians has dropped. For a while it was almost 100% but now it is a lot less than that. So a target that might be appropriate is 1000 mathematicians. Restricting the list by subject is not yet possible, but Tyler Neylon assures me that it will become so. With a bit of effort, I’ve done a not terribly reliable count and concluded that there are 430 mathematicians so far.)

I then read this (written, as you can see, in response to another comment).

Stan,

We agree that technology is making publishing an electronic journal easy without technical expertise.

A group of current UChicago and former grad students and alums have created Scholastica (http://www.scholasticahq.com), an academic journal management platform and scholarly community. Anyone can create their own peer reviewed journal, manage their peer review process, and ultimately publish without the need for publishing companies like Elsevier. There’s also a section of the application called ‘The Conversation’ (http://scholasticahq.com/conversation) that is very similar to Mathoverflow and that allows academics to build reputation points that can be used to be recruited as a referee.

We hope that this is seen as more than a shameless plug as we’ve been working tirelessly over the last year with no pay to provide something to address the problems with academic publishing that Tim and others describe here.

We would love your support.

- Rob Walsh
Scholastica

A little later, I had an exchange of emails with Brian Cody, another member of the Scholastica team, and it became clear that one of their aims was to make it almost effort free for the editors of a journal to do what the editors of Topology did: resign en masse and start again somewhere else with a modified name. Scholastica may well not be the only venture of its kind, and perhaps one can argue about whether it is the best, but what one can say now, with confidence, is that there is a web tool out there that makes the mechanics of starting up a new (but secretly not so new) journal almost trivial. I’d add that the site is in beta at the moment, with an eager team of developers who are ready to add features if there is a demand for them. I urge people to have a look.

It seems to me that if lots of mathematicians feel that enough is enough with Elsevier, and if it is easy to move a journal, then one really can start to think that something might happen sooner rather than later. But there is one snag, which brings me to the title of this post: a journal set up with Scholastica is electronic. [I write that without being 100% certain that it is correct -- I have written to them to check.]

Electronic Journals.

What’s wrong with that, you might ask? I don’t have a good answer, but I do have a bad answer, which is that I, and probably many other people, have an irrational prejudice against them. (There’s also a potentially better answer to do with whether electronic archives are likely to be as durable as paper ones have shown themselves to be, but I’m going to ignore that issue.) I grew up with the paper journal, I remember the thrill of seeing my first paper in print, I enjoyed browsing in libraries, I liked the long traditions that accompanied certain journals, and so on, and when the first electronic journals started, there just didn’t seem to be any point in submitting to them: why sacrifice that lovely paper when you didn’t have to? Somehow, electronic journals weren’t the real thing.

Recently, however, my prejudice has weakened. An obvious reason is that I don’t actually have any of the experiences that I enjoyed when I was starting out in my career: I can’t remember when I last set foot in a maths library, I think people have stopped sending me fifty offprints whenever a paper of mine comes out (which is a relief, as the ones I do have are a silly waste of shelf space, though I can’t bear to throw them away), the moment a paper “comes out” is nowadays the day I put it on the arXiv rather than the almost irrelevant day a couple of years later when it is published. In short, I do pretty well everything on my computer these days, so the idea of an electronic publication has lost the “unreal” feeling it used to have.

However, I do think that kind of prejudice probably still survives to a significant extent, and that it would be good to try to combat it. Here it seems to me that electronic journals have missed a trick. When I see the name “Electronic Journal of Combinatorics”, for example, my instinct is to read it as something like, “Journal of Combinatorics — except it’s only electronic”. In other words, the word “electronic” has entirely negative associations. (At this point I should say that yesterday out of curiosity I browsed the archive of the Electronic Journal of Combinatorics for the first time ever, and discovered to my surprise, and slight shame, that it was full of excellent papers by excellent mathematicians. Moreover, in the sample I looked at every single paper made me think, “Hmm, that looks interesting.” By way of apology, I shall submit to them when I next have a suitable paper. I was also shocked to discover that Herb Wilf, who founded the journal, died a few weeks ago. That news had passed me by.)

There must surely be ways that an electronic journal could exploit its electronic character in order to have a positive appeal. Why not have an electronic journal that isn’t run on quite the same lines as a conventional journal? Let me describe an imaginary new journal that would be close enough to conventional journals not to ruffle too many feathers but different enough that at least some people might find it dynamic, forward-looking, and somewhere one would love to be published.


Breakthroughs in Mathematics.

The journal Breakthroughs in Mathematics is set up with one main aim: to accept papers only if they are outstanding. As its name suggests, the editors will be looking for papers that open up new areas, get past seemingly impregnable barriers, or solve long-standing open problems.

If you have written such a paper, why might you wish to submit it to Breakthroughs rather than to, say, Annals, Acta or the Journal of the AMS? Here are a few reasons.

1. Our attitude is that if you publish with us, then we are doing you a favour rather than the other way round. The journal does not have a print version, so there is no need to fill issues with papers that do not meet its exacting standards. If a few months go by without a breakthrough, then that’s fine by us. The average number of papers published so far has been about ten per year, so publication in Breakthroughs is something of an event in the way that publication in a conventional journal, however prestigious, is not.

2. We have a large, youthful and diverse editorial board, consisting mainly of mathematicians who are active on the internet. If that is not your thing, then by all means submit to a conventional journal, but if you are part of the internet generation of mathematicians, then you may feel more at home at Breakthroughs.

3. The submission and refereeing process works as follows. Authors are required to submit not just their papers but also a short account of their work, in which they should explain their result in terms that are comprehensible to mathematicians outside their speciality, paying particular attention to what it is that makes it more than just an ordinary piece of very good mathematics. There is then an initial filtering process by the editorial board, helped by quick opinions solicited from experts in the relevant areas, which is based more on the short account of the paper than on the paper itself and is intended to establish whether the result is sufficiently interesting to sufficiently many editors to be publishable in Breakthroughs. In the rare event that it is, the paper then goes to a technical referee, whose job is not to evaluate the paper, but simply to comment on how it is written and to check that the author has done what he or she claims to have done.

4. The technical referee is not anonymous. Indeed, he or she is positively encouraged to interact with the author, asking for help in understanding difficult parts of a paper, and so on. Authors can even nominate their own technical referee if they wish, though Breakthroughs has the final say.

5. When the paper is published, it appears along with an explanation, written by a suitable member of the editorial board, of why it is deemed important enough to appear in Breakthroughs. This will typically be based on the short account provided by the author, as well as on remarks made by the referees, and possibly on other sources such as online discussion of the result (which will typically by this time be quite well known, though we aim to deal with our papers quickly). It also comes with a comments page, to which anybody can contribute remarks about the paper — such as alternative proofs of certain steps, notification of applications, and the like. The author can respond to these remarks. In these ways, we attempt to give a bit of publicity to the papers we publish, and to provide some context for the general reader.

6. We have made a serious attempt to be precise about what is required of a paper for it to be published in Breakthroughs. For details, see our page, “What is a breakthrough?” Of course, it is impossible to give exact necessary and sufficient conditions, but the fact that we at least try makes it clearer what it means to have a Breakthroughs in Mathematics paper on your CV than it would if we simply said that we had very high standards.


But still: what now?

A journal like that is not going to answer the need for new journals to replace the overpriced conventional ones, but it could at least make electronic journals sexy in a way that they aren’t at the moment. It would also have the great virtue of not requiring much work of the editors. (It would require quite a lot of work per accepted paper, but the number of accepted papers would be very small.)

I’m aware though that I haven’t really faced up to the question of whether the editors of an Elsevier journal should be gently encouraged to consider switching publishers. As a matter of fact, I heard from an Elsevier editor recently. Let me call him/her X. X had approached a potential referee and had just received a refusal in which my earlier blog post was mentioned. X was somewhat critical of encouraging people not to referee for Elsevier journals, but said that he/she had some sympathy with the reasons. My guess is that on any journal there will be a small handful of very active editors, often just the official main editors, who in a sense “are” the journal and whose lives could be a little disrupted, and a much wider set of editors who wouldn’t at all mind moving if there were good reasons to do so.

How much of an imposition this would be would depend on a number of factors. One factor I find hard to judge because of my lack of experience running journals is probably the most important: the extent to which the smooth running of a journal depends on a good relationship between the managing editors and certain representatives, who may have genuine mathematical sympathies and expertise, of the publishers. Giving up a relationship like that would be a genuine sacrifice unless there was a realistic prospect of a new and similar relationship to take its place. Asking a print journal to go electronic would also be asking quite a lot, though, for reasons I indicated above, perhaps not too much.

Combinatorics journals.

In the course of writing the last couple of paragraphs I found myself thinking about the situation in combinatorics, and I have come to realize that I am on the editorial boards of at least two Springer journals: the Annals of Combinatorics, which is not really my kind of combinatorics and has involved zero work, and Combinatorica, which is one of my favourite maths journals. Since the general view seems to be that Springer has become a problem company as well, I should perhaps consider my position. I find it quite hard to get comprehensible information about the prices of these journals, but I think that if I could sell the back numbers that I’ve received from them at their official cost price, I could go on a round-the-world cruise and still have plenty of change.

What are the options if you want to publish a good result in combinatorics? (Here, I’m mainly talking about Hungarian-style combinatorics rather than enumerative or algebraic combinatorics.) If the result is interesting enough, you could of course publish in a general-interest journal, but let’s suppose you want it to appear in a specialist journal. The list of journals that would naturally spring to my mind is this. I’ll also give my associations with each one, which should not be taken seriously because I haven’t made any effort to test whether they are correct. I’m sure other people have different pecking orders.

Combinatorica: used to be regarded as the number one journal in combinatorics, and very possibly still is; quite slow and with a big backlog (that was true once but may be out of date). [Springer]

Discrete Mathematics: good solid journal; not of the absolute top rank. [Elsevier]

Edit. The assessments of the next two journals were based on ignorance and were wrong: I am told by those in the know that JCT is roughly on a par with Combinatorica, or perhaps just the tiniest bit behind. So they are very good.

Journal of Combinatorial Theory A: good solid journal; not of the absolute top rank. [Elsevier]

Journal of Combinatorial Theory B: good solid journal; not of the absolute top rank. [Elsevier]

European Journal of Combinatorics: OK, but not as good as I thought it was when I submitted a paper I very much liked to it twenty years ago. [Elsevier]

Random Structures and Algorithms: very good; lots of interesting papers. [Wiley]

Combinatorics, Probability and Computing: a personal favourite; set up recently(ish) by Béla Bollobás and maintains a high standard. [Cambridge University Press]

Electronic Journal of Combinatorics: now that I’ve actually looked into it … good.

I’ve probably missed some obvious further possibilities there, but the fact remains that that is my mental list of good combinatorial journals, and if I want to avoid the big publishing houses then my list goes down from eight to two. It’s not as bad as it sounds though. The only one of those journals that I’ve actually submitted to is Combinatorics, Probability and Computing, and the only one of the first six that I’d feel sad about boycotting is Combinatorica, though I also feel quite positive about Random Structures and Algorithms.

So if anything is to be done about outrageously high journal prices in combinatorics, it looks as though new journals, or migration of existing ones, will be needed. (Incidentally, I’m writing all this on the assumption that we stick with something close to the current system of journals providing varying stamps of quality. Obviously other systems are possible, but persuading large numbers of mathematicians to move to those systems would be much more of a challenge.)

Are there two kinds of mathematician?

I was quite surprised that the reaction to the idea of a boycott was as positive as it was: I had expected a more divided response. I still wonder whether the true response is more divided. Could it be that the kind of mathematician who participates fully in online discussions on blogs, Mathoverflow etc. is naturally enthusiastic, whereas a more traditionally-minded mathematician just wants to be left alone to continue with a way of doing things that seems perfectly satisfactory? If so, then the apparently strong support could be misleading. I think it is this thought that makes me want to tread carefully after reading Michael Harris’s suggestion. But treading carefully doesn’t necessarily mean not treading at all. I’d be very interested to know what other people think about this: is there some moment that needs to be seized, or should we simply sit back and watch the number of signatures grow?


Quantum DiariesPhysicists Eat!

CERN is a pretty interesting place to work, probably more so than other physics laboratories around the world, due to its highly international nature. Here is a nice graphic of the nationalities of all CERN users:


In no place is the international nature of the laboratory more evident than in the main cafeteria on site. While most of the conversations are in English, you can usually hear bits of conversation in other languages. I personally like to play the ‘guess what language that table is speaking’ game, though it’s a little frustrating as I can’t just go over and ask to check if I have it right or not.

Whatever the language the conversation is in, you can be sure that the most discussed topic is physics. In fact, a lot of important discussions occur over a drink or a bite to eat. It’s just easier to discuss issues in an informal setting with fewer people than in a more formal video conference.

Probably due to this fact, I think there is a slight fascination with the cafeteria from the media. Every couple of weeks there is usually a film crew in there, filming people eating and talking for whatever feature they are producing.

USLHC has decided to join in on the cafeteria action, having intern Amy Dusto set up LHC Lunch, a series of articles and videos sourced from lunch time interviews with members of the LHC experiments working for US institutes.

Why do I bring all of this up? Well, I was one of the physicists whom she interviewed, and my article and video have just been published. Check them out here. Enjoy!

John BaezThe Cost of Knowledge

As of this moment, 4760 scholars have joined a boycott of the publishing company Elsevier. Of these, only 20% are mathematicians. But since the boycott was started by a mathematician, 34 of us wrote and signed an official statement explaining the boycott:

The Cost of Knowledge.

It’s also below. Please check it out and join the boycott! I’m sure more than 34 mathematicians would be happy to sign, but we wanted to get the statement out soon.

THE COST OF KNOWLEDGE

This is an attempt to describe some of the background to the current boycott of Elsevier by many mathematicians (and other academics) at http://thecostofknowledge.com, and to present some of the issues that confront the boycott movement. Although the movement is anything but monolithic, we believe that the points we make here will resonate with many of the signatories to the boycott.

The role of journals (1): dissemination of research.

The role of journals in professional mathematics has been under discussion for some time now.

Traditionally, while journals served several purposes, their primary purpose was the dissemination of research papers. The journal publishers were charging for the cost of typesetting (not a trivial matter in general before the advent of electronic typesetting, and particularly non-trivial for mathematics), the cost of physically publishing copies of the journals, and the cost of distributing the journals to subscribers (primarily academic libraries).

The editorial board of a journal is a group of professional mathematicians. Their editorial work is undertaken as part of their scholarly duties, and so is paid for by their employer, typically a university. Thus, from the publisher’s viewpoint the editors are volunteers. (The editor in chief of a journal sometimes receives modest compensation from the publisher.) When a paper is submitted to the journal, by an author who is again typically a university-employed mathematician, the editors select the referee or referees for the paper, evaluate the referees’ reports, decide whether or not to accept the submission, and organize the submitted papers into volumes. These are passed on to the publisher, who then undertakes the job of actually publishing them. The publisher supplies some administrative assistance in handling the papers, as well as some copy-editing assistance, which is often quite minor but sometimes more substantial. The referees are again volunteers from the point of view of the publisher: as with editing, refereeing is regarded as part of the service component of a mathematician’s academic work. Authors are not paid by the publishers for their published papers, although they are usually asked to sign over the copyright to the publisher.

This system made sense when the publishing and dissemination of papers was a difficult and expensive undertaking. Publishers supplied a valuable service in this regard, for which they were paid by subscribers to the journals, which were mainly academic libraries. The academic institutions whose libraries subscribe to mathematics journals are broadly speaking the same institutions that employ the mathematicians who are writing for, refereeing for, and editing the journals. Therefore, the cost of the whole process of producing research papers is borne by these institutions (and the outside entities that partially fund them, such as the National Science Foundation in the United States): they pay for their academic mathematician employees to do research and to organize the publications of the results of their research in journals; and then (through their libraries) they pay the publishers to disseminate these results among all the world’s mathematicians. Since these institutions employ research faculty in order to foster research, it certainly used to make sense for them to pay for the dissemination of this research as well. After all, the sharing of scientific ideas and research results is unquestionably a key component for making progress in science.

Now, however, the world has changed in significant ways. Authors typeset their own papers, using electronic typesetting. Publishing and distribution costs are not as great as they once were. And most importantly, dissemination of scientific ideas no longer takes place via the physical distribution of journal volumes. Rather, it takes place mainly electronically. While this means of dissemination is not free, it is much less expensive, and much of it happens quite independently of mathematical journals.

In conclusion, the cost of journal publishing has gone down because the cost of typesetting has been shifted from publishers to authors and the cost of publishing and distribution is significantly lower than it used to be. By contrast, the amount of money being spent by university libraries on journals seems to be growing with no end in sight. Why do mathematicians contribute all this volunteer labor, and their employers pay all this money, for a service whose value no longer justifies its cost?

The role of journals (2): peer review and professional evaluation

There are some important reasons that mathematicians haven’t just abandoned journal publishing. In particular, peer review plays an essential role in ensuring the correctness and readability of mathematical papers, and publishing papers in research journals is the main way of achieving professional recognition. Furthermore, not all journals count equally from this point of view: journals are (loosely) ranked, so that publications in top journals will often count more than publications in lower ranked ones. Professional mathematicians typically have a good sense of the relative prestige of the journals that publish papers in their area, and they will usually submit a paper to the highest ranked journal that they judge is likely to accept and publish it.

Because of this evaluative aspect of traditional journal publishing, the problem of switching to a different model is much more difficult than it might appear at first. For example, it is not easy just to begin a new journal (even an electronic one, which avoids the difficulties of printing and distribution), since mathematicians may not want to publish in it, preferring to submit to journals with known reputations. Secondly, although the reputation of various journals has been created through the efforts of the authors, referees, and editors who have worked (at no cost to the publishers) on it over the years, in many cases the name of the journal is owned by the publisher, making it difficult for the mathematical community to separate this valuable object that they have constructed from its present publisher.

The role of Elsevier

Elsevier, Springer, and a number of other commercial publishers (many of them large companies but less significant for their mathematics publishing, e.g., Wiley) all exploit our volunteer labor to extract very large profits from the academic community. They supply some value in the process, but nothing like enough to justify their prices.

Among these publishers, Elsevier may not be the most expensive, but in the light of other factors, such as scandals, lawsuits, lobbying, etc. (discussed further below), we consider them a good initial focus for our discontent. A boycott should be substantial enough to be meaningful, but not so broad that the choice of targets becomes controversial or the boycott becomes an unmanageable burden. Refusing to submit papers to all overpriced publishers is a reasonable further step, which some of us have taken, but the focus of this boycott is on Elsevier because of the widespread feeling among mathematicians that they are the worst offender.

Let us begin with the issue of journal costs. Unfortunately, it is difficult to make cost comparisons: journals differ greatly in quality, in number of pages per volume, and even in amount of text per page. As measured by list prices, Elsevier mathematics journals are amongst the most expensive. For instance, in the AMS mathematics journal price survey, seven of the ten most expensive journals (by 2007 volume list price) were published by Elsevier. (All prices are as of 2007 because both prices and page counts are easily available online.) However, that is primarily because Elsevier publishes the largest volumes. Price per page is a more meaningful measure that can be easily computed. By this standard, Elsevier is certainly not the worst publisher, but its prices do on the face of it look very high. The Annals of Mathematics, published by Princeton University Press, is one of the absolute top mathematics journals and quite affordably priced: $0.13/page as of 2007. By contrast, ten Elsevier journals (not including one that has since ceased publication) cost $1.30/page or more; they and three others cost more per page than any journal published by a university press or learned society. For comparison, three other top journals competing with the Annals are Acta Mathematica, published by the Institut Mittag Leffler for $0.65/page, Journal of the American Mathematical Society, published by the American Mathematical Society for $0.24/page, and Inventiones Mathematicae, published by Springer for $1.21/page. Note that none of Elsevier’s mathematics journals is generally considered comparable in quality to these journals.

However, there is an additional aspect which makes it hard to compute the true cost of mathematics journals. This is the widespread practice among large commercial publishers of “bundling” journals, which allows libraries to subscribe to large numbers of journals in order to avoid paying the exorbitant list prices for the ones they need. Although this means that the average price libraries pay per journal is less than the list prices might suggest, what really matters is the average price that they pay per journal (or page of journal) that they actually want, which is hard to assess, but clearly higher. We would very much like to be able to offer more concrete data regarding the actual costs to libraries of Elsevier journals compared with those of Springer or other publishers. Unfortunately, this is difficult, because publishers often make it a contractual requirement that their institutional customers should not disclose the financial details of their contracts. For example, Elsevier sued Washington State University to try to prevent release of this information. One common consequence of these arrangements, though, is that in many cases a library cannot actually save any money by cancelling a few Elsevier journals: at best the money can sometimes be diverted to pay for other Elsevier subscriptions.

One reason for focusing on Elsevier rather than, say, Springer is that Springer has had a rich and productive history with the mathematical community. As well as journals, it has published important series of textbooks, monographs, and lecture notes; one could perhaps regard the prices of its journals as a means of subsidizing these other, less profitable, types of publications. Although all these types of publications have become less important with the advent of the internet and the resulting electronic distribution of texts, the long and continuing presence of Springer in the mathematical world has resulted in a store of goodwill being built up in the mathematical community towards them. This store is being rapidly depleted, but has not yet reached zero. See for instance the recent petition to Springer by a number of French mathematicians and departments.

Elsevier does not have a comparable tradition of involvement in mathematics publishing. Many of the mathematics journals that it publishes have been acquired comparatively recently as it has bought up other, smaller publishers. Furthermore, in recent years it has been involved in various scandals regarding the scientific content, or lack thereof, of its journals. One in particular involved the journal Chaos, Solitons & Fractals, which, at the time the scandal broke in 2008–2009, was one of the highest impact factor mathematics journals that Elsevier published. (Elsevier currently reports the five-year impact factor of this journal at 1.729. For sake of comparison, Advances in Mathematics, also published by Elsevier, is reported as having a five-year impact factor of 1.575.) It turned out that the high impact factor was at least partly the result of the journal publishing many papers full of mutual citations. (See Arnold for more information on this and other troubling examples that show the limitations of bibliometric measures of scholarly quality.) Furthermore, Chaos, Solitons & Fractals published many papers that, in our professional judgement, have little or no scientific merit and should not have been published in any reputable journal.

In another notorious episode, this time in medicine, for at least five years Elsevier “published a series of sponsored article compilation publications, on behalf of pharmaceutical clients, that were made to look like journals and lacked the proper disclosures”, as noted by the CEO of Elsevier’s Health Sciences Division.

Recently, Elsevier has lobbied for the Research Works Act, a proposed U.S. law that would undo the National Institutes of Health’s public access policy, which guarantees public access to published research papers based on NIH funding within twelve months of publication (to give publishers time to make a profit). Although most lobbying occurs behind closed doors, Elsevier’s vocal support of this act shows their opposition to a popular and effective open access policy.

These scandals, taken together with the bundling practices, exorbitant prices, and lobbying activities, suggest a publisher motivated purely by profit, with no genuine interest in or commitment to mathematical knowledge and the community of academic mathematicians that generates it. Of course, many Elsevier employees are reasonable people doing their best to contribute to scholarly publishing, and we bear them no ill will. However, the organization as a whole does not seem to have the interests of the mathematical community at heart.

The boycott

Not surprisingly, many mathematicians have in recent years lost patience with being involved in a system in which commercial publishers make profits based on the free labor of mathematicians and subscription fees from their institutions’ libraries, for a service that has become largely unnecessary. (See Scott Aaronson’s scathing but all-too-true satirical description of the publishers’ business model.) Among all the commercial publishers, the behavior of Elsevier seemed to many to be the most egregious, and a number of mathematicians had made personal commitments to avoid any involvement with Elsevier journals. (Some journals were also successfully moved from Elsevier to other publishers; e.g., Annales Scientifiques de l’École Normale Supérieure, which until recent years was published by Elsevier, is now published by the Société Mathématique de France.)

One of us (Timothy Gowers) decided that it might be useful to publicize his own personal boycott of Elsevier, thus encouraging others to do the same. This led to the current boycott movement at http://thecostofknowledge.com, the success of which has far exceeded his initial expectations.

Each participant in the boycott can choose which activities they intend to avoid: submitting to Elsevier journals, refereeing for them, and serving on editorial boards. Of course, submitting papers and editing journals are purely voluntary activities, but refereeing is a more subtle issue. The entire peer review system depends on the availability of suitable referees, and its success is one of the great traditions of science: refereeing is felt to be both a burden and an honor, and practically every member of the community willingly takes part in it. However, while we respect and value this tradition, many of us do not wish to see our labor used to support Elsevier’s business model.

What next?

As suggested at the very beginning, different participants in the boycott have different goals, both in the short and long term. Some people would like to see the journal system eliminated completely and replaced by something else more adapted to the internet and the possibilities of electronic distribution. Others see journals as continuing to play a role, but with commercial publishing being replaced by open access models. Still others imagine a more modest change, in which commercial publishers are replaced by non-profit entities such as professional societies (e.g., the American Mathematical Society, the London Mathematical Society, and the Société Mathématique de France, all of which already publish a number of journals) or university presses; in this way the value generated by the work of authors, referees, and editors would be returned to the academic and scientific community. These goals need not be mutually exclusive: the world of mathematics journals, like the world of mathematics itself, is large, and open access journals can coexist with traditional journals, as well as with other, more novel means of dissemination and evaluation.

What all the signatories do agree on is that Elsevier is an exemplar of everything that is wrong with the current system of commercial publication of mathematics journals, and we will no longer acquiesce to Elsevier’s harvesting of the value of our and our colleagues’ work.

What future do we envisage for all the papers that would otherwise be published in Elsevier journals? There are many other journals being published; perhaps they can pick up at least some of the slack. Many successful new journals have been founded in recent years, too, including several that are electronic (thus completely eliminating printing and physical distribution costs), and no doubt more will follow. Finally, we hope that the mathematical community will be able to reclaim for itself some of the value that it has given to Elsevier’s journals by moving some of these journals (in name, if possible, and otherwise in spirit) from Elsevier to other publishers. One notable example is the August 10, 2006 resignation of the entire editorial board of the Elsevier journal Topology and their founding of the Journal of Topology, owned by the London Mathematical Society.

None of these changes will be easy; editing a journal is hard work, and founding a new journal, or moving and relaunching an existing journal, is even harder. But the alternative is to continue with the status quo, in which Elsevier harvests ever larger profits from the work of us and our colleagues, and this is both unsustainable and unacceptable.

Signed by:

Scott Aaronson
Massachusetts Institute of Technology

Douglas N. Arnold
University of Minnesota

Artur Avila
IMPA and Institut de Mathématiques de Jussieu

John Baez
University of California, Riverside

Folkmar Bornemann
Technische Universität München

Danny Calegari
Caltech/Cambridge University

Henry Cohn
Microsoft Research New England

Jordan Ellenberg
University of Wisconsin, Madison

Matthew Emerton
University of Chicago

Marie Farge
École Normale Supérieure Paris

David Gabai
Princeton University

Timothy Gowers
Cambridge University

Ben Green
Cambridge University

Martin Grötschel
Technische Universität Berlin

Michael Harris
Université Paris-Diderot Paris 7

Frédéric Hélein
Institut de Mathématiques de Jussieu

Rob Kirby
University of California, Berkeley

Vincent Lafforgue
CNRS and Université d’Orléans

Gregory F. Lawler
University of Chicago

Randall J. LeVeque
University of Washington

László Lovász
Eötvös Loránd University

Peter J. Olver
University of Minnesota

Olof Sisask
Queen Mary, University of London

Terence Tao
University of California, Los Angeles

Richard Taylor
Institute for Advanced Study

Bernard Teissier
Institut de Mathématiques de Jussieu

Burt Totaro
Cambridge University

Lloyd N. Trefethen
Oxford University

Takashi Tsuboi
University of Tokyo

Marie-France Vigneras
Institut de Mathématiques de Jussieu

Wendelin Werner
Université Paris-Sud

Amie Wilkinson
University of Chicago

Günter M. Ziegler
Freie Universität Berlin

Appendix: recommendations for mathematicians.

All mathematicians must decide for themselves whether, or to what extent, they wish to participate in the boycott. Senior mathematicians who have signed the boycott bear some responsibility towards junior colleagues who are forgoing the option of publishing in Elsevier journals, and should do their best to help minimize any negative career consequences.

Whether or not you decide to join the boycott, there are some simple actions that everyone can take, which seem to us to be uncontroversial:

1) Make sure that the final versions of all your papers, particularly new ones, are freely available online, ideally both on the arXiv and on your home page. (Elsevier’s electronic preprint policy is unacceptable, because it explicitly does not allow authors to update their papers on the arXiv to incorporate changes made during peer review.) When signing copyright transfer forms, we recommend amending them (if necessary) to reserve the right to make the author’s final version of the text available free online from servers such as the arXiv, and on your home page.

2) If you are submitting a paper and there is a choice between an expensive journal and a cheap (or free) journal of the same standard, then always submit to the cheap one.

Note

The PDF version of this statement has many useful references not included here.


Geraint F. LewisMapping Growth and Gravity with Robust Redshift Space Distortions

A quick post this evening, as I have been at a workshop for the SAMI instrument, and am off to Santa Barbara for the First Galaxies and Faint Dwarfs conference next week, but a couple of things to post. The first is a paper by my ex-PhD student, Juliana Kwan, who is now a postdoc in the US at the Argonne National Laboratory.

The paper is quite complex, and focuses on redshift space distortions. This can be difficult to understand, but here goes. We've mentioned a couple of times that matter in the Universe is arranged on a cosmic web, with clusters, clumps, filaments and voids. In fact, it looks something like this:
Our Milky Way galaxy is just a little dot in there. But the detail of the way the mass is distributed is a probe of our Universe, as its present structure carries the imprint of the forces that created it, including the make up of the Universe, the cosmic evolution, and even the nature of gravity itself.

What do we see when we look out into the Universe? Well, we can measure the position of a galaxy on the sky very accurately, but not its distance. But we can easily measure the redshift, or the amount by which features in the spectrum are shifted to longer wavelengths, and use our cosmology to turn this into a distance using the famous Hubble law.

However, there is a problem. The redshift we see is a mixture of two parts, one due to the cosmic expansion (the Hubble law bit) and one due to the 'peculiar velocity', or how much the galaxy is whizzing about. By comparing to the Microwave Background, we know that our Milky Way is moving with a speed of about 500 km/s.

As we measure redshifts, not distances, these peculiar velocities distort the distances we calculate via the Hubble law. So, this happens:
The blue on the right shows the actual positions of galaxies in the cosmic web (in a simulation of the Universe). The green on the left shows the effects of peculiar velocity, with things stretched and squished relative to their real-space positions.
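
To get a feel for the size of this effect, here is a minimal sketch in Python. It assumes the low-redshift approximation v = cz and a Hubble constant of 70 km/s/Mpc; neither of these is taken from the paper, and the function names are purely illustrative. It places a galaxy at a true distance of 100 Mpc, gives it a peculiar velocity of 500 km/s along the line of sight, and asks what distance the Hubble law would assign to the redshift we would observe.

```python
# Rough illustration of how a peculiar velocity shifts a Hubble-law distance.
# Assumptions (not from the post): H0 = 70 km/s/Mpc and the low-redshift
# approximation v = c * z. All names here are hypothetical.

C_KM_S = 299_792.458  # speed of light in km/s
H0 = 70.0             # assumed Hubble constant in km/s/Mpc


def hubble_distance_mpc(redshift: float) -> float:
    """Distance inferred from a redshift via the Hubble law, d = c z / H0."""
    return C_KM_S * redshift / H0


def observed_redshift(true_distance_mpc: float, v_pec_km_s: float) -> float:
    """Low-redshift observed redshift: expansion plus line-of-sight peculiar velocity."""
    return (H0 * true_distance_mpc + v_pec_km_s) / C_KM_S


def distance_error_mpc(v_pec_km_s: float) -> float:
    """Shift in the inferred distance caused by a line-of-sight peculiar velocity."""
    return v_pec_km_s / H0


true_d = 100.0  # true distance in Mpc
z_obs = observed_redshift(true_d, 500.0)
inferred = hubble_distance_mpc(z_obs)
print(f"true distance {true_d:.1f} Mpc, inferred distance {inferred:.1f} Mpc")
```

Under these assumptions a 500 km/s peculiar velocity moves the galaxy by about 7 Mpc along the line of sight in redshift space, while its position on the sky is unchanged. That is why the distortion only ever acts in the radial direction.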

In fact, clusters of galaxies, where velocities are typically several thousands of km/s, get stretched out into what are known as Fingers of God - although what they have to do with the Higgs boson, I don't know (and no, that's not a serious statement). Here's a real set of observations:
Now, the details of these Redshift Space Distortions allow us to learn even more about the Universe, but they are very hard to untangle. What Juliana's paper does is to look at the possible ways that can be used to extract science, and shows what needs to be done if you want to get "robust" measures. I'll write more on what robust means later, but for now, I'll finish by saying "Well done Juliana!"

Mapping Growth and Gravity with Robust Redshift Space Distortions

Juliana Kwan, Geraint F. Lewis, Eric V. Linder
Redshift space distortions caused by galaxy peculiar velocities provide a window onto the growth rate of large scale structure and a method for testing general relativity. We investigate through a comparison of N-body simulations to various extensions of perturbation theory beyond the linear regime, the robustness of cosmological parameter extraction, including the gravitational growth index, \gamma. We find that the Kaiser formula and some perturbation theory approaches bias the growth rate by 1-sigma or more relative to the fiducial at scales as large as k > 0.07 h/Mpc. This bias propagates to estimates of the gravitational growth index as well as \Omega_m and the equation of state parameter and presents a significant challenge to modelling redshift space distortions. We also determine an accurate fitting function for a combination of line of sight damping and higher order angular dependence that allows robust modelling of the redshift space power spectrum to substantially higher k.

John BaezThe Federal Research Public Access Act

As of this minute, 5030 scholars have joined the Elsevier boycott. You should too! But now is the time to go further and take positive steps to develop new, better systems for refereeing and distributing scholarly papers.

Everyone I know is talking about this now. Today, quantum physicist Steve Flammia pointed out to me that U.S. Representative Mike Doyle has a good idea:

The Federal Research Public Access Act.

It’s simple: we should get to see the research we paid for with our tax dollars. We shouldn’t have to pay for it twice: once to have it done, and once more to see the results.

As Doyle puts it:

Americans have the right to see the results of research funded with taxpayer dollars. Yet such research too often gets locked away behind a pay-wall, forcing those who want to learn from it to pay expensive subscription fees for access.

The Federal Research Public Access Act will encourage broader collaboration among scholars in the scientific community by permitting widespread dissemination of research findings. Promoting greater collaboration will inevitably lead to more innovative research outcomes and more effective solutions in the fields of biomedicine, energy, education, and health care.

But what does the bill actually do? It says this: any federal agency that spends more than $100 million per year funding research must make that research freely available in a public repository no later than 6 months after the research has been published in a peer-reviewed journal.

This is already done by the National Institutes of Health: the bill would expand this practice to the National Science Foundation, the Department of Energy, and other agencies.

What we should do

Someone with technical brains should make it easy for US citizens to contact Congress and support this bill. Google got 4.5 million people to sign their petition against SOPA, the so-called Stop Online Piracy Act. But we’ve been playing defense for too long. Let’s go on the offense and do something like this for a bill that’s good!

Emailing your congressperson is incredibly easy, but telephone calls are even better, precisely because they’re a bit more work.

Here’s a sample of what you could write or say:

I am your constituent, and I urge you to support the Federal Research Public Access Act. As a taxpayer, I help support scientific research out of my own pocket. I deserve to see the results! The National Institutes of Health already demands this for all the research it supports, and the system works well. Broadening this policy will advance science and improve the lives and welfare of all Americans.

I believe an emphasis on ‘taxpayers getting their money’s worth’ and ‘improving the lives of all Americans’ may resonate well with the U.S. Congress: that’s why I’ve worded the message this way. Taxes and patriotism are hot-button issues. But of course you should feel free to modify this text!

Why it’s important

I think this bill is important: even if it doesn’t pass, it changes the debate and puts the publishers on the defensive.

Remember: the Association of American Publishers is still supporting the Research Works Act, a bill that would prevent federal agencies from requiring that the research they fund be made freely available online. It seems this bill would even roll back the existing requirement that research funded by the National Institutes of Health be made freely available at PubMed Central!

There’s a built-in imbalance at work here. Publishers pay lobbyists to work full-time on advancing their agenda. Scientists and other scholars prefer to spend their time thinking about more interesting things. So, we’re usually reactive: we wait until something becomes intolerable before taking action. That’s why we’re fighting against a crisis of journal prices that bankrupt our libraries, and battling bad bills like the Research Works Act, when we should be developing better systems for communicating the results of our research, and supporting good bills…

… like the Federal Research Public Access Act!

For more

For more, see:

• David Dobbs, Open science revolt occupies Congress, Wired, 9 February 2012.

• Call to action: Tell Congress you support the Bipartisan Federal Research Public Access Act (FRPAA), Alliance for Taxpayer Access, 9 February 2012.

• Scholarly Publishing & Academic Resources Council, SPARC FAQ for university administrators and faculty: Federal Research Public Access Act (FRPAA).

The original sponsors of the Federal Research Public Access Act were Reps. Kevin Yoder (R-KS) and Wm. Lacy Clay (D-MO). Identical legislation is also being introduced in the U.S. Senate by Sens. John Cornyn (R-TX), Ron Wyden (D-OR), and Kay Bailey Hutchison (R-TX).

 


Jordan Ellenberg Blackboard panorama

Thanks to Emmanuel Kowalski for this action shot.  As you can no doubt tell from my happy expression in the photo, I am about to say something about mapping class groups.

 


Matt Strassler This Week’s Step Forward in the Search for the Higgs Particle

I’ve been busy with some pressing work in service of the triggering strategy at the Large Hadron Collider [LHC] experiments for the last few days… (and if you understand what the trigger does, you know that stuff having to do with triggering pretty much takes priority over almost anything else, including sleep.) So my apologies that I’ve been a little slow to sum up this week’s updated results on the search for the Higgs particle.   Today I hope to make amends.

In Tuesday’s post I reported that the ATLAS and CMS experiments at the LHC had updated their preliminary results on the Higgs search presented on December 13th, through the release of documents intended for publication [so-called “preprints,” intended for submission to a journal for peer review]. In updates to that post, I highlighted two issues which I found particularly interesting in comparing the updated information to the presentations in December. The first of these represents additional evidence from CMS, which strengthens their case for a signal of a Higgs-like particle with a mass around 124 GeV/c². The second of these involves the lack of any improvement in the concordance between ATLAS and CMS, which one might have hoped for, and in whose absence the results still remain almost as inconclusive as they were back in December. Today I want to explain these in a bit more detail.

The New Result From CMS

Fig. 1: The mechanisms by which a Standard Model Higgs can be produced, from largest to smallest. The largest (p p --> H, also known as g g --> H, where p stands for proton and g for a gluon inside the proton) produces a Higgs and (naively -- see the text!) nothing else except remnants of the protons. The next largest ( p p --> q q H, or q q --> q q H, where q stands for quark) produces a Higgs plus two quarks, each of which appears in the detector as a jet. The disturbances in the W and Z fields are often called "virtual particles", though they are not really particles at all.

CMS, like ATLAS, has been searching for Standard Model (or Standard-Model-like) Higgs particles decaying to two photons, independent of how they are produced. [Recall that the “Standard Model Higgs particle” is the simplest form of Higgs particle that might be present in nature.] But there are several ways to produce Standard Model Higgs particles (see Figure 1), the two largest of which are (1) g g –> H, two gluons (one from each proton ‘p’, so this is also called p p –> H) colliding to make a Higgs particle, and (2) q q –> q q H, two quarks scattering off each other and producing W and Z “virtual particles” (disturbances in the W and Z fields which aren’t particles at all), which meet in the middle and fuse to create a Higgs particle. The rate at which q q –> q q H occurs is about 10-15 times smaller than the rate for g g –> H. Now if there is a Standard Model Higgs particle at around 125 GeV/c², the number of these Higgs particles already produced in ATLAS and CMS (each) and decaying to two photons is about 150 or so via g g –> H, and about 10 or so produced in q q –> q q H. Not all of these would be detected, so the numbers observed would be somewhat smaller.

Let me start by making a simple and naive (and wrong) comment about these two production processes. The conclusion we’ll draw from the naive viewpoint will turn out to be correct, but the details will change, in an important way, when I correct these initial statements toward the end of this section.

From Figure 1, you can see that the thing which makes q q –> q q H different from the g g –> H process is that there are two outgoing quarks, which turn into two outgoing jets (sprays of hadrons). In other words, what the experiments will observe is collisions with two photons (from the decaying Higgs) along with two jets. So (naively) they should be able to look for the q q –> q q H process and measure it separately from the g g –> H process by separating the events with two photons into those that have two jets and those that don’t.

Fig. 2: CMS updated results, showing the number of events with two photons that have a particular invariant mass. Left: all collisions with two photons. Right: all collisions with two photons and two jets (subject to additional criteria; see text.) In red is the shape of the background; black dots are data, and the blue curves show the size and shape of an expected signal that is *twice* as big as expected for a Standard Model Higgs particle of mass 120 GeV.

Now why would that buy you anything? Because the backgrounds (processes that mimic a Higgs signal but have nothing to do with a Higgs at all) to these two classes of events are very different. In particular, although the signal for q q –> q q H is about 10 or so times smaller than that of g g –> H, the background is a lot smaller, by a factor of 100. Roughly speaking, in the g g –> H case one is looking for tens of events on a background of many hundreds, while in the q q –> q q H case one is looking for a few events on a background of a few.
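
To make that trade-off concrete, here is a rough sketch in Python. The event counts are illustrative stand-ins for "tens on many hundreds" and "a few on a few", not numbers taken from CMS, and the figure of merit s/√b (signal divided by the typical statistical fluctuation of the background) is only a crude approximation that becomes unreliable at counts this small.

```python
import math

def naive_significance(signal, background):
    """Crude figure of merit s / sqrt(b): the signal size measured in units
    of the typical statistical fluctuation of the background."""
    return signal / math.sqrt(background)

# Illustrative counts only (assumptions, not CMS numbers).
# The two-jet class has 10x less signal and 100x less background.
inclusive = naive_significance(signal=30, background=600)
two_jet = naive_significance(signal=3, background=6)

print(f"all two-photon events:  s/sqrt(b) = {inclusive:.2f}")
print(f"two photons + two jets: s/sqrt(b) = {two_jet:.2f}")
```

Under these assumptions the two classes come out equally sensitive: cutting the signal by 10 while cutting the background by 100 leaves s/√b unchanged, so the small two-jet sample is worth about as much as the large inclusive one.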

This is clear from the CMS results, which are shown in Figure 2. On the left is the invariant mass of the two photons in all observed collisions with two photons. That includes any generated either by g g –> H or q q –> q q H. However, the majority of the events in any bin are due to background (collisions that generate photon pairs for reasons that have nothing to do with the Higgs particle). You can see that the size of any expected excess from a Standard Model Higgs (half the size of the blue bump at the bottom of the plot) is very small compared to the sheer number of events above the bump. That’s why you have to look really closely to see the small excess that CMS claims to observe, running from 121 to 126 GeV and centered around 123-124 GeV. The background is hundreds of events per bin, and the expected signal is just a few dozen per bin.

On the right of Figure 2 is the same plot for those collisions that (roughly — see below) have two jets and two photons. You see that the background is about 2 events per bin, and the expected signal is one or two events per bin. The fact that the number of background events per bin is really small means that statistical fluctuations, bin to bin, are large; that’s why the data is all over the place on the right-hand plot. But you can see by eye that there are some extra events in the 123-124 GeV bin. Now you have to be a little careful; we don’t expect all of the signal events to be in one bin. They should be spread out over three or four, because the experimental measurement, though precise, still has its limitations. So you should look at the three or four bins around 124, and ask if you see extra events. And you do — more than you would expect from a Standard Model Higgs particle, in fact.

Now the statistical significance of the excess isn’t very large. But if the small excess seen on the left-hand plot of Figure 2 were a pure statistical fluctuation of the background, there would be no particular reason for the small fraction of these collisions that appear in the right-hand plot of Figure 2 to show an excess in the same place. On the other hand, that is exactly what you would expect if there were a Higgs particle of Standard Model type at this mass. So the observed excess in the right-hand plot makes the CMS case for a Higgs particle — which was a circumstantial case involving several different measurements with very little evidence on their own — somewhat stronger. Although the statistical significance of the combination of all their measurements doesn’t change much, the statistical significance of the part of the signal that I feel most comfortable trusting — the part that comes from the two-photon search and the four-lepton search — has certainly increased.

Fig. 3: Proton-proton collisions that make a Higgs via the collision of two gluons can (with lower probability) also produce one or more gluons along the way. These gluons turn into jets. The last process cannot be distinguished from the p p --> q q H process shown in Figure 1 (also known as q q --> q q H). Thus the right hand plot in Figure 2, if indeed it contains a Higgs signal, contains a poorly known admixture of q q --> q q H and g g --> g g H .

Ok: let me correct the naive aspects of what I’ve told you. Simply requiring two jets is not nearly enough to separate g g –> H from q q –> q q H, because there is a certain probability that a gluon or two will be produced in g g –> H.  In other words, there is a chance that rather than merely being g g –> H, the collision of two gluons will lead to g g –> g H or g g –> g g H.  (See Figure 3.)  Gluons, quarks and antiquarks all make jets, and gluon jets look a lot like quark jets and cannot typically be distinguished. So g g –> g g H  looks a lot like q q –> q q H. Now there are tricks to try to separate these processes to a degree (involving a requirement on where the jets and photons are actually heading relative to one another) but (a) these tricks only allow partial separation of q q –> q q H and g g –> g g H , and worse (b) the ability of theorists to calculate how well the tricks work in separating the two is very limited. The CMS experimenters assign a 70% systematic uncertainty to how much g g –> g g H remains after they play their tricks, but I can easily imagine this is a significant underestimate.  I would like to know more about how they came up with this number; if someone told me the systematic uncertainty was twice this big I would not flinch, given how intricate their tricks for selecting events are, and how little I would trust existing theoretical calculations in that context.  We need to hear from the world’s experts on these calculations, and see if they agree with each other that this 70% number is large enough.

If my concerns were valid, what that would mean is that once you play these intricate tricks (as was done for the plot on the right side of Figure 2) you would be left with a very uncertain theoretical prediction for exactly how many events a Standard Model Higgs particle would give you. The error bar on the theoretical prediction would probably be a factor of 2, possibly worse.  And if I were right, CMS might then be underestimating the Standard Model prediction, leading them to over-interpret their result in Figure 2 as additional evidence for an excess above Standard Model expectations.

Indeed there has been some excitement among some of my theoretical colleagues about this suggestion that the number of events observed by CMS is larger than would be expected from a Standard Model Higgs particle. This is because the number of events at ATLAS also seems to be larger than expected. But aside from the fact that the number of events in the excess seen in the right-hand plot is very low (and therefore subject to large fluctuations: if you expect 2 or 3 events, the statistical probability of getting 8 is not as small as you would think), the expectation is itself quite uncertain (and thus perhaps the 2 or 3 events expected should actually be 5 or 6.) And there’s another reason — see below — to be suspicious of any excess above the Standard Model at both ATLAS and CMS. So I would be very, very cautious about reading anything into the larger number of events.  Of course it is intriguing and fun to think about, but it is far too early to get excited.
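
The parenthetical about fluctuations can be checked with a few lines of Python. This is a minimal sketch that assumes the event count is Poisson-distributed around the expected value; it ignores systematic uncertainties and the fact that several bins were inspected, so treat the numbers as rough guidance only.

```python
import math

def poisson_tail(k, mean):
    """P(N >= k) for a Poisson-distributed count N with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Chance of seeing 8 or more events for several expected counts.
for mean in (2, 3, 5, 6):
    print(f"expected {mean}: P(N >= 8) = {poisson_tail(8, mean):.4f}")
```

With an expectation of 3 events the chance of seeing 8 or more is roughly 1%, but if the true expectation is 6 it rises to about one in four, which is why the uncertainty on the prediction matters so much here.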

In particular, to my excited colleagues: let’s recall that over the years the rate for the process g g –> H changed by a factor of 3 as theorists did more accurate calculations. What do we currently know about the process g g –> g g H, especially in this very limited kinematic region, given that it has been calculated only to leading order in perturbation theory?  Even if there *is* an excess above the expectations for a Standard Model Higgs in CMS’s data, we could perhaps just as well interpret it as an excess in g g –> H only, along with an underestimate of the contribution of g g –> g g H to this kinematic region, and with NO excess in q q –> q q H at all.

Why don’t CMS and ATLAS two-photon results line up better?

Now, let’s talk about the second issue: the fact that there was no improvement in the discrepancy in the preferred mass for the ATLAS and CMS excesses (if interpreted as due to a new particle). The inconsistency isn’t so large as to make the measurements clearly contradictory, but neither is it small enough that one can ignore it.

Fig. 4: The two-photon data from 115 to 135 GeV, with a vertical line drawn in at 125 GeV to help the eye. Data are dots (statistical uncertainties are shown as vertical bars) and background is a red line. Left: CMS, all events (as in left of Figure 2); note the excess which runs from 121 to 126. Center: CMS, selected events with two jets (as in right of Figure 2); note the overall excess in the same region. Right: ATLAS; note the excess from 124 to 127.

First, let me bring your attention to the discrepancy; in Figure 4 I’ve pulled out the region from 115 to 135 GeV for the left and right plots in Figure 2, and put them next to an excerpt from ATLAS’s most recent plot covering the same region.  Note the line that I have drawn on all three plots dividing events above 125 GeV from events below 125 GeV.  What you notice is that ATLAS’s largest excess (126-127) is at a point where CMS has a deficit, and that ATLAS has a deficit across some of CMS’s excess.  I do not want to overstate the importance of this observation; it may easily be a consequence of small statistics. But neither should it be understated.  If anyone tells you that the case for the Higgs is firm, ask them about this discrepancy, which (when the case actually does become firm) had better go away.

There are three possible reasons for this discrepancy.

  1. There is no Higgs signal at all; ATLAS is seeing some fluctuations in its data, and separately, CMS is seeing some fluctuations in its data. By chance they happen to be close together, but the lack of consistency is reflective of the fact that they are actually independent effects that have nothing to do with each other.
  2. There is a signal, but the shape and location of the signal are distorted because either ATLAS’s or CMS’s signal peak is sitting on top of a large background fluctuation. For example, this could easily explain why ATLAS has a peak that is larger and narrower than expected from a pure Standard Model signal, an example of which (the dotted red line centered around 120 GeV) is shown on the ATLAS plot at right in Figure 4.  Were this true, by the way, then the excess at ATLAS would be a signal plus a background fluctuation, which combine to give an excess that seems too large for a Standard Model Higgs, but which will return toward the expected size as more data is gathered and the fluctuation in the background is smoothed away.
  3. There is a signal, and either ATLAS or CMS has its photon energy measurement wrong by 1-2%. Calibrating photon energies is not easy, and the mass measurement needs to be accurate to about 1%. I certainly thought it was possible that when the preliminary results of December turned into preprints we’d see a small shift in the mass measurement, up or down, at either CMS or ATLAS, one that might move their two-photon measurements together (or apart) by as much as 1 GeV/c² or even more. It appears we did not. The preprints show essentially no difference in the two-photon distributions compared to the Dec. 13th presentations. And this suggests that the experiments are reasonably confident in their energy measurements. That said, it appears that when the two-photon measurements are combined with the four-lepton measurements, the result for ATLAS and the result for CMS, whose preferred values were separated by about 2 GeV/c², have now moved slightly further apart, by less than half a GeV/c². [I think this is due to a shift in the four-lepton measurement of ATLAS, but I could be wrong; I haven't been able to fully track this down yet.]
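
For a sense of scale in the third possibility, the following Python sketch shows how a photon energy miscalibration feeds into the reconstructed mass. It uses the standard invariant-mass formula for two massless photons; the specific kinematics (two 62.5 GeV photons emitted back to back) are a hypothetical choice made only so that the nominal mass comes out at 125 GeV.

```python
import math

def diphoton_mass(e1, e2, opening_angle):
    """Invariant mass of two massless photons:
    m^2 = 2 * E1 * E2 * (1 - cos(opening_angle))."""
    return math.sqrt(2.0 * e1 * e2 * (1.0 - math.cos(opening_angle)))

# Hypothetical kinematics: two 62.5 GeV photons, back to back.
nominal = diphoton_mass(62.5, 62.5, math.pi)

for miscalibration in (0.01, 0.02):
    scale = 1.0 + miscalibration
    shifted = diphoton_mass(62.5 * scale, 62.5 * scale, math.pi)
    print(f"{miscalibration:.0%} energy error -> mass shift {shifted - nominal:+.2f} GeV")
```

Because the mass scales linearly with a common energy scale factor, a 1-2% calibration error moves a 125 GeV peak by roughly 1.25 to 2.5 GeV, which is comparable to the separation between the two experiments' preferred values.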

I don’t know how to evaluate which of these possibilities is most likely, and even if I did, the truth doesn’t care what I (or you) think. We’ll just have to wait and see.

Summary of the Update

To sum up: the finalizing of the preliminary results from Dec. 13th has had the following effects (so far):

  • The CMS excess at 123-124 GeV/c² in events with two photons and two jets means a moderate but notable improvement in the credibility of their case, which previously relied on combining a small excess in two photons with a very small excess in four other measurements. The case is more robust now. Meanwhile the preferred value of the mass has drifted just a tiny bit higher.
  • The ATLAS result is essentially unchanged, though the preferred value of the mass has drifted a bit higher.
  • The discrepancy between the preferred masses for the ATLAS and CMS measurements has slightly increased, not enough to cause a reevaluation of the situation, but eliminating any possibility that they might in the near term shift in such a way to become more consistent. Consistency (or clear inconsistency) will have to await a lot more data.
  • Both experiments appear to see excesses that are larger than would be expected for a Standard Model Higgs particle of this mass, by perhaps as much as a factor of 2, but given the large statistical uncertainties and (for the most recent CMS result) large systematic uncertainties, one should not be surprised if all such indications disappear over time.

The excesses in the measurements remain, within the uncertainties, consistent with a Standard Model Higgs, or something rather similar, of around 125 GeV/c². Due to the new CMS result, I am somewhat more optimistic that this is a real signal than I was in December, but we still have a long way to go before this all begins to settle down.


Filed under: Higgs, LHC News, Particle Physics

n-Category Café The Cost of Knowledge

As of this moment, 4760 scholars have joined a boycott of the publishing company Elsevier. Of these, only 20% are mathematicians. But since the boycott was started by a mathematician, 34 of us wrote and signed a statement explaining the boycott. Here it is.

THE COST OF KNOWLEDGE

This is an attempt to describe some of the background to the current boycott of Elsevier by many mathematicians (and other academics) at http://thecostofknowledge.com, and to present some of the issues that confront the boycott movement. Although the movement is anything but monolithic, we believe that the points we make here will resonate with many of the signatories to the boycott.

The role of journals (1): dissemination of research.

The role of journals in professional mathematics has been under discussion for some time now.

Traditionally, while journals served several purposes, their primary purpose was the dissemination of research papers. The journal publishers were charging for the cost of typesetting (not a trivial matter in general before the advent of electronic typesetting, and particularly non-trivial for mathematics), the cost of physically publishing copies of the journals, and the cost of distributing the journals to subscribers (primarily academic libraries).

The editorial board of a journal is a group of professional mathematicians. Their editorial work is undertaken as part of their scholarly duties, and so is paid for by their employer, typically a university. Thus, from the publisher’s viewpoint the editors are volunteers. (The editor in chief of a journal sometimes receives modest compensation from the publisher.) When a paper is submitted to the journal, by an author who is again typically a university-employed mathematician, the editors select the referee or referees for the paper, evaluate the referees’ reports, decide whether or not to accept the submission, and organize the submitted papers into volumes. These are passed on to the publisher, who then undertakes the job of actually publishing them. The publisher supplies some administrative assistance in handling the papers, as well as some copy-editing assistance, which is often quite minor but sometimes more substantial. The referees are again volunteers from the point of view of the publisher: as with editing, refereeing is regarded as part of the service component of a mathematician’s academic work. Authors are not paid by the publishers for their published papers, although they are usually asked to sign over the copyright to the publisher.

This system made sense when the publishing and dissemination of papers was a difficult and expensive undertaking. Publishers supplied a valuable service in this regard, for which they were paid by subscribers to the journals, which were mainly academic libraries. The academic institutions whose libraries subscribe to mathematics journals are broadly speaking the same institutions that employ the mathematicians who are writing for, refereeing for, and editing the journals. Therefore, the cost of the whole process of producing research papers is borne by these institutions (and the outside entities that partially fund them, such as the National Science Foundation in the United States): they pay for their academic mathematician employees to do research and to organize the publications of the results of their research in journals; and then (through their libraries) they pay the publishers to disseminate these results among all the world’s mathematicians. Since these institutions employ research faculty in order to foster research, it certainly used to make sense for them to pay for the dissemination of this research as well. After all, the sharing of scientific ideas and research results is unquestionably a key component for making progress in science.

Now, however, the world has changed in significant ways. Authors typeset their own papers, using electronic typesetting. Publishing and distribution costs are not as great as they once were. And most importantly, dissemination of scientific ideas no longer takes place via the physical distribution of journal volumes. Rather, it takes place mainly electronically. While this means of dissemination is not free, it is much less expensive, and much of it happens quite independently of mathematical journals.

In conclusion, the cost of journal publishing has gone down because the cost of typesetting has been shifted from publishers to authors and the cost of publishing and distribution is significantly lower than it used to be. By contrast, the amount of money being spent by university libraries on journals seems to be growing with no end in sight. Why do mathematicians contribute all this volunteer labor, and their employers pay all this money, for a service whose value no longer justifies its cost?

The role of journals (2): peer review and professional evaluation

There are some important reasons that mathematicians haven’t just abandoned journal publishing. In particular, peer review plays an essential role in ensuring the correctness and readability of mathematical papers, and publishing papers in research journals is the main way of achieving professional recognition. Furthermore, not all journals count equally from this point of view: journals are (loosely) ranked, so that publications in top journals will often count more than publications in lower ranked ones. Professional mathematicians typically have a good sense of the relative prestige of the journals that publish papers in their area, and they will usually submit a paper to the highest ranked journal that they judge is likely to accept and publish it.

Because of this evaluative aspect of traditional journal publishing, the problem of switching to a different model is much more difficult than it might appear at first. For example, it is not easy just to begin a new journal (even an electronic one, which avoids the difficulties of printing and distribution), since mathematicians may not want to publish in it, preferring to submit to journals with known reputations. Secondly, although the reputation of various journals has been created through the efforts of the authors, referees, and editors who have worked (at no cost to the publishers) on it over the years, in many cases the name of the journal is owned by the publisher, making it difficult for the mathematical community to separate this valuable object that they have constructed from its present publisher.

The role of Elsevier

Elsevier, Springer, and a number of other commercial publishers (many of them large companies but less significant for their mathematics publishing, e.g., Wiley) all exploit our volunteer labor to extract very large profits from the academic community. They supply some value in the process, but nothing like enough to justify their prices.

Among these publishers, Elsevier may not be the most expensive, but in the light of other factors, such as scandals, lawsuits, lobbying, etc. (discussed further below), we consider them a good initial focus for our discontent. A boycott should be substantial enough to be meaningful, but not so broad that the choice of targets becomes controversial or the boycott becomes an unmanageable burden. Refusing to submit papers to all overpriced publishers is a reasonable further step, which some of us have taken, but the focus of this boycott is on Elsevier because of the widespread feeling among mathematicians that they are the worst offender.

Let us begin with the issue of journal costs. Unfortunately, it is difficult to make cost comparisons: journals differ greatly in quality, in number of pages per volume, and even in amount of text per page. As measured by list prices, Elsevier mathematics journals are amongst the most expensive. For instance, in the AMS mathematics journal price survey, seven of the ten most expensive journals (by 2007 volume list price) were published by Elsevier. (All prices are as of 2007 because both prices and page counts are easily available online.) However, that is primarily because Elsevier publishes the largest volumes. Price per page is a more meaningful measure that can be easily computed. By this standard, Elsevier is certainly not the worst publisher, but its prices do on the face of it look very high. The Annals of Mathematics, published by Princeton University Press, is one of the absolute top mathematics journals and quite affordably priced: $0.13/page as of 2007. By contrast, ten Elsevier journals (not including one that has since ceased publication) cost $1.30/page or more; they and three others cost more per page than any journal published by a university press or learned society. For comparison, three other top journals competing with the Annals are Acta Mathematica, published by the Institut Mittag Leffler for $0.65/page, Journal of the American Mathematical Society, published by the American Mathematical Society for $0.24/page, and Inventiones Mathematicae, published by Springer for $1.21/page. Note that none of Elsevier’s mathematics journals is generally considered comparable in quality to these journals.
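
As a quick illustration of the price-per-page comparison, the following Python sketch tabulates the 2007 figures quoted in the paragraph above relative to the Annals. The dictionary labels are shorthand introduced here, and the Elsevier entry is the $1.30/page lower bound stated for its ten most expensive journals rather than the price of any single title.

```python
# 2007 price per page in US dollars, as quoted in the paragraph above.
price_per_page = {
    "Annals of Mathematics (Princeton University Press)": 0.13,
    "Journal of the AMS (American Mathematical Society)": 0.24,
    "Acta Mathematica (Institut Mittag-Leffler)": 0.65,
    "Inventiones Mathematicae (Springer)": 1.21,
    "Elsevier, ten priciest journals (lower bound)": 1.30,
}

baseline = price_per_page["Annals of Mathematics (Princeton University Press)"]

# List the journals from cheapest to most expensive per page.
for journal, price in sorted(price_per_page.items(), key=lambda item: item[1]):
    print(f"{journal}: ${price:.2f}/page, {price / baseline:.1f}x the Annals")
```

On these figures the ten most expensive Elsevier journals cost at least ten times as much per page as the Annals.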

However, there is an additional aspect which makes it hard to compute the true cost of mathematics journals. This is the widespread practice among large commercial publishers of “bundling” journals, which allows libraries to subscribe to large numbers of journals in order to avoid paying the exorbitant list prices for the ones they need. Although this means that the average price libraries pay per journal is less than the list prices might suggest, what really matters is the average price that they pay per journal (or page of journal) that they actually want, which is hard to assess, but clearly higher. We would very much like to be able to offer more concrete data regarding the actual costs to libraries of Elsevier journals compared with those of Springer or other publishers. Unfortunately, this is difficult, because publishers often make it a contractual requirement that their institutional customers should not disclose the financial details of their contracts. For example, Elsevier sued Washington State University to try to prevent release of this information. One common consequence of these arrangements, though, is that in many cases a library cannot actually save any money by cancelling a few Elsevier journals: at best the money can sometimes be diverted to pay for other Elsevier subscriptions.

One reason for focusing on Elsevier rather than, say, Springer is that Springer has had a rich and productive history with the mathematical community. As well as journals, it has published important series of textbooks, monographs, and lecture notes; one could perhaps regard the prices of its journals as a means of subsidizing these other, less profitable, types of publications. Although all these types of publications have become less important with the advent of the internet and the resulting electronic distribution of texts, the long and continuing presence of Springer in the mathematical world has resulted in a store of goodwill being built up in the mathematical community towards them. This store is being rapidly depleted, but has not yet reached zero. See for instance the recent petition to Springer by a number of French mathematicians and departments.

Elsevier does not have a comparable tradition of involvement in mathematics publishing. Many of the mathematics journals that it publishes have been acquired comparatively recently as it has bought up other, smaller publishers. Furthermore, in recent years it has been involved in various scandals regarding the scientific content, or lack thereof, of its journals. One in particular involved the journal Chaos, Solitons & Fractals, which, at the time the scandal broke in 2008–2009, was one of the highest impact factor mathematics journals that Elsevier published. (Elsevier currently reports the five-year impact factor of this journal at 1.729. For sake of comparison, Advances in Mathematics, also published by Elsevier, is reported as having a five-year impact factor of 1.575.) It turned out that the high impact factor was at least partly the result of the journal publishing many papers full of mutual citations. (See Arnold for more information on this and other troubling examples that show the limitations of bibliometric measures of scholarly quality.) Furthermore, Chaos, Solitons & Fractals published many papers that, in our professional judgement, have little or no scientific merit and should not have been published in any reputable journal.

In another notorious episode, this time in medicine, for at least five years Elsevier “published a series of sponsored article compilation publications, on behalf of pharmaceutical clients, that were made to look like journals and lacked the proper disclosures”, as noted by the CEO of Elsevier’s Health Sciences Division.

Recently, Elsevier has lobbied for the Research Works Act, a proposed U.S. law that would undo the National Institutes of Health’s public access policy, which guarantees public access to published research papers based on NIH funding within twelve months of publication (to give publishers time to make a profit). Although most lobbying occurs behind closed doors, Elsevier’s vocal support of this act shows their opposition to a popular and effective open access policy.

These scandals, taken together with the bundling practices, exorbitant prices, and lobbying activities, suggest a publisher motivated purely by profit, with no genuine interest in or commitment to mathematical knowledge and the community of academic mathematicians that generates it. Of course, many Elsevier employees are reasonable people doing their best to contribute to scholarly publishing, and we bear them no ill will. However, the organization as a whole does not seem to have the interests of the mathematical community at heart.

The boycott

Not surprisingly, many mathematicians have in recent years lost patience with being involved in a system in which commercial publishers make profits based on the free labor of mathematicians and subscription fees from their institutions’ libraries, for a service that has become largely unnecessary. (See Scott Aaronson’s scathing but all-too-true satirical description of the publishers’ business model.) Among all the commercial publishers, the behavior of Elsevier seemed to many to be the most egregious, and a number of mathematicians had made personal commitments to avoid any involvement with Elsevier journals. (Some journals were also successfully moved from Elsevier to other publishers; e.g., Annales Scientifiques de l’École Normale Supérieure, which until recent years was published by Elsevier, is now published by the Société Mathématique de France.)

One of us (Timothy Gowers) decided that it might be useful to publicize his own personal boycott of Elsevier, thus encouraging others to do the same. This led to the current boycott movement at http://thecostofknowledge.com, the success of which has far exceeded his initial expectations.

Each participant in the boycott can choose which activities they intend to avoid: submitting to Elsevier journals, refereeing for them, and serving on editorial boards. Of course, submitting papers and editing journals are purely voluntary activities, but refereeing is a more subtle issue. The entire peer review system depends on the availability of suitable referees, and its success is one of the great traditions of science: refereeing is felt to be both a burden and an honor, and practically every member of the community willingly takes part in it. However, while we respect and value this tradition, many of us do not wish to see our labor used to support Elsevier’s business model.

What next?

As suggested at the very beginning, different participants in the boycott have different goals, both in the short and long term. Some people would like to see the journal system eliminated completely and replaced by something else more adapted to the internet and the possibilities of electronic distribution. Others see journals as continuing to play a role, but with commercial publishing being replaced by open access models. Still others imagine a more modest change, in which commercial publishers are replaced by non-profit entities such as professional societies (e.g., the American Mathematical Society, the London Mathematical Society, and the Société Mathématique de France, all of which already publish a number of journals) or university presses; in this way the value generated by the work of authors, referees, and editors would be returned to the academic and scientific community. These goals need not be mutually exclusive: the world of mathematics journals, like the world of mathematics itself, is large, and open access journals can coexist with traditional journals, as well as with other, more novel means of dissemination and evaluation.

What all the signatories do agree on is that Elsevier is an exemplar of everything that is wrong with the current system of commercial publication of mathematics journals, and we will no longer acquiesce to Elsevier’s harvesting of the value of our and our colleagues’ work.

What future do we envisage for all the papers that would otherwise be published in Elsevier journals? There are many other journals being published; perhaps they can pick up at least some of the slack. Many successful new journals have been founded in recent years, too, including several that are electronic (thus completely eliminating printing and physical distribution costs), and no doubt more will follow. Finally, we hope that the mathematical community will be able to reclaim for itself some of the value that it has given to Elsevier’s journals by moving some of these journals (in name, if possible, and otherwise in spirit) from Elsevier to other publishers. One notable example is the August 10, 2006 resignation of the entire editorial board of the Elsevier journal Topology and their founding of the Journal of Topology, owned by the London Mathematical Society.

None of these changes will be easy; editing a journal is hard work, and founding a new journal, or moving and relaunching an existing journal, is even harder. But the alternative is to continue with the status quo, in which Elsevier harvests ever larger profits from the work of us and our colleagues, and this is both unsustainable and unacceptable.

Signed by:

Scott Aaronson
Massachusetts Institute of Technology

Douglas N. Arnold
University of Minnesota

Artur Avila
IMPA and Institut de Mathématiques de Jussieu

John Baez
University of California, Riverside

Folkmar Bornemann
Technische Universität München

Danny Calegari
Caltech/Cambridge University

Henry Cohn
Microsoft Research New England

Jordan Ellenberg
University of Wisconsin, Madison

Matthew Emerton
University of Chicago

Marie Farge
École Normale Supérieure Paris

David Gabai
Princeton University

Timothy Gowers
Cambridge University

Ben Green
Cambridge University

Martin Grötschel
Technische Universität Berlin

Michael Harris
Université Paris-Diderot Paris 7

Frédéric Hélein
Institut de Mathématiques de Jussieu

Rob Kirby
University of California, Berkeley

Vincent Lafforgue
CNRS and Université d’Orléans

Gregory F. Lawler
University of Chicago

Randall J. LeVeque
University of Washington

László Lovász
Eötvös Loránd University

Peter J. Olver
University of Minnesota

Olof Sisask
Queen Mary, University of London

Terence Tao
University of California, Los Angeles

Richard Taylor
Institute for Advanced Study

Bernard Teissier
Institut de Mathématiques de Jussieu

Burt Totaro
Cambridge University

Lloyd N. Trefethen
Oxford University

Takashi Tsuboi
University of Tokyo

Marie-France Vigneras
Institut de Mathématiques de Jussieu

Wendelin Werner
Université Paris-Sud

Amie Wilkinson
University of Chicago

Günter M. Ziegler
Freie Universität Berlin

Appendix: recommendations for mathematicians.

All mathematicians must decide for themselves whether, or to what extent, they wish to participate in the boycott. Senior mathematicians who have signed the boycott bear some responsibility towards junior colleagues who are forgoing the option of publishing in Elsevier journals, and should do their best to help minimize any negative career consequences.

Whether or not you decide to join the boycott, there are some simple actions that everyone can take, which seem to us to be uncontroversial:

  1. Make sure that the final versions of all your papers, particularly new ones, are freely available online – ideally both on the arXiv and on your home page. (Elsevier’s electronic preprint policy is unacceptable, because it explicitly does not allow authors to update their papers on the arXiv to incorporate changes made during peer review.) When signing copyright transfer forms, we recommend amending them (if necessary) to reserve the right to make the author’s final version of the text available free online from servers such as the arXiv, and on your home page.
  2. If you are submitting a paper and there is a choice between an expensive journal and a cheap (or free) journal of the same standard, then always submit to the cheap one.

Note

The PDF version of this statement has many useful references not included here.

February 09, 2012

Dave BaconA Federal Mandate for Open Science

Witness the birth of the Federal Research Public Access Act:

“The Federal Research Public Access Act will encourage broader collaboration among scholars in the scientific community by permitting widespread dissemination of research findings.  Promoting greater collaboration will inevitably lead to more innovative research outcomes and more effective solutions in the fields of biomedicine, energy, education, quantum information theory and health care.”

[Correction: it didn't really mention quantum information theory---SF.]

You can read the full text of FRPAA here.

The bill states that any federal agency which budgets more than $100 million per year for funding external research must make that research available in a public online repository for free download no later than 6 months after the research has been published in a peer-reviewed journal.

This looks to me like a big step in the right direction for open science. Of course, it’s still just a bill, and needs to successfully navigate the Straits of the Republican-controlled House, through the Labyrinth of Committees and the Forest of Filibuster, and run the Gauntlet of Presidential Vetoes. How can you help it survive this harrowing journey? Write your senators and your congresscritter today, and tell them that you support FRPAA and open science!

Hat tip to Robin Blume-Kohout.

David Hoggexoplanets and speckles

Fergus did a set of demonstrations today for Oppenheimer, Brenner, and me of his planet-finding code for Oppenheimer's P1640 high dynamic-range imager. The imager blocks out most of the light of the star in an intermediate focal plane, but a combination of atmosphere and optical distortions plus physical optics means that huge amounts of light still hit the focal plane, in a very speckly pattern of blobs. Fergus showed us that he can (potentially) find planets among those speckles, even planets that are percent-level distortions of the speckle pattern! If this holds up it could have a huge impact on high dynamic-range imaging, now and in the near future. For the past week or two I have also been playing around with modeling electromagnetic fields in imperfect cameras to see if we can make a more physically motivated model (Fergus's model is data-driven rather than physics-driven).

David HoggHST target selection

Tsalmantza and I discussed how we might winnow down our list of potentially lensed quasars into a set of sensible targets for HST imaging. It is essential to look for marginal evidence of extension; that is, do the quasars depart from our expectation of point-source morphology? A more speculative path is to look at luminosity indicators: Are any of the quasars brighter than you would expect given line strengths and ratios, possibly indicating gravitational magnification?

Secret Blogging SeminarA forum on mathematical publishing

There’s been lots of great discussion on the future of mathematical publishing in recent weeks, largely inspired by the boycott of Elsevier (1) (2) (3). Mostly this has been happening on blogs, particularly Tim Gowers’s, but also here and a number of other places. There’s a nice index of this discussion in a wiki page on Michael Nielsen’s site, to the extent that it’s possible to index a discussion happening all over the internet!

I think a lot of people find it somewhat frustrating that this discussion is predominantly happening in blog comment threads, however. It’s hard to maintain conversations, and almost impossible to coordinate people with similar interests and concerns. Andrew Stacey and I thought that it might be helpful to set up a forum (like the nForum, associated to the nCafe, or meta.mathoverflow.net) to alleviate this.

Thus, please check out Math 2.0! We’ll see what sticks. :-)

Our hope is that this might provide a better home for more focused discussion, and a place for people who want to coordinate concrete next steps in reforming mathematical publishing. Come in and join us!


Clifford JohnsonIncomplete Subtractions

Well, it has been well over two months since I popped into the studio I sometimes visit to do a "drop in and draw" session. (I've spoken about the value of such practice here before.) Although I've been drawing a bit here and there on the bus and subway to keep practicing, and also doing some work on some pages of The Project (actually, some pretty detailed finish work on a few pages I'm quite happy with), I was not sure whether I'd have the right chops to do a good job at the session, and expected that if I went I'd have a frustrating -but of course valuable- evening of knocking off some rust and oiling the wheels again. So I went along yesterday. Strangely, it felt like it was going to be a good session as I approached, and as I settled down and began to try to capture the 2 minute poses, and then the 5 minute poses, I felt like I was flowing along pretty well. It helped that the model on duty is [...]

n-Category Café The Moduli 3-Stack of the C-Field

We are in the process of finalizing a little article

Domenico Fiorenza, Hisham Sati, U.S., The E8 moduli 3-stack of the C-field in M-theory

Abstract The higher gauge field in 11-dimensional supergravity – the C-field – is constrained by quantum effects to be a cocycle in some twisted version of ordinary differential cohomology. We argue that it should indeed be a cocycle in a certain twisted nonabelian differential cohomology. We give a simple and natural characterization of the full smooth moduli 3-stack of configurations of the C-field, the field of gravity and the (auxiliary) E8-Yang-Mills field. We show that the truncation of this moduli 3-stack to a bare 1-groupoid of field configurations reproduces the differential integral Wu structures that Hopkins-Singer had shown (HS02) to formalize Witten’s argument (Wi96) on the nature of the C-field. Finally we give a similarly simple and natural characterization of the moduli 2-stack of boundary C-field configurations and show that it is equivalent to the smooth moduli 2-stack of anomaly free heterotic supergravity field configurations (SSS12).

This may be read as a companion to the article that I mentioned last time, at Multiple M5-branes, String 2-connections, and 7d nonabelian Chern-Simons theory

A pdf of the article is behind the above link. Any comment you might have would be most welcome.

Tommaso DorigoTop Quark Production Studied In Detail

A new result by the CMS collaboration has been produced today on top quark physics. For those of you who only get triggered by the search of new particles or new forces, the study of "yesterday's signals", such as top quarks, is boring and uninformative; but high-energy physics is a rich field of research, and we extend our understanding of subnuclear physics no less by getting to know how exactly top quarks get produced in proton-proton collisions, than we do by placing limits on ephemeral particles (SUSY ones, e.g.).

So I salute the new measurement as an important advance. Using over one inverse femtobarn of data collected in 2011 (about a hundred trillion proton-proton collisions), CMS was able to study top quark pairs in great detail.

read more

Scott AaronsonThe battle against Elsevier gains momentum

Check out this statement on “The Cost of Knowledge” released today, which (besides your humble blogger) has been signed by Ingrid Daubechies (President of the International Mathematical Union), Timothy Gowers, Terence Tao, László Lovász, and 29 others.  The statement carefully explains the rationale for the current Elsevier boycott, and answers common questions like “why single out Elsevier?” and “what comes next?”

Also check out Timothy Gowers’ blog post announcing the statement.  The post includes a hilarious report by investment firm Exane Paribas, explaining that the current boycott has caused Reed Elsevier’s stock price to fall, but presenting that as a great investment opportunity, since they fully expect the price to rebound once this boycott fails like all the previous ones.  I ask you: does that not make you want to boycott Elsevier, for no other reason than to see the people who follow Exane Paribas’ cynical advice lose their money?

In related news, the boycott petition now has 4600+ signatures and counting.  If you’ve already signed, great!  If you haven’t, why not?

Update (Feb. 9): There’s now a great editorial by Gareth Cook in the Boston Globe supporting the Elsevier boycott (and analogizing it to both the Tahrir Square uprising and the Boston Tea Party!).

BackreactionWhen I grow up I want to be a physicist

The other day I talked to a young woman who is about to finish high school, so the time is coming to decide what education to pursue after that. What does a theoretical physicist actually do?, she asked. And while I was babbling away, I recalled how little I knew myself what a physicist does when I was a young student.

Of course I knew that professors give lectures. And I had read a bunch of popular science books and biographies, from which I concluded that theoretical physics requires a lot of thinking. The physicists I had read about, they also wrote many books, and articles and, most of all, letters. They really wrote a lot of letters, these people. There also was the occasional mentioning of a conference, where talks had to be given. And I could have learned from these historical narratives that, even back then, the physicists moved a lot, but I blamed that on one or the other war. I never asked who organized these conferences or hired these people.

While one could say that my family is scientifically minded, when I grew up I didn't know anybody who worked in scientific research or in academia who I could have asked what their daily life looks like. Today, it is easier for young people with an interest in science to find out what a profession entails in practice, and if you are thinking about a career in science, I really encourage you to look around. Piled higher and deeper has documented the sufferings of PhD students as humorously as aptly, and postdocs from many areas of science write blogs. When I finished high school, I didn't even know what a postdoc is! At the higher career levels, bloggers are still sparse, but they are there, and they tell you what theoretical physicists do.

Yes, they give lectures. They also give seminars, and attend seminars. They write articles and read articles, and review articles. They also write the occasional book, though that isn't very common in the early career stages. They attend conferences and workshops, and also organize conferences and workshops. They travel a lot. They sit in committees for all sorts of organizational and administrational purposes.

To some extent, the books I had read contained a little of all of that. What they did not tell me anything about was one thing that theoretical physicists today spend a lot of time on: writing proposals. They write and write and write proposals, to fund their own research or their research group, their students and postdocs, or their conferences, or maybe just their own book, or long-term stays. If you want to be a theoretical physicist, you better get used to the idea that a big part of your job will consist of asking for money, again and again and again. And then, somebody also has to review these proposals...

You will not be surprised to hear that theoretical physicists no longer write a lot of letters. I don't know how their email frequency compares to that of the general population, but this touches on one aspect of research in theoretical physics that you read about very, very little on blogs. That is how tightly knit the community really is, and how much people talk to each other and exchange ideas.

At least on the blogs that I read, it's like an unwritten code. You don't blog about conversations with your peers, except possibly under special circumstances (like for an interview). Most of these conversations are considered private and sharing inappropriate, even if confidentiality was not explicitly asked for. I think this is good because there needs to be room for privacy. However, this might give the reader a somewhat distorted picture of what research looks like. It is really a lot about exchanging ideas, it is a lot about asking questions, and about building on other people's arguments. A lot of research is communication with colleagues. So, if you try to catch a taste of theoretical physics from reading blogs, keep in mind that most bloggers will not pull their nonblogging colleagues into a public discussion.

Oh, yes, and in the remaining time - the time not spent on reading papers, sitting in seminars, organizing conferences or writing proposals or reports or blogging - in that time, they think.

If you are considering becoming a scientist: Check out this wonderful tumblr site that shows you some photos of real scientists!

n-Category Café Young Researchers Workshop on Higher Algebraic and Geometric Structures: Modern Methods in Representation Theory

This past October Alistair Savage and I organized a workshop on Category Theoretic Methods in Representation Theory at the University of Ottawa. The event was generously supported by the Fields Institute.

Following the success of the October workshop, Oded Yacobi, Chris Dodd, and I decided to hold another workshop, this time with a focus on researchers who are still very early on in their careers. The Fields Institute has again offered funding and this time will host the event as well.

We would like to draw your attention to the upcoming Young Researchers Workshop on Higher Algebraic and Geometric Structures: Modern Methods in Representation Theory to be held May 7-9, 2012 at the Fields Institute in Toronto.

Keep reading below the fold and see the workshop website for more on the content of the workshop, registration information, and applications for financial support.

This three day workshop will bring together young (mainly postdoctoral and graduate student) researchers in representation theory and related fields with a focus on exciting new developments in algebraic and geometric methods.

The meeting will incorporate both a graduate student workshop and a research level conference. The workshop component will consist of a series of three lectures by Ben Webster (Northeastern) explaining S-duality. In addition to these lectures, there will be research talks given by the leading young researchers in representation theory and related fields.

The goals of this workshop are:

  • to highlight algebraic-geometric and categorical methods in representation theory which have made important contributions in recent years,
  • to learn about the program on S-duality initiated by Braden, Licata, Proudfoot, and Webster,
  • to provide an opportunity for young researchers in the field to learn of and present cutting-edge research, and to foster interaction between Canadian and American researchers working in this area.

There will be approximately 12 talks, each 50 minutes in length, spread out over Monday, Tuesday and Wednesday, with ample time in between talks for discussion. Three of these talks will be reserved for Ben Webster’s lectures.

Here is the current list of speakers:

  • Ben Cooper (University of Virginia)
  • Ben Elias (MIT)
  • David Jordan (UT Austin)
  • Carl Mautner (Harvard)
  • Weiwei Pan (Göttingen) *
  • Jaimie Thind (University of Toronto)
  • Peter Tingley (MIT) *
  • Ben Webster (Northeastern)
  • Xinwen Zhu (Harvard) *

(* tentative)

We will be able to provide limited support for some graduate student and postdoc participants. Registration and applications for support will be open soon at the website. Please don’t hesitate to contact one of us if you have any questions.

We hope to see you there!

Secret Blogging SeminarSome thoughts on teaching Michigan calculus

I just finished teaching two sections of first semester calculus at the University of Michigan. Michigan calculus is somewhat famous — it is very focused on conceptual and graphical understanding, spends a lot of time on “real world” data, and achieves very high scores in national measures of teaching effectiveness. Moreover, while the course coordinators are highly experienced professionals, almost all of the day-to-day instruction is done by a small army of grad students and postdocs; I was one of the very few tenured or tenure track people teaching calculus this term. I was very curious to see how this was made to work.

My intended audience here is others who are about to teach Calculus at Michigan, or people who are wondering what it would be like to set up a Michigan-style program in their own departments. A word to any of my students who find this post. Feel free to read, and feel free to call me David rather than Professor Speyer. But please don’t blast this all over your Facebook pages. I’ve said nothing that I would be unwilling to defend in public, but I’d rather that all my students not find this next time I teach the course. Let’s just keep this little moment of sharing between me and those of you who stumble upon it.

Self congratulation

My students’ median grade was a B-, almost exactly the same as the course median. Of course, I am a highly competitive person, and I wanted my sections to blow the others out of the water. But I’m going to repeat the same thing I told many of my students: Michigan is a very competitive school and everyone who gets in here is excellent. A median grade at Michigan should be a point of pride. In my case, I was competing with grad students who had taken calculus far more recently than I had; had taught it several times before; and who were often extraordinary competitors with a string of Olympiad medals and Putnam victories. Landing in the middle of that pack is an accomplishment, and I have decided to be proud of it.

I also had a lot of fun. It was more work than I expected, and it didn’t give me the same sort of stimulation that teaching an advanced research course does, but I found it really interesting thinking about how to present some of the oldest accomplishments of mathematical thought to my students. If you are at Michigan and are worried that you will be bored stiff teaching this course, I can tell you that I wasn’t.

Organization

I was assigned two sections of 32 students each, which dwindled to 30 each over the course of the term. I did the day-to-day teaching (3 times a week, 80 minutes per meeting). I also wrote and graded weekly quizzes, and graded the team homework assignments. The rest of the students’ homework, the basic decisions as to what I should cover, and the exams, were written by a small team of experienced instructors, known as the course coordinators. The coordinators also held weekly meetings to brief us on where the course was going.

The course coordinators were really hardworking and insightful people. I applaud Michigan for recognizing that it is worth paying full time staff to do this job, and finding ones who did it so well.

The course website (including a secret section) was full of resources to help me plan my lessons. In Spring 2011 I often was unable to sleep for fear that I had misplanned my Hodge Theory lectures, so it was a major relief to be so well taken care of. I drifted from the given schedule by a day at times, but I basically followed it and found that it worked well.

I don’t think that this course could possibly work as well without the extensive guidance that we instructors received. If you are thinking of importing MI calculus into your school, you should think about how to take care of your teachers.

The applied and conceptual focus of the class

Calculus at Michigan focuses very heavily on working with data and on understanding what computation to do, rather than how to do it. It also focuses on getting students to be able to explain what they are doing to people with even less mathematical background than they have. I think these are very appropriate goals in theory. I feel that the former was achieved fairly well in practice, though I have some complaints, but the latter was not.

I enjoyed the real world examples. My students seemed to really get excited when I brought in data about a subject which interested them. The athletes really got a kick out of table 1 from this paper, showing how world class runners accelerate second by second; a lot of students enjoyed figuring out how long a runway a Ford Mustang needs to accelerate to 120 mph using the data here.

The difficulty was that it made grading very hard to predict. We would ask questions asking people to give a “practical explanation” of such and such, or to read some data off a graph with no grid lines, and then had very specific grading policies as to what we would accept. It was hard to prepare the students for this, and the best way to do so was to have them memorize certain formulations of “practical explanations”, which were very far from any actual understanding.

The goal of teaching students to write well is a great one, but I don’t think I had the resources to actually do so, and I certainly didn’t succeed. Mathematical writing is a specialized skill. I don’t see how I could have taught it without spending far more class time on it than I had. I wound up grading the writing on the team homework much more leniently than I was told to, because I didn’t feel that I could take off points for expository errors I didn’t have time to explain.

The exams

You can see the exams here: Midterm 1 (PDF), Midterm 2 (PDF), Final. The exams are the major determiner of the course grade, and one feels a very strong pressure to focus one’s teaching on what the exams will cover. I have no problem with that, because I think the exams cover the right topics. If you don’t like the idea of “teaching to a test”, you might dislike teaching at Michigan.

The exams, especially the last two, were almost all challenging conceptual problems. It is strange for someone like me to say this, but I don’t think I approve of this. I would be happier if the exams had one or two more basic questions, with the rest staying as they are.

Consider a student who has learned the basic mechanical skills of numerically approximating a derivative from data, or of differentiating some complex expression. But he doesn’t understand what this really means or when to do it. This student has not learned nearly as much as we want him to. He should do poorly on the exam — a C minus or a D.

But he also has mastered a nontrivial skill. I think we should give him a question on the exam where he can display this mastery. The course’s approach is, instead, to give all difficult questions and then set a curve which will bring this student’s stammerings up to the C minus or D range.

Partly, I think this is important for maintaining student motivation. There are very few people who will enjoy an activity at which they regularly have no small successes. (I am curious — the median score on the Putnam exam is a zero. How many of those competitors come back the next year? If some of my readers were among those who did, can you tell me how you stayed motivated?) I feel that, psychologically, there is a big difference between having one or two questions that my hypothetical student can solidly get right, and having him pick up a smattering of partial credit on each of nine or ten questions that are too hard for him.

Also, I think that it is worth being able to detect the difference between the student I describe here, and one who has not even mastered those skills.

Team homework

My students were organized into groups of four who met weekly to work together on more challenging problems, which they wrote up as a group and received a single collective grade for. I was very concerned about this going into the course. It worked better than I expected, but I still have some concerns.

First of all, a note to any students in Michigan calculus. I highly encourage you to work hard on your team homework, and to talk in depth with your teammates about the problems. Over and over, I saw students who really committed to their teams get great benefits out of it. This happened even if none of the students on a team was strong — I had one team made up of four good friends, all of whom were in the bottom quarter of the class. Once they started working together, their performance on quizzes and in class improved dramatically.

So why the misgivings? Because, if I were a student in this class, I would have highly resented being forced to work with students not at my level. I spent a lot of time in high school and college helping my fellow students, and I learned a lot by doing so. But I wouldn’t have appreciated being forced to do it. I remember the first time — sometime in my Freshman spring — when I told my classmates “Look, it’s 3 AM and I finished this problem set yesterday. You’re welcome to keep looking at my notes, but I’m going to bed.” That was really hard. I felt guilty and awkward about it. If my grade had depended on staying awake until they understood the material, I can’t imagine how much harder that would have been.

I saw some hints of this kind of unpleasant dynamic in the teams, and I suspect there was a lot more trouble that I didn’t see. It wasn’t just the strong students who were stressed, either: I had two students who, when I asked them for teammate preferences, specifically asked not to be put with students in the top half of the class.

Working with people, no matter what, will be a source of tension. Working with people whom you don’t choose to work with, and being dependent on them for your grade — that may be how the real world works, and I see the benefits of it, but I don’t like it in my classroom.

The mathlab

Michigan has a large room called the mathlab (it could probably hold 200 people or so) filled with round tables and staffed 8 hours a day by tutors who can provide help with mathematics courses. It is open without appointment to anyone who wants to come ask a question. If you are taking an intro calculus course, you’ll find lots of help. If you are taking a complex analysis course, then helpers might be rarer — but sometimes you’ll find someone like me at the table, glad to see what I can do.

I think this is a great idea, and every university should set this up.

I had a lot of fun going to the mathlab. I expect that next semester, when I’m not officially teaching, I might drop in anyway to take a break. The mathlab is like a live-action version of math.SE and, while the questions are less exciting, the challenge of being face to face with the questioner makes up for it.

Some things I wish I’d known/would do differently

\bullet My more experienced colleagues told me I didn’t need to spend class time teaching mechanical differentiation skills at all. Just emphasize how crucial it was to pass the Gateway (a computer-administered exam testing this skill), and the students would teach themselves. I didn’t quite follow this advice; on one Friday, I dismissed class early for those who had passed the Gateway already and spent 40 minutes drilling differentiation for the remainder. But two-thirds of the class had already passed the Gateway by themselves before I did this, and all but one of my students eventually did. So this advice really did turn out to work, despite my skepticism.

\bullet One of my fellow instructors offered to cook breakfast for his class if all of them passed the Gateway a week before the deadline. He wound up paying off and, not only did this make him hugely popular, but it probably relieved him of a lot of the stress I felt in the final days of the Gateway period. When I teach this class again, I will make a similar deal.

\bullet It was really hard to remember how bad my students are at systematic computation. For example, when counting boxes under a curve, they pointed at the squares in a random order, rather than sweeping from left to right, top to bottom. Every time they copied an expression from one part of the page to another, there was a high probability that a plus sign would change to a minus. If they multiplied (a+b+c)(d+e), the terms would not appear in lexicographic order.

This has two consequences. (1) Except when I want to test/improve this skill, I should not assign problems that involve more than a few lines of computation. I missed this over and over. (2) I should think about how to directly teach this skill.
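To make "systematic" concrete, here is a minimal sketch of the sweep I have in mind for (a+b+c)(d+e). The function name expand_in_order is made up for this illustration and is not part of any course material; it simply pairs every term of the first factor with every term of the second, in order.

```python
from itertools import product

def expand_in_order(first, second):
    """List the terms of (sum of first) * (sum of second) in lexicographic order."""
    # product() varies the second factor fastest: a with d, a with e, then b with d, ...
    return [x + y for x, y in product(first, second)]

terms = expand_in_order(["a", "b", "c"], ["d", "e"])
print(" + ".join(terms))  # prints "ad + ae + bd + be + cd + ce"
```

The point is only that the order of the terms is fixed in advance by the procedure, so nothing gets skipped or counted twice.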

\bullet The course maintains a problem bank of suggested questions to use on quizzes and in class. There are a lot of trick questions in there! For example, one question asked students to numerically approximate \frac{d}{dx} \sin(\sqrt{x}) at x=0. Another geometric optimization problem asked students to construct a solid shaped like a cone on top of a cylinder to optimize some quantity, and the optimum was the degenerate case where the cone had height zero! Since there are no solutions provided, you have to really be on guard not to assign one of these without catching it.
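To see why the first of these is a trick question, here is a rough sketch of what a student would compute, assuming they use a one-sided (forward) difference quotient; the helper names forward_difference and g are mine, not from the problem bank. Since \sin(\sqrt{h}) is close to \sqrt{h} for small h, the quotient behaves like 1/\sqrt{h} and grows without bound instead of settling down.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) by the forward difference quotient (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def g(x):
    # The function from the problem: sin(sqrt(x)), defined only for x >= 0
    return math.sin(math.sqrt(x))

for h in (1e-2, 1e-4, 1e-6):
    # Shrinking h by a factor of 100 multiplies the quotient by roughly 10
    print(h, forward_difference(g, 0.0, h))
```

A student who stops at a single value of h will happily report a number, which is exactly the trap.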

\bullet The webhomework system assumes that (1) any graph which looks piecewise linear actually is piecewise linear and (2) the students will use that fact to make any computations with that graph precise. In my own nitpicky way, I don’t think that’s fair. I very much doubt you can visually perceive the difference between a straight line and two lines making an angle of 1^{\circ} with each other and this can make a substantial difference in an integral. This paradox demonstrates the point very vividly.

From a more practical perspective, almost none of my students naturally grasped this idea. I should have taught it explicitly.

\bullet There are a lot of problems in the webhomework (mainly sections 5.2-5.4) which can only be reasonably done by typing integrals into a calculator. (The intended focus of the problem is on how to set up the correct integral in the first place.) I didn’t realize this was the intended method at first and spent a lot of time teaching students how to approximate them by Riemann sums or how to compute them exactly, while thinking to myself that this was way too hard for these students.

\bullet My students really got settled into their teams and didn’t like spending class time working with other students. (I guess this shouldn’t have surprised me.) However, when I did force them to, they often got a lot out of it. Next time, I should make sure that they spend a lot more time interacting outside their teams from the start of class.

\bullet The course coordinators really stressed how much the course focused on groupwork. I think I took these messages too strongly to heart. Especially in the first half of the term, I tried to spend almost the entire class period working with small groups. I think this is a reasonable interpretation of what I was being told, but it apparently wasn’t what they meant.


February 08, 2012

Jordan EllenbergStatement on the Elsevier boycott

A group of mathematicians, myself included, have prepared and signed a statement which attempts to summarize the reasons so many mathematicians and other scientists (nearly 5000 at this count) have signed on to boycott Elsevier publications.

This is an important moment for mathematical publishing, a rare time when the attention of the community is really fixed on an issue that’s been torturing our librarians for many years.  What’s the next step?  A good place to follow people’s thinking will be Gowers’s blog.  Tim, of course, has been a driving force behind the current movement to stop complaining about rent-seeking academic publishers and start doing something about it.

 


Tim GowersA more formal statement about mathematical publishing

A group of mathematicians have been putting together a statement that explains some of the background to, and reasons for, the Elsevier boycott. This statement, which has been signed by 34 mathematicians (we are confident that many more would be happy to endorse it, but we had to stop somewhere), is now ready for release. If you are interested in reading it, then click here.

While I’m posting, let me briefly mention one or two items of Elsevier-related news.

1. Elsevier have written an open letter defending themselves against some of the charges that have been laid against them. (I plan to write a post soon responding to some of their points.)

2. In what I see as a fairly dramatic development, Ingrid Daubechies, President of the International Mathematical Union, has signed up at The Cost of Knowledge website, declaring that she has resigned from her editorial roles with Elsevier. I know of other people who have done the same, but I am not sure that they want that information to be widely publicized, so I think I had better not say who they are.

3. Apparently the boycott caused the Elsevier share price to fall. Amusingly, the investment firm Exane Paribas regards this as an opportunity to make money:

Please find our 8 page report on Reed Elsevier released this morning. We argue that:

Noise around boycott against Elsevier offers short term trading opportunity

Reed Elsevier was the worst performing media stock last week. We believe this is due to investor concerns on the back of T. Gowers’ petition to boycott publishing and refereeing in Elsevier’s journals. We believe the share price reaction was overdone and recommend buying the shares.

Scientists are boycotting the boycott

Similar petitions in favour of Open Access were organised in 2000 and 2007, with no impact on Elsevier’s fundamentals. Our tracking not only shows that this latest petition lags behind the two preceding ones but also suggests that its momentum is slowing. Fewer than 5,000 scientists have signed up, whereas Elsevier works with more than 6m scientists worldwide. The low take-up of this petition is a sign of the scientific community’s improving perception of Elsevier.

Open Access unlikely to hurt financials in the medium term and is priced in

The proportion of Open Access is growing at less than 1% pa. Elsevier’s contract lengths are getting longer and the company’s growth efforts are focused on new products rather than pricing. Open Access is unlikely to hurt Elsevier in the next five years and the longer term risk is more than priced in, in our view.

Results are due on 16 February

We expect EPS11e of 47p, slightly ahead of the consensus 46p, and an outlook supportive of the group’s defensive growth profile and improved fundamentals. The announcement of a new CFO and a possible share buyback could be two additional positives. Reed Elsevier PLC trades on EV/EBIT12e of 8.8x. It offers defensive growth at a reasonable price. We remain buyers of the stock on the current share price weakness.


Dave BaconHaving it both ways

In one of Jorge Luis Borges’ historical fictions, an elderly Averroes, remarking on a misguided opinion of his youth,  says that to be free of an error it is well to have professed it oneself.  Something like this seems to have happened on a shorter time scale in the ArXiv, with last November’s The quantum state cannot be interpreted statistically  sharing two authors with this January’s The quantum state can be interpreted statistically.  The more recent paper explains that the two results are actually consistent because the later paper abandons the earlier paper’s assumption that independent preparations result in an ontic state of product form.  To us this seems an exceedingly natural assumption, since it is hard to see how inductive inference would work in a world where independent preparations did not result in independent states.  To their credit,  and unlike flip-flopping politicians, the authors do not advocate or defend their more recent position; they only assert that it is logically consistent.

Doug NatelsonScience and politics - 2 items

I really got a kick out of this video, and this one (some overlap), of President Obama at the White House Science Fair.  See here for a press description, and here for the White House's own page on the matter.  Note that Bill Nye was there - how cool is that?  It's great to see a President who uses a bit of the prestige of the office to shine a spotlight on the importance of science education.   

On an unrelated note, on the way home from work today I heard this story on NPR, about the challenges of nonprofits, specifically charities, that fund science research.  There is a tendency for those charities to shy away from politically controversial topics (e.g., human embryonic stem cells) because charities don't want to risk alienating any potential donor.  It is an interesting set of issues.  Charities are of course not obligated to fund any specific thing.  My sense is that one clear ethical line is that charities should be open and honest about what they do and don't support.  No hidden agendas.  If you don't want to fund something, just say so clearly, so that there's no "buyer's remorse" from donors who feel misled.

David Hoggeclipsing binaries to population inference

Schiminovich and I returned to our undisclosed location today to work on the eclipsing white dwarfs we found in our GALEX time-stream project. We spent an inordinate amount of time working out how to infer the properties of all white-dwarf binary systems from a small number of discovered eclipsers. It is possible of course; the magic of a probabilistic generative model makes anything possible when selection and discovery can be modeled at least statistically, which it can (easily) in this case. In paper one, we are only going to do the most rough (order-of-magnitude) population inference, but eventually we should be able to say quite a bit. One consequence of our discussion was an increased optimism that we might have or get some companions that are substellar.

February 07, 2012

Cosmic VarianceHow To Think About Quantum Field Theory

I continue to believe that “quantum field theory” is a concept that we physicists don’t do nearly enough to explain to a wider audience. And I’m not going to do it here! But I will link to other people thinking about how to think about quantum field theory.

Over on the Google+, I linked to an informal essay by John Norton, in which he recounts the activities of a workshop on QFT at the Center for the Philosophy of Science at the University of Pittsburgh last October. In Norton’s telling, the important conceptual divide was between those who want to study “axiomatic” QFT on the one hand, and those who want to study “heuristic” QFT on the other. Axiomatic QFT is an attempt to make everything absolutely perfectly mathematically rigorous. It is severely handicapped by the fact that it is nearly impossible to get results in QFT that are both interesting and rigorous. Heuristic QFT, on the other hand, is what the vast majority of working field theorists actually do — putting aside delicate questions of whether series converge and integrals are well defined, and instead leaping forward and attempting to match predictions to the data. Philosophers like things to be well-defined, so it’s not surprising that many of them are sympathetic to the axiomatic QFT program, tangible results be damned.

The question of whether or not the interesting parts of QFT can be made rigorous is a good one, but not one that keeps many physicists awake at night. All of the difficulty in making QFT rigorous can be traced to what happens at very short distances and very high energies. And that’s certainly important to understand. But the great insight of Ken Wilson and the effective field theory approach is that, as far as particle physics is concerned, it just doesn’t matter. Many different things can happen at high energies, and we can still get the same low-energy physics at the end of the day. So putting great intellectual effort into “doing things right” at high energies might be misplaced, at least until we actually have some data about what is going on there.

Something like that attitude is defended here by our former guest blogger David Wallace. (Hat tip to Cliff Harvey on G+.) Not the best video quality, but here is David trying to convince his philosophy colleagues to concentrate on “Lagrangian QFT,” which is essentially what Norton called “heuristic QFT,” rather than axiomatic QFT. His reasoning very much follows the Wilsonian effective field theory approach.

The concluding quote says it all:

LQFT is the most successful, precise scientific theory in human history. Insofar as philosophy of physics is about drawing conclusions about the world from our best physical theories, LQFT is the place to look.


Dirac Sea ShoreCurrently at the KITP

Right now I’m in the midst of a program I helped to organize (and I’m still organizing) at the KITP. The program deals with the question of how to use numerical methods from lattice and gravity to make inroads into interesting (usually very hard) questions about quantum field theory (and quantum gravity) and the dynamics of the strong interactions at finite temperature (like in the heavy ion collisions).

 

We’ve had a lot of great talks about a wide variety of topics. Personally, I really liked the talk by Philippe de Forcrand on the sign problem. The main reason I like it is that he had really simple examples that illustrate what the sign problem is all about. You can find it here.

And if you want to see what we’ve been hearing about, you can go here and see the full list of talks so far.



Scott AaronsonWhether or not God plays dice, I do

Another Update (Feb. 7): I have a new piece up at IEEE Spectrum, explaining why I made this bet.  Thanks to Rachel Courtland for soliciting the piece and for her suggestions improving it.

Update: My $100,000 offer for disproving scalable quantum computing has been Slashdotted.  Reading through the comments was amusing as always.  The top comment suggested that winning my prize was trivial: “Just point a gun at his head and ask him ‘Convinced?’”  (For the record: no, I wouldn’t be, even as I handed over my money.  And if you want to be a street thug, why limit yourself to victims who happen to have made public bets about quantum computing?)  Many people assumed I was a QC skeptic, and was offering the prize because I hoped to spur research aimed at disproving QC.  (Which is actually an interesting misreading: I wonder how much “pro-paranormal” research has been spurred by James Randi’s million-dollar prize?)  Other people said the bet was irrelevant since D-Wave has already built scalable QCs.  (Oh, how I wish I could put the D-Wave boosters and the QC deniers in the same room, and let them duke it out with each other while leaving me alone for a while!)  One person argued that it would be easy to prove the impossibility of scalable QCs, just like it would’ve been easy to prove the impossibility of scalable classical computers in 1946: the only problem is that both proofs would then be invalidated by advances in technology.  (I think he understands the word “proof” differently than I do.)  Then, buried deep in the comments, with a score of 2 out of 5, was one person who understood precisely:

I think he’s saying that while a general quantum computer might be a very long way off, the underlying theory that allows such a thing to exist is on very solid ground (which is why he’s putting up the money). Of course this prize might still cost him since if the news of the prize goes viral he’s going to spend the next decade getting spammed by kooks.

OK, two people:

    There’s some needed context.  Aaronson himself works on quantum complexity theory.  Much of his work deals with quantum computers (at a conceptual level–what is and isn’t possible).  Yet there are some people who reject the idea the quantum computers can scale to “useful” sizes–including some very smart people like Leonid Levin (of Cook-Levin Theorem fame)–and some of them send him email, questions, comments on his blog, etc. saying so.  These people are essentially asserting that Aaronson’s career is rooted in things that can’t exist.  Thus, Aaronson essentially said “prove it.”  It’s true that proving such a statement would be very difficult … But the context is that Aaronson gets mail and questions all the time from people who simply assert that scalable QC is impossible, and he’s challenging them to be more formal about it.  He also mentions, in fairness, that if he does have to pay out, he’d consider it an honor, because it would be a great scientific advance.

For better or worse, I’m now offering a US$100,000 award for a demonstration, convincing to me, that scalable quantum computing is impossible in the physical world.  This award has no time limit other than my death, and is entirely at my discretion (though if you want to convince me, a good approach would be to convince most of the physics community first).  I might, also at my discretion, decide to split the award among several people or groups, or give a smaller award for a discovery that dramatically weakens the possibility of scalable QC while still leaving it open.  I don’t promise to read every claimed refutation of QC that’s emailed to me.  Indeed, you needn’t even bother to send me your refutation directly: just convince most of the physics community, and believe me, I’ll hear about it!  The prize amount will not be adjusted for inflation.

The impetus for this prize was a post on Dick Lipton’s blog, entitled “Perpetual Motion of the 21st Century?”  (See also this followup post.)  The post consists of a debate between well-known quantum-computing skeptic Gil Kalai and well-known quantum-computing researcher Aram Harrow (Shtetl-Optimized commenters both), about the assumptions behind the Quantum Fault-Tolerance Theorem.  So far, the debate covers well-trodden ground, but I understand that it will continue for a while longer.  Anyway, in the comments section of the post, I pointed out that a refutation of scalable QC would require, not merely poking this or that hole in the Fault-Tolerance Theorem, but the construction of a dramatically-new, classically-efficiently-simulable picture of physical reality: something I don’t expect but would welcome as the scientific thrill of my life.  Gil more-or-less dared me to put a large cash prize behind my words—as I’m now, apparently, known for doing!—and I accepted his dare.

To clarify: no, I don’t expect ever to have to pay the prize, but that’s not, by itself, a sufficient reason for offering it.  After all, I also don’t expect Newt to win the Republican primary, but I’m not ready to put $100,000 on the line for that belief.  The real reason to offer this prize is that, if I did have to pay, at least doing so would be an honor: for I’d then (presumably) simply be adding a little to the well-deserved Nobel Prize coffers of one of the greatest revolutionaries in the history of physics.

Over on Lipton’s blog, my offer was criticized for being “like offering $100,000 to anyone who can prove that Bigfoot doesn’t exist.”  To me, though, that completely misses the point.  As I wrote there, whether Bigfoot exists is a question about the contingent history of evolution on Earth.  By contrast, whether scalable quantum computing is possible is a question about the laws of physics.  It’s perfectly conceivable that future developments in physics would conflict with scalable quantum computing, in the same way that relativity conflicts with faster-than-light communication, and the Second Law of Thermodynamics conflicts with perpetuum mobiles.  It’s for such a development in physics that I’m offering this prize.

Update: If anyone wants to offer a counterpart prize for a demonstration that scalable quantum computing is possible, I’ll be happy for that—as I’m sure, will many experimental QC groups around the world.  I’m certainly not offering such a prize.

Clifford JohnsonDouble Equivalence

Wow. So I've been wondering how far behind I might be in my lectures for the General Relativity class. I seemed to spend a bit more time than I remember teaching a recap of how to think about rotations, using it as an operational and mathematical brace upon which to build my review/revisit of Special Relativity. I was definitely convinced that I was a bit behind after two lectures on introducing how to study a little geometry using intrinsic quantities rather than by reference to embedding it inside another geometry (e.g., learning to think about a two-sphere in its own right instead of as the surface of a ball - this prepares you for thinking about a three-sphere, for which the ball would be hard to visualize, or draw), and so forth. All solidly useful material for the students (in this and so many other physics pursuits to come), so I do not regret spending time on it, but I did wonder about where I was in the journey... Anyway, I got to the statement of the Equivalence Principle yesterday, the foundation of the whole of General Relativity. I was feeling quite pleased that we're starting on this now, putting to use all the hard work we've been doing conceptually so far... and thought I'd do a quick post here on the blog to celebrate that we've got there. I thought I'd entitle the post "Equivalence". I started typing and then thought I'd see if I'd written anything about it here before. [...]

Robert HellingAdS/cond-mat

Last week, Subir Sachdev came to Munich to give three Arnold Sommerfeld Lectures. I want to take this opportunity to write about a subject that has attracted a lot of attention in recent years, namely applying AdS/CFT techniques to condensed matter systems like trying to write gravity duals for D-wave superconductors or strange metals (it's surprisingly hard to find a good link for this keyword).

My attitude towards this attempt has somewhat changed from "this will never work" to "it's probably as good as anything else" and in this post I will explain why I think this. I should mention as well that Sean Hartnoll has been essential in this phase transition of my mind.

Let me start by sketching (actually: caricaturing) what I am talking about. You want to understand some material, typically the electrons in a horribly complicated lattice like bismuth strontium calcium copper oxide, or BSCCO. To this end, you come up with a five dimensional theory of gravity coupled to your favorite list of other fields (gauge fields, scalars with potentials, you name it) and place that in an anti-de-Sitter background (or better, for finite temperature, in an asymptotically anti-de-Sitter black hole). Now, you compute solutions with prescribed behavior at infinity and interpret these via Witten's prescription as correlators in your condensed matter theory. For example, you can read off Green functions, (frequency dependent) conductivities and densities of states.

How can this ever work? How are you supposed to guess the correct field content (there is no D-brane/string description anywhere near that could help you out), and how can you ever be sure you got it right?

The answer is: you cannot, but it does not matter. It does not matter as it does not matter elsewhere in condensed matter physics. To clarify this, we have to be clear about what it means for a condensed matter theorist to "understand" a system. Expressed in our high energy lingo, most of the time, the "microscopic theory" is obvious: It is given by the Schrödinger equation for $10^{23}$ electrons plus a similar number of nuclei, with the electrons feeling the Coulomb potential of the nuclei and interacting among themselves via Coulomb repulsion. There is nothing more to be known about this. Except that this is obviously not what we want. These are far too many particles to worry about and, what is more important, we are interested in the behavior at much much lower energy scales and longer wave lengths, at which all the details of the lattice structure are smoothed out and we see only the effect of a few electrons close to the Fermi surface. As an estimate, one should compare the typical energy scale of the Coulomb interactions, the binding energies of the electrons to the nucleus (Z times 13.6 eV), or in terms of temperature (where putting in the constants equates 1 eV to about 10,000 K), to the milli-eV binding energy of Cooper pairs or the typical temperature where superconductivity plays a role.
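To put rough numbers on this comparison, here is a small sketch that converts energies to temperatures via E = k_B T. The 1 meV figure for the Cooper pair binding energy is an assumed round value standing in for "milli-eV", and the function name ev_to_kelvin is just for illustration.

```python
K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin

def ev_to_kelvin(energy_ev):
    """Temperature T at which k_B * T equals the given energy in eV."""
    return energy_ev / K_B_EV_PER_K

hydrogen_binding_ev = 13.6  # atomic scale from the text, for Z = 1
cooper_pair_ev = 1e-3       # assumed illustrative value: 1 meV

print(ev_to_kelvin(1.0))                     # roughly 11,600 K per eV
print(ev_to_kelvin(hydrogen_binding_ev))     # of order 10^5 K
print(ev_to_kelvin(cooper_pair_ev))          # of order 10 K
print(hydrogen_binding_ev / cooper_pair_ev)  # about four orders of magnitude apart
```

So under these assumptions the UV and IR scales are separated by a factor of order 10^4, which is why the lattice details are so thoroughly washed out.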

In the language of the renormalization group, the Coulomb interactions are the UV theory but we want to understand the effective theory that this flows to in the IR. The convenient thing about such effective theories is that they do not have to be unique: All we want is a simple-to-understand theory (in which we can compute many quantities that we would like to know) that is in the same universality class as the system we started from. Differences in irrelevant operators do not matter (at least to leading order).

Surprisingly often, one can find free theories or weakly coupled (and thus almost free) theories that can act as the effective theory we are looking for. BCS is a famous example, but Landau's Fermi Liquid Theory is another: There the idea is that you can almost pretend that your fermions are free (and thus you can just add up energies taking into account the Pauli exclusion principle, giving you Fermi surfaces etc.) even though your electrons are interacting (remember, there is always the Coulomb interaction around). The only effects of the interactions are to renormalize the mass, to deform the Fermi surface away from a ball, and to change the height of the jump in the T=0 occupation number. Experience shows that this is an excellent description in more than one dimension (the exception being one dimension, where one finds the Luttinger liquid instead), and this can probably be traced back to the fact that a four-Fermi interaction is non-renormalizable and thus invisible in the IR.

Only, it is important to remember that the fields/particles in those effective theories are not really the electrons you started with but just quasi-particles that are built in complicated ways out of the microscopic particles, carrying around clouds of other particles and deforming the lattice they move in. But these details don't matter and that is the point.

It is only important to guess the effective theory in the same universality class. You never derive this (or: hardly ever). Following an exact renormalization group flow is just way beyond what is possible. You make a hopefully educated guess (based on symmetries etc.) and then check that you get good descriptions. But it is only the fact that there are not too many universality classes that makes this process of guessing worthwhile.

Free or weakly coupled theories are not the only possible guesses for effective field theories in which one can calculate. 2d conformal field theories are others. And now, AdS-technology gives us another way of writing down correlation functions just as Feynman-rules give us correlation functions for weakly coupled theories. And that is all one needs: Correlation functions of effective field theory candidates. Once you have those you can check if you are lucky and get evidence that you are in the correct universality class. You don't have to derive the IR theory from the UV. You never do this. You always just guess. And often enough this is good enough to work. And strictly speaking, you never know if your next measurement shows deviations from what you thought would be an effective theory for your system.

In a sense, it is like the mystery that chemistry works: The periodic table somehow pretends that the electrons in atoms are arranged in states that group together as for the hydrogen atom: you get the same n,l,m,s quantum numbers and the shells are roughly the same (although with some overlap encoded in the Aufbau principle) as for hydrogen. This pretends that the only effect of the electron-electron Coulomb potential is to shield the charge of the nucleus, so that every electron effectively sees a hydrogen-like atom (although not necessarily with integer charge Z), while Pauli's exclusion principle ensures that no state is filled more than once. One could have thought that the effect of the n-1 other electrons on the last one is much bigger; after all, they have a total charge that is almost the same as that of the nucleus. But it seems the last electron only sees the nucleus with a 1/r potential, although with reduced charge.

If you like, the only thing one might worry about is whether the Witten prescription to obtain boundary correlators from bulk configurations really gives you valid n-point functions of a quantum theory (if you feel sufficient mathematical masochism, for example in the sense of Wightman), but you don't need to show that it is the quantum field theory corresponding to the material you started with.