Weirdness in the Primes

March 15, 2016

Posted by John Baez

$MathML-enabled post (click for more details).$

What percent of primes end in a 7? I mean when you write them out in base ten.

Well, if you look at the first hundred million primes, the answer is 25.000401%. That’s very close to 1/4. And that makes sense, because there are just 4 digits that a prime can end in, unless it’s really small: 1, 3, 7 and 9.

So, you might think the endings of prime numbers are random, or very close to it. But 3 days ago two mathematicians shocked the world with a paper that asked some other questions, like this:

If you have a prime that ends in a 7, what’s the probability that the next prime ends in a 7?

I would have expected the answer to be close to 25%. But these mathematicians, Robert Oliver and Kannan Soundarajan, actually looked. And they found that among the first hundred million primes, the answer is just 17.757%.

So if a prime ends in a 7, it seems to somehow tell the next prime “I rather you wouldn’t end in a 7. I just did that.”

$MathML-enabled post (click for more details).$

This strikes me as weird. And apparently it’s not just because I don’t know enough number theory. Ken Ono is a real expert on number theory, and when he learned about this, he said:

I was floored. I thought, “For sure, your program’s not working.”

Needless to say, it’s not magic. There will be an explanation. In fact, Oliver and Soundarajan have conjectured a formula that says exactly how much of a discrepancy to expect - and they’ve checked it, and it seems to work. It works in every base, not just base ten. But we still need a proof that it really works.

By the way, their formula says the discrepancy gets smaller and smaller when we look at more and more primes. If we look at primes less than $N$ , the discrepancy is on the order of

$\frac{\log(\log(N))}{\log(N)}$

This goes to zero as $N \to \infty$ . But this discrepancy is huge compared to the discrepancy for the simpler question, “what percentage of primes ends in a given digit?” For that, the discrepancy, called the Chebyshev bias, is on the order of

$\frac{1}{\log(N) \sqrt{N}}$

Of course, what’s really surprising is not this huge correlation between the last digits of consecutive primes, but that number theorists hadn’t thought to look for it until now!

Any amateur with decent programming skills could have spotted this and won everlasting fame, if they’d thought to look. What other patterns are hiding in the primes?

For more, read this:

Erica Klarreich, Mathematicians discover prime conspiracy, Quanta, March 13, 2016.

and this:

Terry Tao, Biases between consecutive primes, What’s New, March 14, 2016.

and of course the actual paper:

Robert J. Lemke Oliver and Kannan Soundararajan, Unexpected biases in the distribution of consecutive primes, March 11, 2016.

Their work involves a variant of the Hardy–Littlewood $k$ -tuple conjecture, which is a conjectured formula for the density of ‘constellations’ of primes of a given ‘shape’—that is, $k$ -tuples of primes that are of the form

$(a_1 + n , \dots, a_k + n)$

for some given ‘shape’ $(a_1, \dots, a_k)$ .

I just noticed something funny. It seems that the Hardy–Littlewood $k$ -tuple conjecture is also called the ‘first Hardy–Littlewood conjecture’. The ‘second Hardy–Littlewood conjecture’ says that

$\pi(M + N) \le \pi(M) + \pi(N)$

whenever $M,N \ge 2$ , where $\pi(N)$ is the number of primes $\le N$ .

What’s funny is what Wikipedia says about the second Hardy–Littlewood conjecture! It says:

This is probably false in general as it is inconsistent with the more likely first Hardy–Littlewood conjecture on prime $k$ -tuples, but the first violation is likely to occur for very large values of $M$ .

Is this true? If so, did Hardy and Littlewood notice that their two conjectures contradicted each other? Isn’t there some rule against this? Otherwise you could just conjecture $P$ and also $not(P)$ , disguising $not(P)$ in some very different language, and be sure that one of your conjectures was true!

(Unless, of course, you’re an intuitionist.)

Posted at March 15, 2016 2:52 AM UTC

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2869

19 Comments & 0 Trackbacks

Re: Unexpected Biases in the Distribution of Consecutive Primes

$MathML-enabled post (click for more details).$

Has anyone checked the function field analog? Does anyone know if it is easier to prove?

Posted by: Asvin on March 15, 2016 4:53 AM | Permalink | Reply to this

Re: Unexpected Biases in the Distribution of Consecutive Primes

$MathML-enabled post (click for more details).$

Since this paper is only 3 days old and it doesn’t mention the function field analogue, maybe it’s open. What would it be, exactly? How hard is it to check via computer?

Posted by: John Baez on March 15, 2016 5:44 AM | Permalink | Reply to this

Re: Unexpected Biases in the Distribution of Consecutive Primes

$MathML-enabled post (click for more details).$

Terry Tao considers instead congruence classes of prime differences $(p' - p'')$ for $p',p'' ~ x + O(\log x)$ , so perhaps something can be said about differences of irreducible polynomials given that the differences are in small-degree terms?

Or is “function field analogue” something completely different?

Posted by: Jesse C. McKeown on March 16, 2016 7:43 AM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

I got distracted by a throwaway line in the Quanta article, saying that the people who discovered this result about the primes were inspired by a lecture on probability in which it was mentioned that the expected number of coin tosses until you get two consecutive heads is 6, whereas the expected number until you get a head followed by a tail is 4.

It’s a fun exercise in induction and combinatorics to prove that the probabilities to get the desired sequence, HH or HT, after (but only after) precisely $n$ coin tosses are:

$P_{HH}(n) = \frac{F_{n-1}}{2^n}$

where $F_i$ is a Fibonacci number, with the convention $F_1=F_2=1$ , and:

$P_{HT}(n) = \frac{n-1}{2^n}$

From this, it can be shown that the expectation values for $n$ are exactly 6 and 4.

Posted by: Greg Egan on March 15, 2016 1:29 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

the expected number of coin tosses until you get two consecutive heads is 6, whereas the expected number until you get a head followed by a tail is 4.

I like it! I bet some people here find it intuitively obvious that it in a given run of coin-tosses, it’s more likely that there’s been an HT than an HH. But I don’t, so let’s see what we can come up with.

For runs of three tosses, there are three that contain an HH —

$HHH, THH, HHT$

— and four that contain HT —

$HTH, HTT, HHT, THT.$

So this already gives some clues… but maybe the problem becomes more understandable if we symmetrize.

The probability that a run of $n$ coin-tosses contains either an HH or a TT is $1 - 2/n$ , since if it contains neither then it can only be $HTHT\ldots$ or $THTH\ldots$ . The events “contains an HH” and “contains a TT” have equal probabilities but aren’t disjoint, so the probability that the run contains an HH isn’t simply half of the probability $p = 1 - 2/n$ just calculated: you have to take $p/2$ , then add back on half of the probability that both have occurred (by the inclusion-exclusion principle).

The probability that a run of $n$ coin-tosses contains either an HT or a TH is also $p = 1 - 2/n$ , since if it contains neither then it can only be $HHHH\ldots$ or $TTTT\ldots$ . Again, the two events aren’t disjoint, and again, the overall probability is $p$ plus half of the probability that both have occurred.

So, the question is why a run of $n$ coin-tosses is more likely to contain both an HT and a TH than both an HH and a TT. And that now seems intuitively clear: in the sequence of tosses, any HH and any TT must be disjoint, whereas an HT and a TH can overlap (as in HTH or THT). So, there are more ways to have both HT and TH than to have both HH and TT. You see this in the case $n = 3$ , where it’s impossible to get both HH and TT, but there are two ways of getting both HT and TH.

So that’s my explanation. But it’s still indirect. Someone else can probably do better; if so, please share!

Posted by: Tom Leinster on March 15, 2016 3:43 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Here’s the simplest way I’ve come up with to think about it. The point is that it’s easier to avoid HH than it is to avoid HT. Once you think about that a little, it becomes obvious that the two are less similar than they superficially appear to be:

A run of $n$ coin tosses that avoids HT has to be all Hs once the first H appears. Such a run is completely determined by where the first H appears (if it does at all), so there are only $n+1$ of them.

On the other hand, a run of $n$ coin tosses that avoids HH is determined by a subset of $\{1, \ldots, n\}$ containing no two consecutive numbers. You could, for a start, take any subset consisting only of odd numbers. This shows that there are exponentially many (more than $2^{n/2}$ ) such subsets.

So avoiding HH for the first $n$ coin tosses is easier than avoiding HT for $n$ coin tosses.

By the way, I hadn’t heard about this fact before, and it wasn’t intuitively obvious to me when I learned it. One of the things I like about probability is that it constantly challenges me to retrain my intuition!

Posted by: Mark Meckes on March 15, 2016 9:52 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Ah, that’s good: much better than my explanation. Thanks, Mark!

Posted by: Tom Leinster on March 15, 2016 10:34 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Another pretty intuitive explanation by Ryan Alweiss (source):

You are on the first rock. In one case you advance to the next rock or stay put with probability .5 and in the other the same is true if you are on the first rock, but if you’re on the second rock and flip you have a one half chance of finishing… and a one half chance of moving backward.

The first scenario is HT (the H is first to second rock and T is second to third) and the second is HH. In the first scenario another H won’t push you back if you just flipped an H but in the second a T will push you back if you just flipped an H.

Inspired by the above, I figured out a quicker way to calculate the expectations. Treat the whole game as a Markov chain with three states (H, T, and Done). Each coin toss is a state transition, and the H/T state indicates the outcome of the last toss. Since the goal is to achieve HH/HT, only H can transition to Done, with probability 1/2. In the HH case, H retreats to T with probability 1/2, while in the HT case, H stays with H with probability 1/2. Let $E_H$ (resp. $E_T$ ) be the expectation of number of transitions needed to hit the Done state starting in H (resp. T). What we want to find is $E_T$ (starting from nothing is no different from starting with T).

In both cases, $E_T=1+\frac{1}{2}E_H+\frac{1}{2}E_T$ . In the HH case, $E_H=1+\frac{1}{2}\cdot0+\frac{1}{2}E_T$ . In the HT case, $E_H=1+\frac{1}{2}\cdot0+\frac{1}{2}E_H$ . Solving the linear equations yields $E_T=6$ and $E_T=4$ respectively.

Posted by: XJY on March 18, 2016 5:02 AM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Greg wrote:

It’s a fun exercise in induction and combinatorics to prove that the probabilities to get the desired sequence, HH or HT, after (but only after) precisely $n$ coin tosses are:

$P_{HH}(n)= \frac{F_{n-1}}{2^n}$

Cool! Maybe this is related to how the $n$ th Fibonacci number is the number of ways to take a list of $n$ dots and break them up into blocks of length $1$ or $2$ ? With this definition it’s really easy to check that $F_{n+1} = F_n + F_{n-1}$ .

Let’s see what $F_3$ is with this definition:

o|o|o

oo|o

o|oo

I’m getting $F_3 = 3$ .

So my $F_n$ seems to be your $F_{n-1}$ . I believe my convention, with $F_0 = F_1 = 1$ , is standard. It also simplifies your formula a bit.

But never mind—this is cool, I’d never thought about it in terms of coin flips! In fact, I still don’t see what these ‘blocks of length 1 or 2’ have to do with your coin flip calculation. And now it’s time to cook breakfast, so I’ll leave that for someone else!

Posted by: John Baez on March 15, 2016 6:52 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Remove the final HH and break the remaining sequence of $n-2$ letters into HT’s and T’s… With your $F_n$ we get $F_{n-2}$ in the formula.

Posted by: XJY on March 16, 2016 1:47 AM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

XJY’s answer is ingenious, and (taking John’s property of the Fibonacci numbers as given) provides a much faster route to the probability than the one I used.

Posted by: Greg Egan on March 16, 2016 3:52 AM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Nice, XJY!

To complete the argument, here’s how to check that the $n$ th Fibonacci number is the number of Fibonacci structures on a list of $n$ dots: ways to break that list into blocks of length 1 or 2.

Let $A_n$ be the number of such ways. Clearly $A_0 = A_1 = 1$ . Furthermore, suppose we have any Fibonacci structure on a list of $n \gt 0$ dots. It must end in a block of length 1 or 2. Removing that final block, we’re left with a Fibonacci structure on either $n-1$ or $n-2$ dots. Conversely, any Fibonacci structure on $n-1$ or $n-2$ dots gives one on $n$ dots. So, $A_n = A_{n-1} + A_{n-2}$ .

Replacing ‘1 or 2’ by other collections of natural numbers, we get a swarm of generalizations of the Fibonacci numbers, including the Tribonacci numbers.

Posted by: John Baez on March 16, 2016 5:08 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Three years ago, a well-known expert in analytic number theory told me a curious thing about the subject: that analytic number theorists’ conjectures about the distribution of primes are very seldom wrong.

In other words, analytic number theorists have very good intuition about how the primes are distributed, far outstripping their current ability to prove such statements.

I wonder whether this discovery is causing analytic number theorists to re-assess their confidence in their intuition.

Posted by: Tom Leinster on March 16, 2016 1:54 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

I would guess that this result won’t really affect analytic number theorists’ confidence in the correctness of their intuition, though it may affect how far they think their intuition extends. As far as I’ve heard, as surprising as it is, doesn’t contradict other recent widely-believed conjectures.

It should also be noted that recent conjectures in analytic number theory tend to be made based on a lot more than intuition, and even more than intuition plus massive numerical experimentation. They also use (not-necessarily-explained) analogies with other fields, like random matrix theory for example, where (often extremely difficult) explicit calculations can be done, then make number-theoretic conjectures based on the results of those calculations.

Posted by: Mark Meckes on March 16, 2016 5:11 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Tom wrote:

I wonder whether this discovery is causing analytic number theorists to re-assess their confidence in their intuition.

Reading Terry Tao’s blog article and the original paper, I get the feeling that this effect was no surprise to analytic number theorists after they’d taken a little while to think about it.

I believe they’d never thought about this issue—so when they were first confronted with it, they were shocked based on the very simple intuition that ‘primes should be random’. But analytic number theorists have a lot of refinements of this simple intuition at their disposal, like the Hardy–Littlewood $k$ -tuple conjecture, which gives an asymptotic formula for the number of prime constellations of any shape. And these refinements (many still conjectures, not theorems) can be used to understand what’s going on.

That’s why I think the real surprise in this story is that nobody had ever asked questions like this:

If you have a prime that ends in a 7, what’s the probability that the next prime ends in a 7?

It’s really a pity that some teenager in Mumbai wasn’t the first one to give it a try.

Posted by: John Baez on March 16, 2016 6:19 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

An excellent example of how the really exciting progress in mathematics is often more about finding good questions than finding answers.

Posted by: Mark Meckes on March 17, 2016 4:54 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

Well, Terry Tao has pointed out that people had thought about this question more than I’d realized. Among the first 10 million primes, most of the deficiency of consecutive primes ending in 7 can be explained rather simply, and the subtler discrepancy on order of $\log(\log(N))/\log(N)$ , the topic of Lemke Oliver and Soundararajan’s paper, would require some expertise to detect:

Terry wrote:

There were at least two papers in experimental number theory that noticed some manifestation of the Lemke Oliver-Soundararajan bias on a qualitative level: a 2011 paper of Ash, Beltis, Gross, and Sinott, and a 2002 paper of Ko. But I think the difficult task was to separate this bias from previously known biases in the distribution of the primes, such as the Chebyshev bias discussed in the blog post, or biases arising from the prime tuples conjecture (which, for instance, predicts that a prime gap of 6 is twice as likely to occur as a prime gap of 2). To realise that this particular bias is significantly stronger than what can be easily explained from these previously identified biases seems to require a certain amount of theoretical mathematical training.

I replied:

The paper by Lemke Oliver and Soundararajan starts with some striking facts like this: of the first hundred million primes, 25.000401% end in 7 in base ten. That’s very close to 1/4, which seems to make sense… but if one of these primes ends in 7, the probability that the next one ends in 7 is just 17.757%!

This is the sort of thing that an amateur with good computer skills could probably notice. I had thought this was a new observation. I’ll have to look at those other papers

He replied:

The first hundred million primes occupy about 12% of the numbers up to 2 billion that end in 1, 3, 7, or 9 (discarding the tiny primes 2 and 5). If they were distributed randomly amongst this set, the probability that a prime p ending in 7 would be followed next by another prime ending in 7 would be about $0.12 (1-0.12)^3 / (1 - (1-0.12)^4) \approx 0.20,$ which is already significantly less than 25%, basically because the next numbers after p that end in 9, 1, or 3 get to “go first” and have a chance of being prime before the next number p+10 ending in 7 needs to be tested. An amateur who observes the somewhat stronger 17% bias without doing something like the above quantitative calculation (or at least benchmarking the prime number statistics against a suitable random model) may end up ascribing the bulk of the bias to this effect (which, according to the Quanta article, is what Sound and Lemke Oliver did also, before working out the theoretical calculations more carefully).

Posted by: John Baez on March 28, 2016 5:17 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

If you use Wolfram’s Mathematica, here’s a one-liner to compare arbitrary patterns involving the last digit of the first N consecutive primes:

ratio[pattern1_, pattern2_, N_] := Module[ {p = Partition[ParallelMap[Last@*IntegerDigits, Prime@Range@N], Length@pattern1, Length@pattern1 - 1]}, Count[p, pattern1]/ Count[p, pattern2] ]

Where patterns are expressed as lists, with var_ meaning any symbol e.g.

In: ratio[{ 7 , 7 }, { 7 , x_ }, 1000000]

Out: 13201/83338 (about 15.8%)

In: ratio[{3, 9, 1}, {3, 9, x_}, 1000000]

Out: 13747/39476 (about 34.8%)

Posted by: Filippo on April 20, 2016 4:00 PM | Permalink | Reply to this

Re: Weirdness in the Primes

$MathML-enabled post (click for more details).$

I am writing to you because of an article in October 7 issue of New Scientist magazine about prime numbers by Oxford mathematician Vicky Neale. In that article I found few particularly striking sentences: “Primes are fundamental entities from which we make all numbers… “It’s the same idea as in chemistry, where you might try to understand some complicated compounds by understanding the atomic elements…”

I started to look at Adomah Periodic table that I authored 10 years ago and realized that both, Adomah and Janet’s Left Step Table exhibit unexpected periodicity of prime numbers. If symbols of chemical elements are substituted with their atomic numbers, one can see that first period contains element He that has prime atomic number 2. Second period has one such element too: Li(3). Third period has B(5), N(7) and Na(11), total of three prime atomic numbers. Forth period has also three elements with prime atomic numbers: Al, Cl and K. Fifth period - four such elements V, Cu, Ga, Rb. Sixth period - also four: Nb, Tc, Ag and I. And so on. Periods 7 ad 8 has seven elements with prime atomic numbers each.

Listing number of elements with prime atomic numbers for each period of LSPT you get perfect periodicity: 1, 1, 3, 3, 4, 4, 7, 7, that I call periodicity of prime atomic numbers.

But regularity does not end here: if you list number of prime atomic numbers per double period you get: 2, 6, 8, 14. Notice: 2+6=8, 6+8=14. This is Fibonacci-like rule! My internet friend, Jess Tauber has also noticed that numbers 1,3,4 and 7 are so-called Lucas Numbers. Please, note that pattern does not continue after the period that ends with element 120.

But there is another interesting regularity. This is what Jess wrote:

Between 121 and 170 there are 9 primes, where 11 are expected (Lucas). Then from 171 and 220 there are 8 primes, again with 11 expected. From 221 to 292 there are 15 primes, where 18 are expected, and between 293 and 364 there are 10 primes, again with 18 expected. From 365 to 462 there are 17 primes, 29 expected, and from 463 to 560 there are 13 primes, again 29 expected. Then from 561 to 688 there are 22 primes, where 47 are expected, and from 689 to 816 there are 17 primes, again 47 expected. Ok, if we look at the discrepancies for the SECOND period of same-length duals, the pattern emerges. Starting with 7 (up to 120), we have 0 discrepancy. Then after this we have, for second period of duals, 8,10,13,17.

Starting from 7, we then have 7,8,10,13,17, where differences between these are 1,2,3,4.

I am not sure if anyone noticed it before, but I was ecstatic when I came to that realization! When it comes to mathematical beauty, Janet’s Left Step table beats traditional PT hands down. I thought you might find this interesting.

Valery Tsimmerman

Posted by: Valery Tsimmerman on October 22, 2017 4:36 AM | Permalink | Reply to this

The n-Category Café

Skip to the Main Content

March 15, 2016