Planet Musings

September 01, 2014

John Preskill Macroscopic quantum teleportation: the story of my chair

In the summer of 2000, a miracle occurred: The National Science Foundation decided to fund a new Institute for Quantum Information at Caltech with a 5 million dollar award from their Information Technology Research program. I was to be the founding director of the IQI.

Jeff Kimble explained to me why we should propose establishing the IQI. He knew I had used my slice of our shared DARPA grant to bring Alexei Kitaev to Caltech as a visiting professor, which had been wonderful. Recalling how much we had both benefited from Kitaev’s visit, Jeff remarked emphatically that “This stuff’s not free.” He had a point. To have more fun we’d need more money. Jeff took the lead in recruiting a large team of Caltech theorists and experimentalists to join the proposal we submitted, but the NSF was primarily interested in supporting the theory of quantum computation rather than the experimental part of the proposal. That was how I wound up in charge, though I continued to rely on Jeff’s advice and support.

This was a new experience for me and I worried a lot about how directing an institute would change my life. But I had one worry above all: space. We envisioned a thriving institute brimming over with talented and enthusiastic young scientists and visitors drawn from the physics, computer science, and engineering communities. But how could we carve out a place on the Caltech campus where they could work and interact?

To my surprise and delight, Jeff and I soon discovered that someone else at Caltech shared our excitement over the potential of IQI — Richard Murray, who was then the Chair of Caltech’s Division of Engineering and Applied Science. Richard arranged for the IQI to occupy office space in Steele Laboratory and some space we could configure as we pleased in Jorgensen Laboratory. The hub of the IQI became the lounge in Jorgensen, which we used for our seminar receptions, group meetings, and innumerable informal discussions, until our move to the beautiful Annenberg Center when it opened in 2009.

I sketched a rough plan for the Jorgensen layout, including furniture for the lounge. The furniture, I was told, was “NIC”. Though I was too embarrassed to ask, I eventually inferred this meant “Not in Contract” — I would need to go furniture shopping, one of my many burgeoning responsibilities as Director.

By this time, Ann Harvey was in place as IQI administrator, a huge relief. But furniture was something I thought I knew about, because I had designed and furnished a common area for the particle theory group a couple of years earlier. As we had done on that previous occasion, my wife Roberta and I went to Krause’s Sofa Factory to order a custom-made couch, love seat, and lounge chair, in a grayish green leather which we thought would blend well with the carpeting.

Directing an institute is not as simple as it sounds, though. Before the furniture was delivered, Krause’s declared bankruptcy! We had paid in full, but I had some anxious moments wondering whether there would be a place to sit down in the IQI lounge. In the end, after some delay, our furniture was delivered in time for the grand opening of the new space in September 2001. A happy ending, but not really the end of the story.

Before the move to Annenberg in 2009, I ordered furniture to fill our (much smaller) studio space, which became the new IQI common area. The Jorgensen furniture was retired, and everything was new! It was nice … But every once in a while I felt a twinge of sadness. I missed my old leather chair, from which I had pontificated at eight years’ worth of group meetings. That chair and I had been through a lot together, and I couldn’t help but feel that my chair’s career had been cut short before its time.

I don’t recall mentioning these feelings to anyone, but someone must have sensed my regrets. Because one day not long after the move another miracle occurred … my chair was baaack! Sitting in it again felt … good. For five years now I’ve been pontificating from my old chair in our new studio, just like I used to. No one told me how my chair had been returned to me, and I knew better than to ask.

My chair today. Like me, a bit worn but still far from retirement.


Eventually the truth comes out. At my 60th birthday celebration last year, Stephanie Wehner and Darrick Chang admitted to being the perpetrators, and revealed the whole amazing story in their article on “Macroscopic Quantum Teleportation” in a special issue of Nature Relocations. Their breakthrough article was enhanced by Stephanie’s extraordinary artwork, which you really have to see to believe. So if your curiosity is piqued, please follow this link to find out more.

Why, you may wonder, am I reminiscing today about the story of my chair? Well, is an excuse really necessary? But if you must know, it may be because, after two renewals and 14 years of operation, I submitted the IQI Final Report to the NSF this week. Don’t worry — the Report is not really Final, because the IQI has become part of an even grander vision, the IQIM (which has given birth to this blog among other good things). Like my chair, the IQI is not quite what it was, yet it lives on.

The nostalgic feelings aroused by filing the Final Report led me to reread the wonderful volume my colleagues put together for my birthday celebration, which recounts not only the unforgettable exploits of Stephanie and Darrick, but many other stories and testimonials that deeply touched me.

Browsing through that book today, one thing that struck me is the ways we sometimes have impact on others without even being aware of it. For example, Aram Harrow, Debbie Leung, Joe Renes and Stephanie all remember lectures I gave when they were undergraduate students (before I knew them), which might have had some impact on their later research careers. Knowing this will make it a little harder to say no the next time I’m invited to give a talk. Yaoyun Shi has vivid memories of the time I wore my gorilla mask to the IQI seminar on Halloween, which inspired him to dress up as “a butcher threatening to cut off the ears of my students with a bloody machete if they were not listening,” thus boosting his teaching evaluations. And Alexios Polychronakos, upon hearing that I had left particle theory to pursue quantum computing, felt it “was a bit like watching your father move to Las Vegas and marry a young dancer after you leave for college,” while at the same time he appreciated “that such reinventions are within the spectrum of possibilities for physicists who still have a pulse.”

I’m proud of what the IQI(M) has accomplished, but we’re just getting started. After 14 years, I still have a pulse, and my chair has plenty of wear left. Together we look forward to many more years of pontification.

 

 


August 31, 2014

Tommaso Dorigo How The Higgs Became The Target Of Run 2 At The Tevatron

Until the second half of the nineties, when the LEP collider started to be upgraded to investigate electron-positron collisions at higher centre-of-mass energies than those produced until then at the Z mass, the Higgs boson was not the main focus of experiments exploring the high-energy frontier. The reason is that the expected production cross section of that particle was prohibitively small for the comparatively low luminosities provided by the facilities available at the time. Of course, one could still look for anomalously high-rate production of final states possessing the characteristics of a Higgs boson decay; but those searches had limited appeal.


n-Category Café Why It Matters

One interesting feature of the Category Theory conference in Cambridge last month was that lots of the other participants started conversations with me about the whole-population, suspicionless surveillance that several governments are now operating. All but one were enthusiastically supportive of the work I’ve been doing to try to get the mathematical community to take responsibility for its part in this, and I appreciated that very much.

The remaining one was a friend who wasn’t unsupportive, but said to me something like “I think I probably agree with you, but I’m not sure. I don’t see why it matters. Persuade me!”

Here’s what I replied.

“A lot of people know now that the intelligence agencies are keeping records of almost all their communications, but they can’t bring themselves to get worked up about it. And in a way, they might be right. If you, personally, keep your head down, if you never do anything that upsets anyone in power, it’s unlikely that your records will end up being used against you.

“But that’s a really self-centred attitude. What about people who don’t keep their heads down? What about protesters, campaigners, activists, people who challenge the establishment — people who exercise their full democratic rights? Freedom from harassment shouldn’t depend on you being a quiet little citizen.

“There’s a long history of intelligence agencies using their powers to disrupt legitimate activism. The FBI recorded some of Martin Luther King’s extramarital liaisons and sent the tape to his family home, accompanied by a letter attempting to blackmail him into suicide. And there have been many many examples since then (see below).

“Here’s the kind of situation that worries me today. In the UK, there’s a lot of debate at the moment about the oil extraction technique known as fracking. The government has just given permission for the oil industry to use it, and environmental groups have been protesting vigorously.

“I don’t have strong opinions on fracking myself, but I do think people should be free to organize and protest against it without state harassment. In fact, the state should be supporting people in the exercising of their democratic rights. But actually, any anti-fracking group would be sensible to assume that it’s the object of covert surveillance, and that the police are working against them, perhaps by employing infiltrators — because they’ve been doing that to other environmental groups for years.

“It’s the easiest thing in the world for politicians to portray anti-fracking activists as a danger to the UK’s economic well-being, as a threat to national energy security. That’s virtually terrorism! And once someone’s been labelled with the T word, it immediately becomes trivial to justify actually using all that surveillance data that the intelligence agencies are gathering routinely. And I’m not exaggerating — anti-terrorism laws really have been used against environmental campaigners in the recent past.

“Or think about gay rights. Less than fifty years ago, sex between men in England was illegal. This law was enforced, and it ruined people’s lives. For instance, my academic great-grandfather Alan Turing was arrested under this law and punished by chemical castration. He’s widely thought to have killed himself as a direct result. But today, two men in England can not only have sex legally, they can marry with the full endorsement of the state.

“How did this change so fast? Not by people writing polite letters to the Times, or by going through official parliamentary channels (at least, not only by those means). It was mainly through decades of tough, sometimes dangerous, agitation, campaigning and protest, by small groups and by courageous individual citizens.

“By definition, anyone campaigning for anything to be decriminalized is siding with criminals against the establishment. It’s the easiest thing in the world for politicians to portray campaigners like this as a menace to society — a threat to law and order. Any nation state with the ability to monitor, infiltrate, harass and disrupt such “menaces” will be very sorely tempted to use it. And again, that’s no exaggeration: in the US at least, this has happened to gay rights campaigners over and over again, from the 1950s to nearly the present day, even sometimes — ludicrously — under the banner of fighting terrorism (1, 2, 3, 4).

“So government surveillance should matter to you in a very direct way if you’re involved in any kind of activism or advocacy or protest or campaigning or dissent. It should also matter to you if you’re not, but you quietly support any of this activism — or if you reap its benefits. Even if not (which is unlikely), it matters if you simply want to live in a society where people can engage in peaceful activism without facing disruption or harassment by the state. And it matters more now than it ever did before, because we live so much of our lives on the internet, and government surveillance powers are orders of magnitude greater than they’ve ever been before.”


That’s roughly what I said. I think we then talked a bit about mathematicians’ role in enabling whole-population surveillance. Here’s Thomas Hales’s take on this:

If privacy disappears from the face of the Earth, mathematicians will be some of the primary culprits.

Of course, there are lots of other reasons why the activities of the NSA, GCHQ and their partners might matter to you. Maybe you object to industrial espionage being carried out in the name of national security, or the NSA supplying data to the CIA’s drone assassination programme (“we track ‘em, you whack ‘em”), or the raw content of communications between Americans being passed en masse to Israel, or the NSA hacking purely civilian infrastructure in China, or government agencies intercepting lawyer-client and journalist-source communications, or that the existence of mass surveillance leads inevitably to self-censorship. Or maybe you simply object to being watched, for the same reason you close the bathroom door: you’re not doing anything to be ashamed of, you just want your privacy. But the activism point is the one that resonates most deeply with me personally, and it seemed to resonate with my friend too.

You may think I’m exaggerating or scaremongering — that the enormous power wielded by the US and UK intelligence agencies (among others) could theoretically be used against legitimate citizen activism, but hasn’t been so far.

There’s certainly an abstract argument against this: it’s simply human nature that if you have a given surveillance power available to you, and the motive to use it, and the means to use it without it being known that you’ve done so, then you very likely will. Even if (for some reason) you believe that those currently wielding these powers have superhuman powers of self-restraint, there’s no guarantee that those who wield them in future will be equally saintly.

But much more importantly, there’s copious historical evidence that governments routinely use whatever surveillance powers they possess against whoever they see as troublemakers, even if this breaks the law. Without great effort, I found 50 examples in the US and UK alone — see below.

Six overviews

If you’re going to read just one thing on government surveillance of activists, I suggest you make it this:

Among many other interesting points, it reminds us that this isn’t only about “leftist” activism — three of the plaintiffs in this case are pro-gun organizations.

Here are some other good overviews:

And here’s a short but incisive comment from journalist Murtaza Hussain.

50 episodes of government surveillance of activists

Disclaimer: Journalism about the activities of highly secretive organizations is, by its nature, very difficult. Even obtaining the basic facts can be a major feat. Obviously, I can’t attest to the accuracy of all these articles — and the entries in the list below are summaries of the articles linked to, not claims I’m making myself. As ever, whether you believe what you read is a judgement you’ll have to make for yourself.

1940s

1. FBI surveillance of War Resisters League (1, 2), continuing in 2010 (1)

1950s

2. FBI surveillance of the National Association for the Advancement of Colored People (1)

3. FBI “surveillance program against homosexuals” (1)

1960s

4. FBI’s Sex Deviate programme (1)

5. FBI’s Cointelpro projects, aimed at “surveying, infiltrating, discrediting, and disrupting domestic political organizations”, and NSA’s Project Minaret, targeting leading critics of Vietnam war including senators, civil rights leaders and journalists (1)

6. FBI attempt to blackmail Martin Luther King into suicide with surveillance tape (1)

7. NSA interception of antiwar activists, including Jane Fonda and Dr Benjamin Spock (1)

8. Harassment of California student movement (including Stephen Smale’s free speech advocacy) by FBI, with support of Ronald Reagan (1, 2)

1970s

9. FBI surveillance and attempted deportation of John Lennon (1)

10. FBI burglary of the office of the psychiatrist of Pentagon Papers whistleblower Daniel Ellsberg (1)

1980s

11. Margaret Thatcher had the Canadian national intelligence agency CSEC surveil two of her own ministers (1, 2, 3)

12. MI5 tapped phone of founder of Women for World Disarmament (1)

13. Ronald Reagan had the NSA tap the phone of congressman Michael Barnes, who opposed Reagan’s Central America policy (1)

1990s

14. NSA surveillance of Greenpeace (1)

15. UK police’s “undercover work against political activists” and “subversives” — including ex-home secretary Jack Straw (1)

16. UK undercover policeman Peter Francis “undermined the campaign of a family who wanted justice over the death of a boxing instructor who was struck on the head by a police baton” (1)

17. UK undercover police secretly gathered intelligence on 18 grieving families fighting to get justice from police (1, 2)

18. UK undercover police spied on lawyer for family of murdered black teenager Stephen Lawrence; police also secretly recorded friend of Lawrence and his lawyer (1, 2)

19. UK undercover police spied on human rights lawyers Bindmans (1)

20. GCHQ accused of spying on Scottish trade unions (1)

2000s

21. US military spied on gay rights groups opposing “don’t ask, don’t tell” (1)

22. Maryland State Police monitored nonviolent gay rights groups as terrorist threat (1)

23. NSA monitoring email of American citizen Faisal Gill, including while he was running as Republican candidate for Virginia House of Delegates (1)

24. NSA surveillance of Rutgers professor Hooshang Amirahmadi and ex-California State professor Agha Saeed (1)

25. NSA tapped attorney-client conversations of American lawyer Asim Ghafoor (1)

26. NSA spied on American citizen Nihad Awad, executive director of the Council on American-Islamic Relations, the USA’s largest Muslim civil rights organization (1)

27. NSA analyst read personal email account of Bill Clinton (date unknown) (1)

28. Pentagon counterintelligence unit CIFA monitored peaceful antiwar activists (1)

29. Green party peer and London assembly member Jenny Jones was monitored and put on secret police database of “domestic extremists” (1, 2)

30. MI5 and UK police bugged member of parliament Sadiq Khan (1, 2)

31. Food Not Bombs (volunteer movement giving out free food and protesting against war and poverty) labelled as terrorist group and infiltrated by FBI (1, 2, 3)

32. Undercover London police infiltrated green activist groups (1)

33. Scottish police infiltrated climate change activist organizations, including anti-airport expansion group Plane Stupid (1)

34. UK undercover police had children with activists in groups they had infiltrated (1)

35. FBI infiltrated Muslim communities and pushed those with objections to terrorism (and often mental health problems) to commit terrorist acts (1, 2, 3)

2010s

36. California gun owners’ group Calguns complains of chilling effect of NSA surveillance on members’ activities (1, 2, 3)

37. GCHQ and NSA surveilled Unicef and head of Economic Community of West African States (1)

38. NSA spying on Amnesty International and Human Rights Watch (1)

39. CIA hacked into computers of Senate Intelligence Committee, whose job it is to oversee the CIA
(1, 2, 3, 4, 5, 6; bonus: watch CIA director John Brennan lie that it didn’t happen, months before apologizing)

40. CIA obtained legally protected, confidential email between whistleblower officials and members of congress, regarding CIA torture programme (1)

41. Investigation suggests that CIA “operates an email surveillance program targeting senate intelligence staffers” (1)

42. FBI raided homes and offices of Anti-War Committee and Freedom Road Socialist Organization, targeting solidarity activists working with Colombians and Palestinians (1)

43. Nearly half of US government’s terrorist watchlist consists of people with no recognized terrorist group affiliation (1)

44. FBI taught counterterrorism agents that mainstream Muslims are “violent” and “radical”, and used presentations about the “inherently violent nature of Islam” (1, 2, 3)

45. GCHQ has developed tools to manipulate online discourse and activism, including changing outcomes of online polls, censoring videos, and mounting distributed denial of service attacks (1, 2)

46. Green member of parliament Caroline Lucas complains that GCHQ is intercepting her communications (1)

47. GCHQ collected IP addresses of visitors to Wikileaks websites (1, 2)

48. The NSA tracks web searches related to privacy software such as Tor, as well as visitors to the website of the Linux Journal (calling it an “extremist forum”) (1, 2, 3)

49. UK police attempt to infiltrate anti-racism, anti-fascist and environmental groups, anti-tax-avoidance group UK Uncut, and politically active Cambridge University students (1, 2)

50. NSA surveillance impedes work of investigative journalists and lawyers (1, 2, 3, 4, 5).

Back to mathematics

As mathematicians, we spend much of our time studying objects that don’t exist anywhere in the world (perfect circles and so on). But we exist in the world. So, being a mathematician sometimes involves addressing real-world concerns.

For instance, Vancouver mathematician Izabella Laba has for years been writing thought-provoking posts on sexism in mathematics. That’s not mathematics, but it’s a problem that implicates every mathematician. On this blog, John Baez has written extensively on the exploitative practices of certain publishers of mathematics journals, the damage it does to the universities we work in, and what we can do about it.

I make no apology for bringing political considerations onto a mathematical blog. The NSA is a huge employer of mathematicians — over 1000 of us, it claims. Like it or not, it is part of our mathematical environment. Both the American Mathematical Society and London Mathematical Society are now regularly publishing articles on the role of mathematicians in enabling government surveillance, in recognition of our responsibility for it. As a recent New York Times article put it:

To say mathematics is political is not to diminish it, but rather to recognize its greater meaning, promise and responsibilities.

Backreaction The ordinary weirdness of quantum mechanics

Raymond Laflamme's qubit.
Photo: Christina Reed.
I’m just back from our 2014 Workshop for Science Writers, this year on the topic “Quantum Theory”. The meeting was both inspiring and great fun - the lab visit wasn’t as disorganized as last time, the muffins appeared before the breaks and not after, and amazingly enough we had no beamer fails. We even managed to find a video camera, so hopefully you’ll be able to watch the lectures on your own once uploaded, provided I pushed the right buttons.

Due to popular demand, we included a discussion session this year. You know that I’m not exactly a big fan of discussion sessions, but then I didn’t organize this meeting for myself. Michael Schirber volunteered to moderate the discussion. He started by posing the question of why quantum mechanics is almost always portrayed as spooky, strange or weird. Why do we continue to do this, and is it beneficial for communicating the science behind the spook?

We could just blame Einstein for this, since he famously complained that quantum mechanics seemed to imply a spooky (“spukhafte”) action at a distance, but that was a century ago and we learned something since. Or some of us anyway.

Stockholm’s quantum optics lab.
Photo: Christina Reed.
We could just discard it as headline making, a way to generate interest, but that doesn’t really explain why quantum mechanics is described as weirder or stranger than other new and often surprising effects. How is time dilation in a gravitational field less strange than entanglement? And it’s not that quantum mechanics is particularly difficult either. As Chad pointed out during the discussion, much of quantum mechanics is technically much simpler than general relativity.

We could argue it is due to our daily life being dominated by classical physics, so that quantum effects must appear unintuitive. Intuition, however, is based on experience and exposure. Spend some time calculating quantum effects, spend some time listening to lectures about quantum mechanics, and you can get that experience. This does not gain you the ability to perceive quantum effects without a suitable measuring device, but that is true for almost everything in science.

The explanation that came up during the discussion that made the most sense to me is that it’s simply a way to replace technical vocabulary, and these placeholders have become vocabulary in their own right.

The spook and the weirdness, they stand in for non-locality and contextuality, they replace correlations and entanglement, pure and mixed states, non-commutativity, error correction, path integrals or post-selection. Unfortunately, all too often the technical vocabulary is entirely absent rather than briefly introduced. This makes it very difficult for interested readers to dig deeper into the topic. It is basically a guarantee that the unintuitive quantum behavior will remain unintuitive for most people. And for the researchers themselves, the lack of technical terms makes it impossible to figure out what is going on. The most common reaction to supposed “quantum weirdness” that I see among my colleagues is “What’s new about this?”

The NYT had a recent opinion piece titled “Why We Love What We Don’t Understand” in which Anna North argued that we like what we don’t understand because we want to keep the wonder alive:

“Many of us may crave that tug, the thrill of something as-yet-unexplained… We may want to get to the bottom of it, but in another way, we may not — as long as we haven’t quite figured everything out, we can keep the wonder alive.”

This made me think because I recall browsing through my mother’s collection of (the German version of) Scientific American as a teenager, always looking to learn what the scientists, the big brains, did not know. Yeah, it was kinda predictable I would end up in some sort of institution. At least it’s one where I have a key to the doors.

Anyway, I didn’t so much want to keep the mystery alive as I wanted to know where the boundary between knowledge and mystery currently was. Assume for a moment I’m not all that weird but most likely average. Is it surprising then that the headline-grabbing quantum weirdness, instead of helping the reader, misleads them about where this boundary between knowledge and mystery lies? Is it surprising then that everybody and their dog has solved some problem with quantum mechanics without knowing what problem?

And is it surprising, as I couldn’t help noticing, that the lecturers at this year’s workshop were all well practiced in forward-defense, and repeatedly emphasized that most of the theory is extremely well understood? It’s just that the focus on new techniques and recent developments highlights exactly what isn’t (yet) well understood, thereby giving the still mysterious more weight in the news than it has in practice.

I myself do not mind the attention-grabbing headlines, and that the news focuses on what’s new rather than what’s been understood for decades is the nature of the business. As several science writers, at this workshop and also at the previous one, told me, it is often not them inventing the non-technical terms, but it is vocabulary that the scientists themselves use to describe their research. I suspect though the scientists use it trying to adapt their explanations to the technical level they find in the popular science literature. So who is to blame really and how do we get out of this loop?

A first step might be to stop assuming all other parties are more stupid than one’s own. Most science writers have some degree in science, and they are typically more up to date on what is going on in research than the researchers themselves. The “interested public” is perfectly able to deal with some technical vocabulary as long as it comes with an explanation. And researchers are not generally unwilling or unable to communicate science, they just often have no experience with what is the right level of detail in situations they do not face every day.

When I talk to some journalist, I typically ask them first to tell me roughly what they already know. From their reply I can estimate what background they bring, and then I build on that until I notice I lose them. Maybe that’s not a good procedure, but it’s the best I’ve come up with so far.

We all can benefit from better science communication, and a lot has changed within the last decades. Most notably, there are many more voices to hear now, and these voices aim at very different levels of knowledge. What is still not working very well though is the connection between different levels of technical detail. (Which we previously discussed here.)

At the end of the discussion I had the impression opinions were maximally entangled and pure states might turn into mixed ones. Does that sound strange?

n-Category Café Uncountably Categorical Theories

Right now I’d love to understand something a logician at Oxford tried to explain to me over lunch a while back. His name is Boris Zilber. He’s studying what he informally calls ‘logically perfect’ theories — that is, lists of axioms that almost completely determine the structure they’re trying to describe. He thinks that we could understand physics better if we thought harder about these logically perfect theories:

His ways of thinking, rooted in model theory, are quite different from anything I’m used to. I feel a bit like Gollum here:

A zeroth approximation to Zilber’s notion of ‘logically perfect theory’ would be a theory in first-order logic that’s categorical, meaning all its models are isomorphic. In rough terms, such a theory gives a full description of the mathematical structure it’s talking about.

The theory of groups is not categorical, but we don’t mind that, since we all know there are lots of very different groups. Historically speaking, it was much more upsetting to discover that Peano’s axioms of arithmetic, when phrased in first-order logic, are not categorical. Indeed, Gödel’s first incompleteness theorem says there are many statements about natural numbers that can neither be proved nor disproved starting from Peano’s axioms. It follows that for any such statement we can find a model of the Peano axioms in which that statement holds, and also a model in which it does not. So while we may imagine the Peano axioms are talking about ‘the’ natural numbers, this is a false impression. There are many different ‘versions’ of the natural numbers, just as there are many different groups.

The situation is not so bad for the real numbers — at least if we are willing to think about them in a somewhat limited way. There’s a theory of a real closed field: a list of axioms governing the operations $+$, $\times$, $0$ and $1$ and the relation $\le$. Tarski showed this theory is complete. In other words, any sentence phrased in this language can either be proved or disproved starting from the axioms.

Nonetheless, the theory of real closed fields is not categorical: besides the real numbers, there are many ‘nonstandard’ models, such as fields of hyperreal numbers where there are numbers bigger than $1$, $1+1$, $1+1+1$, $1+1+1+1$ and so on. These models are all elementarily equivalent: any sentence that holds in one holds in all the rest. That’s because the theory is complete. But these models are not all isomorphic: we can’t get a bijection between them that preserves $+$, $\times$, $0$, $1$ and $\le$.
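In rough terms, here is how a model can satisfy exactly the same sentences as $\mathbb{R}$ and still look different. The natural way to say “$x$ is infinite” is the infinite list of conditions

$$ x > 1, \qquad x > 1+1, \qquad x > 1+1+1, \qquad \dots $$

and a first-order sentence is a finite string, so no single sentence in the language of $+$, $\times$, $0$, $1$ and $\le$ packages all of them up at once. In fact, applying the compactness theorem to exactly this list is one way to build a field of hyperreal numbers: it satisfies every sentence true in $\mathbb{R}$, yet contains an element larger than $1$, $1+1$, $1+1+1$, and so on.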

Indeed, only finite-sized mathematical structures can be ‘nailed down’ up to isomorphism by theories in first-order logic. After all, the Löwenheim–Skolem theorem says that if a first-order theory in a countable language has an infinite model, it has at least one model of each infinite cardinality. So, if we’re trying to use this kind of theory to describe an infinitely big mathematical structure, the most we can hope for is that after we specify its cardinality, the axioms completely determine it.

And this actually happens sometimes. It happens for the complex numbers! Zilber believes this has something to do with why the complex numbers show up so much in physics. This sounds very implausible at first, but there are some amazing results in logic that one needs to learn before dismissing the idea out of hand.

Say $\kappa$ is some cardinal. A first-order theory describing structure on a single set is called $\kappa$-categorical if it has a unique model of cardinality $\kappa$. And in 1965, a logician named Michael Morley showed that if a list of axioms is $\kappa$-categorical for some uncountable $\kappa$, it’s $\kappa$-categorical for every uncountable $\kappa$. I have no idea why this is true. But such theories are called uncountably categorical.

A great example is the theory of an algebraically closed field of characteristic zero.

When you think of algebraically closed fields of characteristic zero, the first example that comes to mind is the complex numbers. These have the cardinality of the continuum. But because this theory is uncountably categorical, there is exactly one algebraically closed field of characteristic zero of each uncountable cardinality… up to isomorphism.

This implies some interesting things. For example, we can take the complex numbers, throw in an extra element, and let it freely generate a bigger algebraically closed field. It’s ‘bigger’ in the sense that it contains the complex numbers as a proper subset, indeed a subfield. But since it has the same cardinality as the complex numbers, it’s isomorphic to the complex numbers!

And then, because this ‘bigger’ field is isomorphic to the complex numbers, we can turn this argument around. We can take the complex numbers, remove a lot of carefully chosen elements, and get a subfield that’s isomorphic to the complex numbers.

Or, if we like, we can take the complex numbers, adjoin a really huge set of extra elements, and let them freely generate an algebraically closed field of characteristic zero. The cardinality of this field can be as big as we want. It will be determined up to isomorphism by its cardinality. But it will be elementarily equivalent to the ordinary complex numbers! In other words, all the same sentences written in the language of $+$, $\times$, $0$ and $1$ will hold. See why?
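In outline, two standard facts are doing the work here. First, an algebraically closed field of characteristic zero is determined up to isomorphism by its transcendence degree over $\mathbb{Q}$, and when the field has uncountable cardinality $\kappa$ its transcendence degree is just $\kappa$; that is why the theory is uncountably categorical. Second, the Łoś–Vaught test says that a consistent theory in a countable language with no finite models that is categorical in some infinite cardinality is complete:

$$ \mathrm{ACF}_0 \ \text{uncountably categorical, no finite models} \;\Longrightarrow\; \mathrm{ACF}_0 \ \text{complete} \;\Longrightarrow\; \text{all models elementarily equivalent.} $$

So any two algebraically closed fields of characteristic zero satisfy exactly the same first-order sentences, whatever their cardinalities.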

The theory of a real closed field is not uncountably categorical. This implies something really strange. Besides the ‘usual’ real numbers $\mathbb{R}$ there’s another real closed field $\mathbb{R}'$, not isomorphic to $\mathbb{R}$, with the same cardinality. We can build the complex numbers $\mathbb{C}$ using pairs of real numbers. We can use the same trick to build a field $\mathbb{C}'$ using pairs of guys in $\mathbb{R}'$. But it’s easy to check that this funny field $\mathbb{C}'$ is algebraically closed and of characteristic zero. So, it’s isomorphic to $\mathbb{C}$.

In short, different ‘versions’ of the real numbers can give rise to the same version of the complex numbers! This is stuff they didn’t teach me in school.

All this is just background.

To a first approximation, Zilber considers uncountably categorical theories ‘logically perfect’. Let me paraphrase him:

There are purely mathematical arguments towards accepting the above for a definition of perfection. First, we note that the theory of the field of complex numbers (in fact any algebraically closed field) is uncountably categorical. So, the field of complex numbers is a perfect structure, and so are all objects of complex algebraic geometry by virtue of being definable in the field.

It is also remarkable that Morley’s theory of categoricity (and its extensions) exhibits strong regularities in models of categorical theories generally. First, the models have to be highly homogeneous, in a sense technically different from the one discussed for manifolds, but similar in spirit. Moreover, a notion of dimension (the Morley rank) is applicable to definable subsets in uncountably categorical structures, which gives one a strong sense of working with curves, surfaces and so on in this very abstract setting. A theorem of the present author states more precisely that an uncountably categorical structure $M$ is either reducible to a 2-dimensional “pseudo-plane” with at least a 2-dimensional family of curves on it (so is non-linear), or is reducible to a linear structure like an (infinite dimensional) vector space, or to a simpler structure like a $G$-set for a discrete group $G$. This led to a Trichotomy Conjecture, which specifies that the non-linear case is reducible to algebraically closed fields, effectively implying that $M$ in this case is an object of algebraic geometry over an algebraically closed field.

I don’t understand this, but I believe that in rough terms this would amount to getting ahold of algebraic geometry from purely ‘logical’ principles, not starting from ideas in algebra or geometry!

Ehud Hrushovski showed that the Trichotomy Conjecture is false. However, Zilber has bounced back with a new improved notion of logically perfect theory, namely a ‘Noetherian Zariski theory’. This sounds like something out of algebraic geometry, but it’s really a concept from logic that takes advantage of the eerie parallels between structures defined by uncountably categorical theories and algebraic geometry.

Models of Noetherian Zariski theories include not only structures from algebraic geometry, but also from noncommutative algebraic geometry, like quantum tori. So, Zilber is now trying to investigate the foundations of physics using ideas from model theory. It seems like a long hard project that’s just getting started.

Here’s a concrete conjecture that illustrates how people are hoping algebraic geometry will spring forth from purely logical principles:

The Algebraicity Conjecture. Suppose $G$ is a simple group whose theory (consisting of all sentences in the first-order theory of groups that hold for this group) is uncountably categorical. Then $G = \mathbb{G}(K)$ for some simple algebraic group $\mathbb{G}$ and some algebraically closed field $K$.

Zilber has a book on these ideas:

But there are many prerequisites I’m missing, and Richard Elwes, who studied with Zilber, has offered me some useful pointers:

If you want to really understand the Geometric Stability Theory referred to in your last two paragraphs, there’s a good (but hard!) book by that name by Anand Pillay. But you don’t need to go anywhere near that far to get a good idea of Morley’s Theorem and why the complex numbers are uncountably categorical. These notes look reasonable:

Basically the idea is that a theory is uncountably categorical if and only if two things hold: firstly there is a sensible notion of dimension (Morley rank) which can be assigned to every formula quantifying its complexity. In the example of the complex numbers Morley rank comes out to be pretty much the same thing as Zariski dimension. Secondly, there are no ‘Vaughtian pairs’ meaning, roughly, two bits of the structure whose size can vary independently. (Example: take the structure consisting of two disjoint non-interacting copies of the complex numbers. This is not uncountably categorical because you could set the two cardinalities independently.)

It is not too hard to see that the complex numbers have these two properties once you have the key fact of ‘quantifier elimination’, i.e. that any first order formula is equivalent to one with no quantifiers, meaning that all they can be are sets determined by the vanishing or non-vanishing of various polynomials. (Hence the connection to algebraic geometry.) In one dimension, basic facts about complex numbers tell us that every definable subset of $\mathbb{C}$ must therefore be either finite or co-finite. This is the definition of a strongly minimal structure, which automatically implies both of the above properties without too much difficulty. So the complex numbers are not merely ‘perfect’ (though I’ve not heard this term before) but are the very best type of structure even among the uncountably categorical.

If you know anything else that could help me out, I’d love to hear it!

August 30, 2014

John Preskill Caltech InnoWorks 2014, More Than Just a Summer Camp

“More, we need more!”

Adding more fuel to “Red October”, I presented the final product to my teammates. With a communal nod of approval, we rushed over to the crowd.

“1, 2, 3, GO!”

It was the semi-finals. Teams Heil Hydra! and The Archimedean Hawks ignited their engines and set their vehicles onto the starting line. Nascar? F1? Nope, even better. Homemade steamboat races! Throughout the cheers and yelling, we discovered that more isn’t better. Flames were devouring Team Heil Hydra!’s Red October. Down went the ship. Despite the loss, the kids learned about steam as a source of energy, experimentation, and teamwork. Although it may have been hard to tell the first day, by the end of this fourth day of the camp, all students were visibly excited for another day of the InnoWorks summer program at Caltech.


What is InnoWorks? An engaging summer program aimed at middle school students from disadvantaged backgrounds, InnoWorks offers a free-of-charge opportunity to dig into the worlds of science, technology, engineering, mathematics, and medicine (STEM^2). In his own life experience, William Hwang (founder of InnoWorks) was blessed with the opportunities to attend several summer camps throughout his childhood, but he had a friend who did not share the same opportunities. Sparked by the desire to start something, Hwang founded the non-profit organization, United InnoWorks Academy. With the first program beginning in 2004, the InnoWorks Academy developed these summer programs to help provide underprivileged kids with hands-on activities, team-building exercises, and fast-paced competitive missions. Starting with just 34 students and 17 volunteers in a single chapter, InnoWorks has now grown to more than a dozen university chapters that have hosted over 60 summer programs for 2,200 middle school students, all done with the help of over 1,000 volunteers.

Monday, August 11th, 2014 marked the first day of Caltech’s 3rd annual summer InnoWorks program. Last year, my younger brother participated in the program and had such a great experience that he wanted to become a junior mentor this year. After researching the program and listening to my brother’s past experiences, I was ecstatic to accept this journey as a mentor for Caltech’s InnoWorks program. Allow me to take you on a ride through my team’s and my own experience of InnoWorks.

First Day of Caltech InnoWorks 2014. My first team member that I checked in was Elliot. “Are you ready for InnoWorks!?” Perhaps I was a little overly excited. I received a shrug and “What’re we eating for breakfast?” Not the response I was hoping for, but that was going to change. As the rest of my team, which included Frank, Megan, Ethan, and my junior mentor, Elan, arrived, I began peppering them with icebreakers left and right. Soon enough, we dubbed ourselves Heil Hydra! and by the end of the second day, I couldn’t get them to be quiet.

“What are we doing next?”

“Guys, GUYS! Let’s use the green pom-poms as chloroplasts.”

“Hey, the soap actually smells good.”

“Hm. If you add another rubber band, the cup won’t vibrate as much, and it makes a lower sound.”

Sometimes they would have endless questions, which was great! Isn’t that what science is all about?


Most of the days during camp were themed with a specific subject, including biology, chemistry, physics, and engineering. Before each activity, both mentors and junior mentors gave a brief, prepared introduction to the science used during the experimentation. Here’s a quick synopsis of some of the activities and the students’ experiences:

Camera Obscura. After a short explanation of light, and how a lens works, we split the room up into 3 groups to build their very own camera obscura, which is an optical device that projects an image of its surroundings onto a screen (or in our case, the ground). Using a mirror, a magnifying glass, some PVC piping, and a black tarp, the kids constructed a camera obscura. I was impressed by how many students endured the heat of the black tarp and concrete all in the name of science.
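(For the curious, the optics the campers were exploiting boils down to the thin-lens equation,

$$ \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}, $$

so for a distant scene, with object distance $d_o \gg f$, the image forms roughly one focal length $f$ behind the magnifying glass; that sets roughly how far above the ground the lens has to sit.)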

Build Your Own Instrument. The title says it all. I let my junior mentor, Elan, lead the group in this activity. Tasked with creating an instrument judged on accurate pitch across 3 whole-note tones, creativity, efficiency, and performance, the students went straight to work. Children have endless imaginations. Give kids PVC pipes, rubber bands, balloons, cups, and paper clips, and they’ll make everything! Working together, the groups created an instrument (often more than one) to present in front of everyone. Teams were required to explain how their instrument created sound (vibrations), and attempt to play “Mary Had a Little Lamb” (at which most succeeded). I came across paperclip rain sticks, PVC didgeridoos, test tube pan flutes, red solo cup drums, and even PVC balloon catapults and rubber band ballistas!


Liquid Nitrogen. One of the highlights of the camp was liquid nitrogen! We were very honored to have Glen Evenbly and Olivier Landon-Cardinal, IQIM postdocs, join us. After pouring the liquid into a bowl, Glen showed the kids how nitrogen gas enveloped the area. Liquid nitrogen’s efficiency as a coolant is limited by the fact that it boils immediately upon contact with a warmer object, surrounding the object with nitrogen gas on which the liquid surfs. This effect is known as the Leidenfrost effect, which applies to any liquid in contact with an object significantly hotter than its boiling point. 


However, liquid nitrogen is still extremely cold, and when roses were placed into the bowl with liquid nitrogen, the petals froze right before everyone’s eyes.

Lego Mindstorms. The last activity of the camp was building a Lego robot and programming it to track and follow a black tape trail using its light sensor. Since each of my team members had experience with these Lego kits, they went to work right away. Two of my students worked on building the robot, while the other two retrieved the pieces. After a while, they prompted each other to switch roles.
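Just to illustrate the kind of logic the kids were wiring up (their actual programming was done in the graphical Mindstorms environment; the sensor and motor calls and all the numbers below are hypothetical placeholders, not the real Mindstorms API), a minimal threshold-based line follower looks something like this:

    # Minimal sketch of a threshold-based ("bang-bang") line follower.
    # read_light() and set_motors() are hypothetical placeholders, not the
    # actual Lego Mindstorms API, and the numbers are illustrative guesses.

    THRESHOLD = 45      # assumed reading halfway between black tape and light floor
    BASE_SPEED = 30     # assumed motor power (arbitrary units)
    TURN = 15           # how hard to steer back toward the edge of the tape

    def follow_line(read_light, set_motors):
        """Follow the left edge of a dark line on a light floor."""
        while True:
            reading = read_light()
            if reading < THRESHOLD:      # over the dark tape: steer right
                set_motors(BASE_SPEED + TURN, BASE_SPEED - TURN)
            else:                        # over the light floor: steer left
                set_motors(BASE_SPEED - TURN, BASE_SPEED + TURN)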


Programming the robot was a struggle, but manipulating the code and watching the aftermath was all part of the experiment. After many attempts, the group was unable to accurately get the robot to follow the black line (some groups were successful!). However, without any outside help (including from me), Team Heil Hydra! programmed the robot to move and sing (can you guess?) “Mary Had a Little Lamb”. Teamwork for the win! Team spirit bloomed in my group – each day of camp my InnoWorkers agreed on a matching t-shirt color. As a mentor, I could not have been more proud.


I know that I am not only speaking for myself when I say that the InnoWorks family, the students, and the program itself have burrowed their way into my heart. I have watched these students develop teamwork skills, enthusiasm for learning new things, and friendships. I have heard these students speak the minimal amount on their first day, only to find that their chatterboxes won’t stop on the last day. To describe InnoWorks as just a science camp where students come to learn about science would be an understatement. InnoWorks is where students experience, engage in, and conduct science, where they learn not just about science, but also about collaboration, leadership, and innovation.

I must end on this last note: Heil Hydra! 

Editor’s Note: Ms. Rebekah Zhou is majoring in mathematics at CSU Fresno. In her spare time, she enjoys teaching piano and tutoring.


August 29, 2014

David Hogg exoclocks: period and phase precision

It is back to the non-research grind now, but that didn't stop me from stealing a couple hours to work on Foreman-Mackey and my "stellar clocks" project. My question was: How precisely can you phase up or measure the time using a transiting hot Jupiter? That is, how good is a hot Jupiter as a clock? If good, we could use it for all kinds of fun things.

I started with fake data. As my loyal reader knows, I am against using fake data; all it provides is a functional test of your code! But over the years I have come around: After all, it provides a functional test of your code! It also permits you to check your results with absolutely known input assumptions; that is, it provides sanity checking on your estimates of (in this case) the precision you expect.

I find, with extremely optimistic assumptions (about the period, depth, duration, ingress time, gravitational dynamics, and, of course, photon noise), that a high-quality transiting hot Jupiter in Kepler normal-cadence data could, over the lifetime of the now-over Kepler mission, provide a time standard that is good to a few tenths of a second. This agrees with the order-of-magnitude calculation I had made myself earlier and suggests that we might be able to do some timing experiments.
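For a sense of where a number like that comes from, here is a back-of-the-envelope version of the estimate. Every number below is an assumed, illustrative value (not one taken from the actual analysis), and the scaling is only a crude heuristic: the timing information comes mostly from ingress and egress, and averaging over all the transits in the mission beats the per-transit error down by the square root of their number.

    # Rough order-of-magnitude estimate of how well a transiting hot Jupiter
    # could serve as a clock in Kepler long-cadence data.  All numbers are
    # assumed, illustrative values, not ones from the actual analysis.
    import math

    depth       = 0.01         # assumed transit depth (hot Jupiter, Sun-like star)
    sigma_phot  = 1e-4         # assumed photometric precision per 30-min cadence
    ingress     = 25 * 60.0    # assumed ingress (or egress) duration, seconds
    cadence     = 30 * 60.0    # Kepler long-cadence sampling, seconds
    period_days = 3.0          # assumed orbital period
    mission_yrs = 4.0          # roughly the length of the primary Kepler mission

    # Crude per-transit mid-time error from the ingress/egress samples:
    n_edge  = 2 * ingress / cadence
    sigma_1 = (sigma_phot / depth) * ingress / math.sqrt(n_edge)

    # Averaging over every transit in the mission:
    n_transits = mission_yrs * 365.25 / period_days
    sigma_all  = sigma_1 / math.sqrt(n_transits)

    print(f"per-transit timing error ~ {sigma_1:5.1f} s")
    print(f"mission-averaged phase   ~ {sigma_all:5.2f} s")

With these made-up inputs the per-transit error comes out around ten seconds and the mission-averaged figure around half a second, the same ballpark as the few-tenths-of-a-second number quoted above.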

Sean Carroll Should Scientific Progress Affect Religious Beliefs?

Sure it should. Here’s a new video from Closer to Truth, in which I’m chatting briefly with Robert Lawrence Kuhn about the question. “New” in the sense that it was just put on YouTube, although we taped it back in 2011. (Now my formulations would be considerably more sophisticated, given the wisdom that comes with age).

It’s interesting that the “religious beliefs are completely independent of evidence and empirical investigation” meme has enjoyed such success in certain quarters that people express surprise to learn of the existence of theologians and believers who still think we can find evidence for the existence of God in our experience of the world. In reality, there are committed believers (“sophisticated” and otherwise) who feel strongly that we have evidence for God in the same sense that we have evidence for gluons or dark matter — because it’s the best way to make sense of the data — just as there are others who think that our knowledge of God is of a completely different kind, and therefore escapes scientific critique. It’s part of the problem that theism is not well defined.

One can go further than I did in the brief clip above, to argue that any notion of God that can’t be judged on the basis of empirical evidence isn’t much of a notion at all. If God exists but has no effect on the world whatsoever — the actual world we experience could be precisely the same even without God — then there is no reason to believe in it, and indeed one can draw no conclusions whatsoever (about right and wrong, the meaning of life, etc.) from positing it. Many people recognize this, and fall back on the idea that God is in some sense necessary; there is no possible world in which he doesn’t exist. To which the answer is: “No he’s not.” Defenses of God’s status as necessary ultimately come down to some other assertion of a purportedly-inviolable metaphysical principle, which can always simply be denied. (The theist could win such an argument by demonstrating that the naturalist’s beliefs are incoherent in the absence of such principles, but that never actually happens.)

I have more sympathy for theists who do try to ground their belief in evidence, rather than those who insist that evidence is irrelevant. At least they are playing the game in the right way, even if I disagree with their conclusions. Despite what Robert suggests in the clip above, the existence of disagreement among smart people does not imply that there is not a uniquely right answer!

Quantum Diaries Coffee and Code (Part Deux)

Further to my entry on the CERN Summer Student Webfest 2014 posted a few weeks ago (here), there is a short video about the event available to view HERE. Enjoy!


David Hogg words on a plane

While in transit back to NYC, I worked on writing more about Ness and my quadratic-in-tags model for stellar spectra in our manuscript, and she got the quadratic model to work from end to end. It looks extremely promising.

Clifford Johnson And Back…

[image: subway_sketches_27_08_2014]
It is a new semester, and a new academic year. So this means getting back into the routine of lots of various aspects of the standard professor gig. For me this also involves being back in LA and taking the subway, and so this means getting (when it is not too busy as it seems to get a lot now) to sketch people. The guy with the red sunglasses was on his way to USC as well, and while he was reading or playing a game on his phone (such things are a blessing to sketchers... they help people hold in a fixed position for good stretches of time) I was able to get a quick sketch done in a few stops on the Expo line before we all got off. The other guy with the oddly trimmed beard was just briefly seen on the Red line, and so I did not get to make much of him... I'm teaching electromagnetism at graduate level again this semester, and so it ought to be fun, given that it is such a fun topic. I hope that the group of [...]

Chad Orzel Nordita Workshop for Science Writers, Day Two

The second day of the “Quantum Boot Camp” was much lighter on talks. The only speaker was Ray Laflamme from the Institute for Quantum Computing in Waterloo, who gave a nice introduction to quantum technologies. While he did spend a bit of time at the start going through Shor’s algorithm for factoring numbers (following up a discussion from Wednesday), he mostly focused on ways to use quantum physics to improve sensors of technological interest.

So, for example, he talked about how efforts to develop techniques for error-correcting codes in liquid state NMR quantum computing led to the development of better sensors for drilling oil wells. The techniques needed to protect against inhomogeneities during information processing also help pick out small signals when doing NMR of rocks outside a potential oil well, to help determine the composition and porosity of the rock. It’s a nice example of indirect benefits of basic research. He also talked about using nitrogen vacancies in diamond for magnetic sensing, another very promising application of quantum effects to problems that aren’t the factoring of large numbers. Ray also brought items to pass around: a chip with some superconducting qubits on it, and a small piece of artificial diamond with NV centers. Those were a big hit.

In the afternoon, we got a tour of the quantum optics labs at KTH, from Mohamed Bourennane, who had the group do a Bell’s Inequality test with down-converted photons. The lasers and detectors were all set up, but he had visitors change the detector settings to do the necessary measurements; the tour group I was part of got close to the maximum possible violation of the local hidden variable limit (2.7 out of 2.828…), so quantum worked really well for us.
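In case the 2.828 looks mysterious: it is the Tsirelson bound of 2√2, the largest CHSH value quantum mechanics allows, while any local hidden variable model is limited to |S| ≤ 2. A minimal sketch of the calculation for polarization-entangled photons at the standard analyzer angles (an illustrative calculation, not the analysis code used in the lab) goes like this:

    # CHSH value for the polarization-entangled state (|HH> + |VV>)/sqrt(2)
    # at the standard analyzer angles.  Illustrative only, not the lab's code.
    import numpy as np

    H, V = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    phi_plus = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)

    def analyzer(theta):
        """+1/-1 observable for a polarizer at angle theta (radians)."""
        t = np.cos(theta) * H + np.sin(theta) * V     # transmitted polarization
        r = -np.sin(theta) * H + np.cos(theta) * V    # rejected (orthogonal)
        return np.outer(t, t) - np.outer(r, r)

    def E(a, b):
        """Correlation <A(a) x A(b)> in the state |Phi+>."""
        return phi_plus @ np.kron(analyzer(a), analyzer(b)) @ phi_plus

    a1, a2 = 0.0, np.pi / 4             # Alice's settings: 0 and 45 degrees
    b1, b2 = np.pi / 8, 3 * np.pi / 8   # Bob's settings: 22.5 and 67.5 degrees

    S = E(a1, b1) - E(a1, b2) + E(a2, b1) + E(a2, b2)
    print(f"S = {S:.3f} (local bound 2, quantum maximum {2*np.sqrt(2):.3f})")

So a measured value of 2.7 is a comfortable violation of the local bound, and not far from the quantum maximum.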

The last session of the day was a fairly wide-ranging discussion of issues with communicating quantum mechanics to the general public, which mostly talked about whether people who write about quantum over-emphasize the weirder aspects. The problem, of course, is that weirdness sells, and it’s hard to find the right balance between mentioning enough of the exotic aspects to draw people in without being gratuitously confusing. At one point, I felt compelled to channel Matt Leifer and speak up for the importance of thinking about quantum interpretations (on the grounds that particular views of what’s “really” going on with the math can be helpful in suggesting particular lines of experiment that wouldn’t seem productive to people favoring a different interpretation). So if reality felt a little wobbly yesterday afternoon, that’s probably my fault…

We didn’t come up with any foolproof solutions, alas, but it was a fun discussion. My talk is the middle of three today, so I’m likely to be a bit distracted for the morning session. Especially since I just thought of a joke I need to add to my slides…

Doug NatelsonTwo cool papers on the arxiv

The beginning of the semester is a crazy time, so blogging is a little light right now.  Still, here are a couple of recent papers from the arxiv that struck my fancy.

arxiv:1408.4831 - "Self-replicating cracks:  A collaborative fracture mode in thin films," by Marthelot et al
This is very cool classical physics.  In thin, brittle films moderately adhering to a substrate, there can be a competition between the stresses involved with crack propagation and the stresses involved with delamination of the film.  The result can be very pretty pattern formation and impressively rich behavior.  A side note:  All cracks are really nanoscale phenomena - the actual breaking of bonds at the tip of the propagating crack is firmly in the nano regime.

arxiv:1408.6496 - "Non-equilibrium probing of two-level charge fluctuators using the step response of a single electron transistor," by Pourkabirian et al
I've written previously (wow, I've been blogging a while) about "two-level systems", the local dynamic degrees of freedom that are ubiquitous in disordered materials.  These little fluctuators have a statistically broad distribution of level asymmetries and tunneling times.  As a result, when perturbed, the ensemble of these TLSs responds not with a simple exponential decay (as would a system with a single characteristic time scale).  Instead, the TLS ensemble leads to a decaying response that is logarithmic in time.  For my PhD I studied such (agonizingly) slow relaxations in the dielectric response and acoustic response of glasses (like SiO2) at cryogenic temperatures.   Here, the authors use the incredible charge sensitivity of a single-electron transistor (SET) to look at the relaxation of the local charge environment near such disordered dielectrics.  The TLSs often have electric dipole moments, so their relaxation changes the local electrostatic potential near the SET. Guess what:  logarithmic relaxations.  Cute, and brings back memories of loooooong experiments from grad school.
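
That logarithm is easy to reproduce in a toy model: if the individual relaxation rates are spread uniformly on a log scale over many decades (which is what a broad distribution of asymmetries and tunneling parameters gives you), the averaged response falls off roughly linearly in log(t) between the fastest and slowest rates. A small numpy sketch (a toy illustration with made-up numbers, not the analysis in the paper):

    import numpy as np

    # Relaxation rates spread uniformly in log(rate) over eight decades,
    # mimicking an ensemble of two-level systems with a broad distribution
    # of asymmetries and tunneling times.
    rates = np.logspace(-6, 2, 2000)
    times = np.logspace(-1, 4, 6)

    # Equal weight per fluctuator; the ensemble response is the average of
    # the individual exponential decays.
    response = np.mean(np.exp(-np.outer(times, rates)), axis=1)
    for t, r in zip(times, response):
        print(f"t = {t:8.1f}   response = {r:.3f}")
    # The response drops by roughly the same amount per decade of t,
    # i.e. it decays logarithmically rather than exponentially.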

Clifford JohnsonMaking and Baking….

Back in LA, I had an amusing day the other day going from this* in the TV studio... involving a laser and liquid nitrogen (so, around -320 F, if you must use those units), to this in the kitchen: involving butter, flour, water and shortening... (and once in the oven, around +350 F) which ultimately resulted in this: [...] Click to continue reading this post

August 28, 2014

Scott AaronsonDo theoretical computer scientists despise practitioners? (Answer: no, that’s crazy)

A roboticist and Shtetl-Optimized fan named Jon Groff recently emailed me the following suggestion for a blog entry:

I think a great idea for an entry would be the way that in fields like particle physics the theoreticians and experimentalists get along quite well but in computer science and robotics in particular there seems to be a great disdain for the people that actually do things from the people that like to think about them. Just thought I’d toss that out there in case you are looking for some subject matter.

After I replied (among other things, raising my virtual eyebrows over his rosy view of the current state of theoretician/experimentalist interaction in particle physics), Jon elaborated on his concerns in a subsequent email:

[T]here seems to be this attitude in CS that getting your hands dirty is unacceptable. You haven’t seen it because you sit a lofty heights and I tend to think you always have. I have been pounding out code since ferrite cores. Yes, Honeywell 1648A, so I have been looking up the posterior of this issue rather than from the forehead as it were. I guess my challenge would be to find a noteworthy computer theoretician somewhere and ask him:
1) What complete, working, currently functioning systems have you designed?
2) How much of the working code did you contribute?
3) Which of these systems is still operational and in what capacity?
Or say, if the person was a famous robotics professor or something you may ask:
1) Have you ever actually ‘built’ a ‘robot’?
2) Could you, if called upon, design and build an easily tasked robot safe for home use using currently available materials and code?

So I wrote a second reply, which Jon encouraged me to turn into a blog post (kindly giving me permission to quote him).  In case it’s of interest to anyone else, my reply is below.


Dear Jon,

For whatever it’s worth, when I was an undergrad, I spent two years working as a coder for Cornell’s RoboCup robot soccer team, handling things like the goalie.  (That was an extremely valuable experience, one reason being that it taught me how badly I sucked at meeting deadlines, documenting my code, and getting my code to work with other people’s code.)   Even before that, I wrote shareware games with my friend Alex Halderman (now a famous computer security expert at U. of Michigan); we made almost $30 selling them.  And I spent several summers working on applied projects at Bell Labs, back when that was still a thing.  And by my count, I’ve written four papers that involved code I personally wrote and experiments I did (one on hypertext, one on stylometric clustering, one on Boolean function query properties, one on improved simulation of stabilizer circuits—for the last of these, the code is actually still used by others).  While this is all from the period 1994-2004 (these days, if I need any coding done, I use the extremely high-level programming language called “undergrad”), I don’t think it’s entirely true to say that I “never got my hands dirty.”

But even if I hadn’t had any of those experiences, or other theoretical computer scientists hadn’t had analogous ones, your questions still strike me as unfair.  They’re no more fair than cornering a star coder or other practical person with questions like, “Have you ever proved a theorem?  A nontrivial theorem?  Why is BPP contained in P/poly?  What’s the cardinality of the set of Turing-degrees?”  If the coder can’t easily answer these questions, would you say it means that she has “disdain for theorists”?  (I was expecting some discussion of this converse question in your email, and was amused when I didn’t find any.)

Personally, I’d say “of course not”: maybe the coder is great at coding, doesn’t need theory very much on a day-to-day basis and doesn’t have much free time to learn it, but (all else equal) would be happy to know more.  Maybe the coder likes theory as an outsider, even has friends from her student days who are theorists, and who she’d go to if she ever did need their knowledge for her work.  Or maybe not.  Maybe she’s an asshole who looks down on anyone who doesn’t have the exact same skill-set that she does.  But I certainly couldn’t conclude that from her inability to answer basic theory questions.

I’d say just the same about theorists.  If they don’t have as much experience building robots as they should have, don’t know as much about large software projects as they should know, etc., then those are all defects to add to the long list of their other, unrelated defects.  But it would be a mistake to assume that they failed to acquire this knowledge because of disdain for practical people rather than for mundane reasons like busyness or laziness.

Indeed, it’s also possible that they respect practical people all the more, because they tried to do the things the practical people are good at, and discovered for themselves how hard they were.  Maybe they became theorists partly because of that self-discovery—that was certainly true in my case.  Maybe they’d be happy to talk to or learn from a practical roboticist like yourself, but are too shy or too nerdy to initiate the conversation.

Speaking of which: yes, let’s let bloom a thousand collaborations between theorists and practitioners!  Those are the lifeblood of science.  On the other hand, based on personal experience, I’m also sensitive to the effect where, because of pressures from funding agencies, theorists have to try to pretend their work is “practically relevant” when they’re really just trying to discover something cool, while meantime, practitioners have to pretend their work is theoretically novel or deep, when really, they’re just trying to write software that people will want to use.  I’d love to see both groups freed from this distorting influence, so that they can collaborate for real reasons rather than fake ones.

(I’ve also often remarked that, if I hadn’t gravitated to the extreme theoretical end of computer science, I think I might have gone instead to the extreme practical end, rather than to any of the points in between.  That’s because I hate the above-mentioned distorting influence: if I’m going to try to understand the ultimate limits of computation, then I should pursue that wherever it leads, even if it means studying computational models that won’t be practical for a million years.  And conversely, if I’m going to write useful software, I should throw myself 100% into that, even if it means picking an approach that’s well-understood, clunky, and reliable over an approach that’s new, interesting, elegant, and likely to fail.)

Best,
Scott

Sesh NadathurA Supervoid cannot explain the Cold Spot

In my last post, I mentioned the claim that the Cold Spot in the cosmic microwave background is caused by a very large void — a "supervoid" — lying between us and the last scattering surface, distorting our vision of the CMB, and I promised to say a bit more about it soon. Well, my colleagues (Mikko, Shaun and Syksy) and I have just written a paper about this idea which came out on the arXiv last week, and in this post I'll try to describe the main ideas in it.

First, a little bit of background. When we look at sky maps of the CMB such as those produced by WMAP or Planck, obviously they're littered with very many hot and cold spots on angular scales of about one degree, and a few larger apparent "structures" that are discernible to the naked eye or human imagination. However, as I've blogged about before, the human imagination is an extremely poor guide to deciding whether a particular feature we see on the sky is real, or important: for instance, Stephen Hawking's initials are quite easy to see in the WMAP CMB maps, but this doesn't mean that Stephen Hawking secretly created the universe.

So to discover whether any particular unusual features are actually significant or not we need a well-defined statistical procedure for evaluating them. The statistical procedure used to find the Cold Spot involved filtering the CMB map with a special wavelet (a spherical Mexican hat wavelet, or SMHW), of a particular width (in this case $6^\circ$), and identifying the pixel direction with the coldest filtered temperature with the direction of the Cold Spot. Because of the nature of the wavelet used, this ensures that the Cold Spot is actually a reasonably sizable spot on the sky, as you can see in the image below:

The Cold Spot in the CMB sky. Image credit: WMAP/NASA.

Well, so we've found a cold spot. To elevate it to the status of "Cold Spot" in capitals and worry about how to explain it, we first need to quantify how unusual it is. Obviously it is unusual compared to other spots on our observed CMB, but this is true by construction and not very informative. Instead the usual procedure quite rightly compares the properties of the cold spots found in random Gaussian maps using exactly the same SMHW technique to the properties of the Cold Spot in our CMB. It is this procedure which results in the conclusion that our Cold Spot is statistically significant at roughly the "3-sigma level", i.e. only about 1 in every 1000 random maps has a coldest spot that is as "cold" as* our Cold Spot.** (The reason why I'm putting scare quotes around everything should become clear soon!)

So there appears to be a need to explain the existence of the Cold Spot using additional new physics of some kind. One such idea is that of the supervoid: a giant region hundreds of millions of light years across which is substantially emptier than the rest of the universe and lies between us and the Cold Spot. The emptiness of this region has a gravitational effect on the CMB photons that pass through it on their way to us, making them look colder (this is called the integrated Sachs-Wolfe or ISW effect) — hence the Cold Spot.

Now this is a nice idea in principle. In practice, unfortunately, it suffers from a problem: the ISW effect is very weak, so to produce an effect capable of "explaining" the Cold Spot the supervoid would need to be truly super — incredibly large and incredibly empty. And no such void has actually been seen in the distribution of galaxies (a previous claim to have seen it turned out to not be backed up by further analysis).

It was therefore quite exciting when in May a group of astronomers, led by Istvan Szapudi of the Institute for Astronomy in Hawaii, announced that they had found evidence for the existence of a large void in the right part of the sky. Even more excitingly, in a separate theoretical paper, Finelli et al. claimed to have modeled the effect of this void on the CMB and proven that it exactly fit the observations, and that therefore the question had been effectively settled: the Cold Spot was caused by a supervoid.

Except ... things aren't quite that simple. For a start, the void they claimed to have found doesn't actually have a large ISW effect — in terms of central temperature, less than one-seventh of what would be needed to explain the Cold Spot. So Finelli et al. relied on a rather curious argument: that the second-order effect (in perturbation theory terms) of this void on CMB photons was somehow much larger than the first-order (i.e. ISW) effect. A puzzling inversion of our understanding of perturbation theory, then!

In fact there were a number of other reasons to be a bit suspicious of the claim, among which were that N-body simulations don't show this kind of unusual effect, and that several other larger and deeper voids have already been found that aren't aligned with Cold Spot-like CMB features. In our paper we provide a fuller list of these reasons to be skeptical before diving into the details of the calculation, where one might get lost in the fog of equations.

At the end of the day we were able to make several substantive points about the Cold Spot-as-a-supervoid hypothesis:
  1. Contrary to the claim by Finelli et al., the void that has been found is neither large enough nor deep enough to leave a large effect on the CMB, either through the ISW effect or its second-order counterpart — in simple terms, it is not a super enough supervoid.
  2. In order to explain the Cold Spot one needs to postulate a supervoid that is so large and so deep that the probability of its existence is essentially zero; if such a supervoid did exist, it would be more difficult to explain than the Cold Spot currently is!
  3. The possible ISW effect of any kind of void that could reasonably exist in our universe is already sufficiently accounted for in the analysis using random maps that I described above.
  4. There's actually very little need to postulate a supervoid to explain the central temperature of the Cold Spot — the fact that we chose the coldest spot in our CMB maps already does that!
Point number 1 requires a fair bit of effort and a lot of equations to prove (and coincidentally it was also shown in an independent paper by Jim Zibin that appeared just a day before ours), but in the grand scheme of things it is probably not a supremely interesting one. It's nice to know that our perturbation theory intuition is correct after all, of course, but mistakes happen to the best of us, so the fact that one paper on the arXiv contains a mistake somewhere is not tremendously important.

On the other hand, point 2 is actually a fairly broad and important one. It is a result that cosmologists with a good intuition would perhaps have guessed already, but that we are able to quantify in a useful way: to be able to produce even half the temperature effect actually seen in the Cold Spot would require a hypothetical supervoid almost twice as large and twice as empty as the one seen by Szapudi's team, and the odds of such a void existing in our universe would be something like a one-in-a-million or one-in-a-billion (whereas the Cold Spot itself is at most a one-in-a-thousand anomaly in random CMB maps). A supervoid therefore cannot help to explain the Cold Spot.***

Point 3 is again something that many people probably already knew, but equally many seem to have forgotten or ignored, and something that has not (to my knowledge) been stated explicitly in any paper. My particular favourite though is point 4, which I could — with just a tiny bit of poetic licence — reword as the statement that
"the Cold Spot is not unusually cold; if anything, what's odd about it is only that it is surrounded by a hot ring"
I won't try to explain the second part of that statement here, but the details are in our paper (in particular Figure 7, in case you are interested). Instead what I will do is to justify the first part by reproducing Figure 6 of our paper here:

The averaged temperature anisotropy profile at angle $\theta$ from the centre of the Cold Spot (in red),  and the corresponding 1 and $2\sigma$ contours from the coldest spots in 10,000 random CMB maps (blue). Figure from arXiv:1408.4720.

What the blue shaded regions show is the confidence limits on the expected temperature anisotropy $\Delta T$ at angles $\theta$ from the direction of the coldest spots found in random CMB maps using exactly the same SMHW selection procedure. The red line, which is the measured temperature for our actual Cold Spot, never goes outside the $2\sigma$ equivalent confidence region. In particular, at the centre of the Cold Spot the red line is pretty much exactly where we would expect it to be. The Cold Spot is not actually unusually cold.

Just before ending, I thought I'd also mention that Syksy has written about this subject on his own blog (in Finnish only): as I understand it, one of the points he makes is that this form of peer review on the arXiv is actually more efficient than the traditional one that takes place in journals.

Update: You might also want to have a look at Shaun's take on the same topic, which covers the things I left out here ...

* People often compare other properties of the Cold Spot to those in random maps, for instance its kurtosis or other higher-order moments, but for our purposes here the total filtered temperature will suffice.

** Although as Zhang and Huterer pointed out a few years ago, this analysis doesn't account for the particular choice of the SMHW filter or the particular choice of $6^\circ$ width — in other words, it doesn't account for what particle physicists call the "look-elsewhere effect". Which means it is actually much less impressive.

*** If we'd actually seen a supervoid which had the required properties, we'd have a proximate cause for the Cold Spot, but also a new and even bigger anomaly that required an explanation. But as we haven't, the point is moot.

Chad OrzelNordita Workshop for Science Writers, aka “Quantum Boot Camp”

Since this part of the trip is actually work-like, I might as well dust off the blog and post some actual physics content. Not coincidentally, this also provides a way to put off fretting about my talk tomorrow…

I’m at the Nordita Workshop for Science Writers on quantum theory, which a couple of the attending writers have referred to online as boot camp, though in an affectionate way. The idea is to provide a short crash course on cool quantum physics, so as to give writers a bit more background in subjects they might need to cover.

The first talk was from Rainer Kaltenbaek (whose name I was persistently misspelling on Twitter…), from Markus Aspelmeyer’s group in Vienna. He gave an introduction to the ideas of entanglement and Bell’s theorem, then talked about efforts underway in Austria and elsewhere to try to do quantum interference of macroscopic objects. Their idea is to use nanoscale glass beads held in optical traps and cooled to extremely low temperatures, then released in a way that should produce a wave-like interference pattern. Unfortunately, to see the effect they want, they’d need to allow the waves to expand for something like 100 seconds, which would correspond to a free-fall drop of a bit less than 50km. Kaltenbaek dryly noted that “This isn’t something you want to try on Earth,” so they’re proposing to do it in space. You can find a couple of versions of their proposal on the arxiv.

I love this sort of stuff– it’s close enough to my own area that I can follow the details with relative ease, and the ambition is really impressive. They want to do this experiment with nanospheres on a satellite at one of the Lagrange points, so keeping the weight down is important. So rather than launching heavy vacuum pumps and cryogenic coolers, they’re just planning to mount the whole thing on the outside of the spacecraft. Which is an idea firmly in the “so crazy it just might work” category.

After lunch, we had a talk from Silke Weinfurtner on ways to simulate gravitational effects in systems that are a little easier to control, like flowing water. It turns out that a number of situations in hydrodynamics are described by equations that are nearly identical to those that show up in interesting problems in general relativity, like black and white holes. She also talked about how to simulate an expanding universe in a Bose-Einstein Condensate by changing the speed of sound, which is a very cool idea. This paper by Chris Westbrook’s group in Paris is related, and a terrific experiment to boot.

One of the coolest bits for me was a discussion of how to understand Hawking radiation as a scattering process, which relies on a model very much like one we used in one of my Ph.D. thesis papers (here’s a “making of” post as well). There’s a key mathematical difference in the particular function they’re using that means they get amplification of incoming fluctuations, which we don’t see in atoms, but the basic concept was very similar, and that was kind of mind-blowing.

The last talks of the day came from Marie Ericsson, who gave a survey of quantum computing. This one got a little hung up on the definitions of complexity classes (oh, what we wouldn’t have given for a computer scientist…), and then on how to understand a Mach-Zehnder interferometer. This last ended up with me drawing pictures on the whiteboard, which I hope was sort of helpful. There was also some discussion of how Shor’s algorithm works, including multiple references to Scott Aaronson’s excellent explanation, which you should’ve read by now. I mean, it was posted in 2007…
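
Since the Mach-Zehnder interferometer came up: in matrix form the whole device is just two 2x2 beam-splitter matrices with a phase in between, which is about as compact as quantum interference gets. A minimal numpy sketch (using the standard symmetric beam-splitter convention, an assumption here rather than anything specified above):

    import numpy as np

    BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50/50 beam splitter

    def output_probs(phi):
        phase = np.diag([1, np.exp(1j * phi)])       # extra path phase in one arm
        amp = BS @ phase @ BS @ np.array([1, 0])     # photon enters one input port
        return np.abs(amp)**2                        # detection probabilities

    for phi in np.linspace(0, 2 * np.pi, 5):
        p1, p2 = output_probs(phi)
        print(f"phase = {phi:4.2f} rad   port 1: {p1:.2f}   port 2: {p2:.2f}")
    # Sweeping the phase moves the photon smoothly between the two output
    # ports: sin^2(phi/2) in one, cos^2(phi/2) in the other.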

Today’s program is lighter on talks– Ray Laflamme is talking about quantum information, then there’s a lab tour and general discussion. I’m talking on Friday. And with that, I probably ought to go review my slides…

Terence TaoLarge gaps between consecutive prime numbers

Kevin Ford, Ben Green, Sergei Konyagin, and myself have just posted to the arXiv our preprint “Large gaps between consecutive prime numbers“. This paper concerns the “opposite” problem to that considered by the recently concluded Polymath8 project, which was concerned with very small values of the prime gap {p_{n+1}-p_n}. Here, we wish to consider the largest prime gap {G(X) = p_{n+1}-p_n} that one can find in the interval {[X] = \{1,\dots,X\}} as {X} goes to infinity.

Finding lower bounds on {G(X)} is more or less equivalent to locating long strings of consecutive composite numbers that are not too large compared to the length of the string. A classic (and quite well known) construction here starts with the observation that for any natural number {n}, the consecutive numbers {n!+2, n!+3,\dots,n!+n} are all composite, because each {n!+i}, {i=2,\dots,n} is divisible by some prime {p \leq n}, while being strictly larger than that prime {p}. From this and Stirling’s formula, it is not difficult to obtain the bound

\displaystyle G(X) \gg \frac{\log X}{\log\log X}. \ \ \ \ \ (1)

 

A more efficient bound comes from the prime number theorem: there are only {(1+o(1)) \frac{X}{\log X}} primes up to {X}, so just from the pigeonhole principle one can locate a string of consecutive composite numbers up to {X} of length at least {(1-o(1)) \log X}, thus

\displaystyle G(X) \gtrsim \log X \ \ \ \ \ (2)

 

where we use {X \gtrsim Y} or {Y \lesssim X} as shorthand for {X \geq (1-o(1)) Y} or {Y \leq (1+o(1)) X}.

What about upper bounds? The Cramér random model predicts that the primes up to {X} are distributed like a random subset of {\{1,\dots,X\}} of density {1/\log X}. Using this model, Cramér arrived at the conjecture

\displaystyle G(X) \ll \log^2 X.

In fact, if one makes the extremely optimistic assumption that the random model perfectly describes the behaviour of the primes, one would arrive at the even more precise prediction

\displaystyle G(X) \sim \log^2 X.

However, it is no longer widely believed that this optimistic version of the conjecture is true, due to some additional irregularities in the primes coming from the basic fact that large primes cannot be divisible by very small primes. Using the Maier matrix method to capture some of this irregularity, Granville was led to the conjecture that

\displaystyle G(X) \gtrsim 2e^{-\gamma} \log^2 X

(note that {2e^{-\gamma} = 1.1229\dots} is slightly larger than {1}). For comparison, the known upper bounds on {G(X)} are quite weak; unconditionally one has {G(X) \ll X^{0.525}} by the work of Baker, Harman, and Pintz, and even on the Riemann hypothesis one only gets down to {G(X) \ll X^{1/2} \log X}, as shown by Cramér (a slight improvement is also possible if one additionally assumes the pair correlation conjecture; see this article of Heath-Brown and the references therein).
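
As a quick empirical aside (a simple sieve computation, not anything from the paper), for these small values of {X} the record gap {G(X)} does indeed sit between the lower bound {\log X} of (2) and the Cramér-type prediction {\log^2 X}:

    import numpy as np

    def primes_up_to(n):
        sieve = np.ones(n + 1, dtype=bool)
        sieve[:2] = False
        for p in range(2, int(n**0.5) + 1):
            if sieve[p]:
                sieve[p*p::p] = False
        return np.flatnonzero(sieve)

    p = primes_up_to(10**7)
    gaps = np.diff(p)
    for X in [10**4, 10**5, 10**6, 10**7]:
        k = np.searchsorted(p, X)        # number of primes up to X
        G = int(gaps[:k - 1].max())      # largest gap between consecutive primes <= X
        print(f"X = {X:8d}   G(X) = {G:4d}   "
              f"log X = {np.log(X):5.1f}   (log X)^2 = {np.log(X)**2:6.1f}")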

This conjecture remains out of reach of current methods. In 1931, Westzynthius managed to improve the bound (2) slightly to

\displaystyle G(X) \gg \frac{\log\log\log X}{\log\log\log\log X} \log X ,

which Erdös in 1935 improved to

\displaystyle G(X) \gg \frac{\log\log X}{(\log\log\log X)^2} \log X

and Rankin in 1938 improved slightly further to

\displaystyle G(X) \gtrsim c \frac{\log\log X (\log\log\log\log X)}{(\log\log\log X)^2} \log X \ \ \ \ \ (3)

 

with {c=1/3}. Remarkably, this rather strange bound then proved extremely difficult to advance further on; until recently, the only improvements were to the constant {c}, which was raised to {c=\frac{1}{2} e^\gamma} in 1963 by Schönhage, to {c= e^\gamma} in 1963 by Rankin, to {c = 1.31256 e^\gamma} by Maier and Pomerance, and finally to {c = 2e^\gamma} in 1997 by Pintz.

Erdös listed the problem of making {c} arbitrarily large as one of his favourite open problems, even offering (“somewhat rashly”, in his words) a cash prize for the solution. Our main result answers this question in the affirmative:

Theorem 1 The bound (3) holds for arbitrarily large {c>0}.

In principle, we thus have a bound of the form

\displaystyle G(X) \geq f(X) \frac{\log\log X (\log\log\log\log X)}{(\log\log\log X)^2} \log X

for some {f(X)} that grows to infinity. Unfortunately, due to various sources of ineffectivity in our methods, we cannot provide any explicit rate of growth on {f(X)} at all.

We decided to announce this result the old-fashioned way, as part of a research lecture; more precisely, Ben Green announced the result in his ICM lecture this Tuesday. (The ICM staff have very efficiently put up video of his talks (and most of the other plenary and prize talks) online; Ben’s talk is here, with the announcement beginning at about 0:48. Note a slight typo in his slides, in that the exponent of {\log\log\log X} in the denominator is {3} instead of {2}.) Ben’s lecture slides may be found here.

By coincidence, an independent proof of this theorem has also been obtained very recently by James Maynard.

I discuss our proof method below the fold.

— 1. Sketch of proof —

Our method is a modification of Rankin’s method, combined with some work of myself and Ben Green (and of Tamar Ziegler) on counting various linear patterns in the primes. To explain this, let us first go back to Rankin’s argument, presented in a fashion that allows for comparison with our own methods. Let’s first go back to the easy bound (1) that came from using the consecutive string {n!+2,\dots,n!+n} of composite numbers. This bound was inferior to the prime number theorem bound (2), however this can be easily remedied by replacing {n!} with the somewhat smaller primorial {P(n)}, defined as the product of the primes up to and including {n}. It is still easy to see that {P(n)+2,\dots,P(n)+n} are all composite, with each {P(n)+i, i \leq n} divisible by some prime {p \leq n} while being larger than that prime. On the other hand, the prime number theorem tells us that {P(n) = \exp( (1+o(1)) n)}. From this, one can recover an alternate proof of (2) (perhaps not so surprising, since the prime number theorem is a key ingredient in both proofs).

This gives hope that further modification of this construction can be used to go beyond (2). If one looks carefully at the above proof, we see that the key fact used here is that the discrete interval of integers {\{2,\dots,n\}} is completely sieved out by the residue classes {0 \hbox{ mod } p} for primes {p \leq n}, in the sense that each element in this interval is contained in at least one of these residue classes. More generally (and shifting the interval by {1} for a more convenient normalisation), suppose we can find an interval {[y] := \{ n \in {\bf N}: n \leq y \}} which is completely sieved out by one residue class {a_p \hbox{ mod } p} for each {p \leq x}, for some {x} and {y}. Then the string of consecutive numbers {m+1,\dots,m+\lfloor y\rfloor} will be composite, whenever {m} is an integer larger than or equal to {x} with {m = - a_p \hbox{ mod } p} for each prime {p \leq x}, since each of the {m+i, i \leq y} will be divisible by some prime {p \leq x} while being larger than that prime. From the Chinese remainder theorem, one can find such an {m} that is of size at most {x+P(x)}. From this and the prime number theorem, one can obtain lower bounds on {G(X)} if one can get lower bounds on {y} in terms of {x}. In particular, if for any large {x} one can completely sieve out {[y]} with a residue class {a_p \hbox{ mod } p} for each {p \leq x}, and

\displaystyle y \sim c \frac{\log x (\log\log\log x)}{(\log\log x)^2} x, \ \ \ \ \ (4)

 

then one can establish the bound (3). (The largest {y} one can take for a given {x} is known as the Jacobsthal function of the primorial {P(x)}.) So the task is basically to find a smarter set of congruence classes {a_p \hbox{ mod } p} than just the zero congruence classes {0 \hbox{ mod } p} that can sieve out a larger interval than {x}. (Unfortunately, this approach by itself is unlikely to reach the Cramér conjecture; it was shown by Iwaniec using the large sieve that {y} is necessarily of size {O(x^2)} (which somewhat coincidentally matches the Cramér bound), but Maier and Pomerance conjecture that in fact one must have {y \leq x \log^{2+o(1)} x}, which would mean that the limit of this method would be to establish a bound of the form {G(X) \geq (\log\log X)^{2+o(1)} \log X}.)
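
To make the recipe concrete, here is the sieving construction carried out for the toy value x = 13 (an illustration only; nothing here comes from the paper), using the classes a_p = -1 mod p that correspond, after the shift mentioned above, to the classic primorial construction: check that these classes cover {[y]}, then use the Chinese remainder theorem to locate the promised string of composites.

    def primes_up_to(n):
        return [p for p in range(2, n + 1)
                if all(p % q for q in range(2, int(p**0.5) + 1))]

    x = 13
    ps = primes_up_to(x)          # [2, 3, 5, 7, 11, 13]
    y = x - 1

    # The shifted choice a_p = -1 mod p covers all of [y]:
    # every 1 <= j <= y has j + 1 divisible by some prime p <= x.
    assert all(any((j + 1) % p == 0 for p in ps) for j in range(1, y + 1))

    # Chinese remainder theorem: find m >= x with m = -a_p = 1 (mod p) for all p <= x.
    m, modulus = 0, 1
    for p in ps:
        while m % p != 1:
            m += modulus
        modulus *= p
    if m < x:
        m += modulus              # here this gives m = 1 + P(13) = 30031

    # Then m+1, ..., m+y are y consecutive composites, each divisible by a prime <= x.
    assert all(any((m + j) % p == 0 for p in ps) for j in range(1, y + 1))
    print(f"{m + 1} .. {m + y} are all composite (a gap of length at least {y + 1})")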

So, how can one do better than just using the “Eratosthenes” sieve {0 \hbox{ mod } p}? We will divide the sieving into different stages, depending on the size of {p}. It turns out that a reasonably optimal division of primes {p} up to {x} will be into the following four classes:

  • Stage 1 primes: primes {p} that are either tiny (less than {\log x}) or medium size (between {z} and {x/4}), where {z} is a parameter to be chosen later.
  • Stage 2 primes: primes that are small (between {\log x} and {z}).
  • Stage 3 primes: Primes that are very large (between {x/2} and {x}).
  • Stage 4 primes: Primes that are fairly large (between {x/4} and {x/2}).

We will take an interval {[y]}, where {y} is given by (4), and sieve out first by Stage 1 primes, then Stage 2 primes, then Stage 3 primes, then Stage 4 primes, until none of the elements of {[y]} are left.

Let’s first discuss the final sieving step, which is rather trivial. Suppose that our sieving by the first three sieving stages is so efficient that the number of surviving elements of {[y]} is less than or equal to the number of Stage 4 primes (by the prime number theorem, this will for instance be the case for sufficiently large {x} if there are fewer than {(\frac{1}{5}+o(1)) \frac{x}{\log x}} survivors). Then one can finish off the remaining survivors simply by using each of the Stage 4 primes {p} to remove one of the surviving integers in {[y]} by an appropriate choice of residue class {a_p \hbox{ mod } p}. So we can recast our problem as an approximate sieving problem rather than a perfect sieving problem; we now only need to eliminate most of the elements of {[y]} rather than all of them, at the cost of only using primes from the Stages 1-3, rather than 1-4. Note though that for {y} given by (4), the Stage 1-3 sieving has to be reasonably efficient, in that the proportion of survivors cannot be too much larger than {1/\log^2 x} (ignoring factors of {\log\log x} etc.).

Next, we discuss the Stage 1 sieving process. Here, we will simply copy the classic construction and use the Eratosthenes sieve {0 \hbox{ mod } p} for these primes. The elements of {[y]} that survive this process are those elements that are not divisible by any Stage 1 primes, that is to say they are only divisible by small (Stage 2) primes, or else contain at least one prime factor larger than {x/4}, and no prime factors less than {\log x}. In the latter case, the survivor has no choice but to be a prime in {(x/4,y]} (since from (4) we have {y \leq x \log x} for {x} large enough). In the former case, the survivor is a {z}-smooth number – a number all of whose prime factors are less than or equal to {z}. How many such survivors are there? Here we can use a somewhat crude upper bound of Rankin:

Lemma 2 Let {1 < z \leq y} be large quantities, and write {u := \frac{\log y}{\log z}}. Suppose that

\displaystyle \log u = o( \log z ).

Then the number of {z}-smooth numbers in {[y]} is at most {e^{-u\log u + O(u)} y \log z}.

Proof: We use a Dirichlet series method commonly known as “Rankin’s trick”. Let {0<\rho<1} be a quantity to be optimised later, and abbreviate “{z}-smooth” as “smooth”. Observe that if {n} is a smooth number less than {y}, then

\displaystyle 1 \leq \frac{y}{y^\rho} \frac{1}{n^{1-\rho}}. \ \ \ \ \ (5)

 

Thus, the number of smooth numbers in {[y]} is at most

\displaystyle \frac{y}{y^\rho} \sum_{n \hbox{smooth}} \frac{1}{n^{1-\rho}}

where we have simply discarded the constraint {n \leq y}. The point of doing this is that the above expression factors into a tractable Euler product

\displaystyle \frac{y}{y^\rho} \prod_{p \leq z} (1 + \frac{1}{p^{1-\rho}} + \frac{1}{p^{2(1-\rho)}} + \dots).

We will choose

\displaystyle \rho := \frac{u \log u}{\log y} = \frac{\log u}{\log z} \ \ \ \ \ (6)

 

so that {y^\rho = e^{u \log u}} and {\rho=o(1)}. Then the above expression simplifies to

\displaystyle \ll \frac{y}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{1}{p^{1-\rho}} ).

To compute the sum here, we first observe from Mertens’ theorem (discussed in this previous blog post) that

\displaystyle \sum_{p \leq z} \frac{1}{p} = \log \log z + O(1),

so we may bound the previous expression by

\displaystyle \ll \frac{y \log z}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{1}{p^{1-\rho}} - \frac{1}{p} )

which we rewrite using (6) as

\displaystyle \ll \frac{y \log z}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{\exp( \frac{\log p}{\log z} \log u ) - 1}{p} ).

Next, we use the convexity inequality

\displaystyle \exp( ct ) - 1 \leq (\exp(c) - 1 ) t

for {c > 0} and {0 \leq t \leq 1}, applied with {c := \log u} and {t := \frac{\log p}{\log z}}, to conclude that

\displaystyle \exp( \frac{\log p}{\log z} \log u ) - 1 \leq u \frac{\log p}{\log z}

Finally, from the prime number theorem we have {\sum_{p \leq z} \frac{\log p}{p \log z} \ll 1}. The bound follows. \Box

Remark 1 One can basically eliminate the {\log z} factor here (at the cost of worsening the {O(u)} error slightly to {O(u\log\log(3u))}) by a more refined version of the Rankin trick, based on replacing the crude bound (5) by the more sophisticated inequality

\displaystyle 1 = \frac{\log \frac{y}{n}}{\log y} + \sum_{p^\nu || n; p \leq z} \frac{\nu \log p}{\log y},

\displaystyle \ll \frac{y}{y^\rho \log y} \frac{1}{n^{1-\rho}} + \sum_{p^\nu || n; p \leq z} \frac{\nu \log p}{\log y},

where {p^\nu||n} denotes the assertion that {p} divides {n} exactly {\nu} times. (Thanks to Kevin Ford for pointing out this observation to me.) In fact, the number of {z}-smooth numbers in {[y]} is known to be asymptotically {e^{-u\log u + O( u \log\log(3u) )} y} in the range {\log^3 y \leq z \leq y}, a result of de Bruijn.
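
As a toy sanity check on Lemma 2 (with small parameters, and with the unspecified {e^{O(u)}} factor simply dropped, so only the order of magnitude is meaningful), one can count the {z}-smooth numbers up to {y} directly:

    import numpy as np

    def count_smooth(y, z):
        # record the largest prime factor of every n <= y with a sieve,
        # then count those whose largest prime factor is at most z
        largest = np.zeros(y + 1, dtype=np.int64)
        for p in range(2, y + 1):
            if largest[p] == 0:          # p is prime
                largest[p::p] = p
        return int(np.count_nonzero(largest[1:] <= z))   # n = 1 counts as smooth

    y, z = 10**6, 50
    u = np.log(y) / np.log(z)
    bound = np.exp(-u * np.log(u)) * y * np.log(z)       # e^{-u log u} * y * log z
    print(count_smooth(y, z), "z-smooth numbers up to y;  crude bound ~", int(bound))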

In view of the error term permitted by the Stage 4 process, we would like to take {z} as large as possible while still leaving only {o(x/\log x)} smooth numbers in {[y]}. A somewhat efficient choice of {z} here is

\displaystyle z := \exp( \frac{\log\log\log x}{4 \log\log x} \log x ),

so that {u \sim 4 \frac{\log\log x}{\log\log\log x}} and {u\log u \sim 4 \log\log x}, and then one can check that the above lemma does indeed show that there are {o(x/\log x)} smooth numbers in {[y]}. (If we use the sharper bound in the remark, we can reduce the {4} here to a {3}, although this makes little difference to the final bound.) If we let {{\mathcal Q}} denote all the primes in {(x/4,y]}, the remaining task is then to sieve out all but {(\frac{1}{5}+o(1)) \frac{x}{\log x}} of the primes in {{\mathcal Q}} by using one congruence class from each of the Stage 2 and Stage 3 primes.

Note that {{\mathcal Q}} is still quite large compared to the error that the Stage 4 primes can handle – it is of size about {y/\log x}, whereas we need to get down to a bit less than {x/\log x}. Still, this is some progress (the remaining sparsification needed is of the order of {1/\log^{1+o(1)} x} rather than {1/\log^{2+o(1)} x}).

For the Stage 2 sieve, we will just use a random construction, choosing {a_p \hbox{ mod } p} uniformly at random for each Stage 2 prime. This sieve is expected to sparsify the set {{\mathcal Q}} of survivors by a factor

\displaystyle \gamma := \prod_{\hbox{Stage 2 primes } p} (1-\frac{1}{p}),

which by Mertens’ theorem is of size

\displaystyle \gamma \sim \frac{\log \log x}{\log z} \sim 4 \frac{(\log \log x)^2}{\log x \log\log\log x}.

In particular, if {y} is given by (4), then all the strange logarithmic factors cancel out and

\displaystyle \gamma y \sim 4c x.

In particular, we expect {{\mathcal Q}} to be cut down to a random set (which we have called {{\mathcal Q}({\bf a})} in our paper) of size about {4c \frac{x}{\log x}}. This would already finish the job for very small {c} (e.g. {c \leq 1/20}), and indeed Rankin’s original argument proceeds more or less along these lines. But now we want to take {c} to be large.

Fortunately, we still have the Stage 3 primes to play with. But the number of Stage 3 primes is about {\frac{1}{2} \frac{x}{\log x}}, which is a bit smaller than the number of surviving primes {{\mathcal Q}({\bf a})}, which is about {4c \frac{x}{\log x}}. So to make this work, most of the Stage 3 congruence classes {a_p \hbox{ mod } p} need to sieve out many primes from {{\mathcal Q}({\bf a})}, rather than just one or two. (Rankin’s original argument is based on sieving out one prime per congruence class; the subsequent work of Maier-Pomerance and Pintz is basically based on sieving out two primes per congruence class.)

Here, one has to take some care because the set {{\mathcal Q}({\bf a})} is already quite sparse inside {[y]} (its density is about {1/\log^{2+o(1)} x}). So a randomly chosen {a_p \hbox{ mod } p} would in fact most likely catch none of the primes in {{\mathcal Q}({\bf a})} at all. So we need to restrict attention to congruence classes {a_p \hbox{ mod } p} which already catch a large number of primes in {{\mathcal Q}}, so that even after the Stage 2 sieving one can hope to be left with many congruence classes that also catch a large number of primes in {{\mathcal Q}({\bf a})}.

Here’s where my work with Ben came in. If one has an arithmetic progression {a, a+d, \dots, a+(r-1)d} of length {r} consisting entirely of primes in {{\mathcal Q}}, with {d} a multiple of {p}, then the congruence class {a \hbox{ mod } p} is guaranteed to pick up at least {r} primes in {{\mathcal Q}}. My first theorem with Ben shows that no matter how large {r} is, the set {{\mathcal Q}} does indeed contain some arithmetic progressions {a,a+d,\dots,a+(r-1)d} of length {r}. This result is not quite suitable for our applications here, because (a) we need the spacing {d} to also be divisible by a Stage 3 prime {p} (in our paper, we take {d = r! p} for concreteness, although other choices are certainly possible), and (b) for technical reasons, it is insufficient to simply have a large number of arithmetic progressions of primes strewn around {{\mathcal Q}}; they have to be “evenly distributed” in some sense in order to be able to still cover most of {{\mathcal Q}({\bf a})} after throwing out any progression that is partly or completely sieved out by the Stage 2 primes. Fortunately, though, these distributional results for linear equations in primes were established by a subsequent paper of Ben and myself, contingent on two conjectures (the Möbius-Nilsequences conjecture and the inverse conjecture for the Gowers norms) which we also proved (the latter with Tamar Ziegler) in some further papers. (Actually, strictly speaking our work does not quite cover the case needed here, because the progressions are a little “narrow”; we need progressions of primes in {[y]} whose spacing {d} is comparable to {x} instead of {y}, whereas our paper only considered the situation in which the spacing was comparable to the elements of the progression. It turns out, though, that the arguments can be modified (somewhat tediously) to extend to this case.)



Steinn SigurðssonIce Bucket Challenge Done the Civilized Way

In which Chris “Slick” Ford challenges me, and I accept.

In turn, I challenge Valerie, Diddi and Stefan.
You know who you are.

h/t to Sir Patrick Stewart for illustrating how to properly take on such challenges.

August 27, 2014

Mark Chu-CarrollGodel part 2: Arithmetic and Logic

In the last post, we saw how to take statements written in the logic of the Principia Mathematica, and convert them into numerical form using Gödel numbering. For the next step in Gödel’s proof, we need to go meta-mathematical.

Ultimately, we want to write first-order statements that can reason about first-order statements. But the entire structure of the Principia and its logic is designed to make that impossible. First-order statements can only reason about numbers and their properties.

But now, we’ve got the ability to represent statements – first order, second order, third order, any order. What we still need is a way of describing the properties of those numerical statements in terms of operations that can be expressed using nothing but first order statements.

The basic trick to incompleteness is that we’re going to use the numerical encoding of statements to say that a predicate or relation is represented by a number. Then we’re going to write predicates about predicates by defining predicates on the numerical representations of the first-order predicates. That’s going to let us create a true statement in the logic that can’t be proven with the logic.

To do that, we need to figure out how to take our statements and relations represented as numbers, and express properties of those statements and relations in terms of arithmetic. To do that, we need to define just what it means to express something arithmetically. Gödel did that by defining “arithmetically” in terms of a concept called primitive recursion.

I learned about primitive recursion when I studied computational complexity. Nowadays, it’s seen as part of theoretical computer science. The idea, as we express it in modern terms, is that there are many different classes of computable functions. Primitive recursion is one of the basic complexity classes. You don’t need a Turing machine to compute primitive recursive functions – they’re a simpler class.

The easiest way to understand primitive recursion is that it’s what you get in a programming language with integer arithmetic, and simple for-loops. The only way you can iterate is by repeating things a bounded number of times. Primitive recursion has a lot of interesting properties; the two key ones for our purposes here are that checking a number-theoretic proof is a primitive recursive operation, and that every computation of a primitive recursive function is guaranteed to complete within a bounded amount of time.

The formal definition of primitive recursion, the way that Gödel wrote it, is quite a bit more complex than that. But it means the same thing.

We start with what it means to define a formula via primitive recursion. (Note the language that I used there: I’m not explaining what it means for a function to be primitive recursive; I’m explaining what it means to be defined via primitive recursion.) And I’m defining formulae, not functions. In Gödel’s proof, we’re always focused on numerical reasoning, so we’re not going to talk about programs or algorithms, we’re going to about the definition of formulae.

A formula \phi(x_1, x_2, ..., x_n) is defined via primitive recursion if, for some other formulae \psi and \mu:

  • Base: \phi(0, x_2, ..., x_n) = \psi(x_2, ..., x_n)
  • Recursive: \phi(i+1, x_2, ..., x_n) = \mu(i, \phi(i, x_2, ..., x_n), x_2, ..., x_n).

So, basically, the first parameter is a bound on the number of times that \phi can be invoked recursively. When it’s 0, you can’t invoke \phi any more.
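
That bounded-for-loop picture translates directly into code. A small Python sketch (a rendering in modern notation, not Gödel's): a helper that builds \phi from the base formula \psi and the recursion step \mu, plus the textbook definitions of addition and multiplication on top of it.

    def primitive_recursion(psi, mu):
        # phi(0, xs)   = psi(xs)
        # phi(i+1, xs) = mu(i, phi(i, xs), xs)
        def phi(i, *xs):
            value = psi(*xs)
            for k in range(i):        # a bounded loop: exactly i iterations
                value = mu(k, value, *xs)
            return value
        return phi

    # add(i, x) = i + x, built from the successor operation alone
    add = primitive_recursion(lambda x: x, lambda k, prev, x: prev + 1)
    # mul(i, x) = i * x, built from add
    mul = primitive_recursion(lambda x: 0, lambda k, prev, x: add(x, prev))
    print(add(3, 4), mul(3, 4))   # 7 12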

A formula is primitive recursive if it is defined from a collection of formulae \phi_1, ..., \phi_n in which each formula \phi_i is defined via primitive recursion from \phi_1, ..., \phi_{i-1}, or from the primitive succ function of Peano arithmetic.

For any formula \phi_i in that sequence, the degree of the formula is the number of other primitive recursive formulae used in its definition.

Now, we can define a primitive recursive property: R(x_1, ..., x_n) is primitive recursive if and only if there exists a primitive recursive function \phi such that R(x_1, ..., x_n) holds precisely when \phi(x_1, ..., x_n) = 0.

With primitive recursive formulae and relations defined, there’s a bunch of theorems about how you can compose primitive recursive formulae and relations:

  1. Every function or relation that you get by substituting a primitive recursive function for a variable in a primitive recursive function/relation is primitive recursive.
  2. If R and S are primitive recursive relations, then ¬R, R∧S, and R∨S are all primitive recursive.
  3. If \phi(x_1, ..., x_n) and \psi(x_1, ..., x_n) are primitive recursive functions, then the relation R(x_1, ..., x_n) \Leftrightarrow (\phi(x_1, ..., x_n) = \psi(x_1, ..., x_n)) is also primitive recursive.
  4. Let xv and zv be finite-length tuples of variables. If the function \phi(xv) and the relation R(y, zv) are primitive recursive, then so are the relations:
    • S(xv, zv) \Leftrightarrow  (\exists y \le \phi(xv). R(y, zv))
    • T(xv, zv) \Leftrightarrow  (\forall y \le \phi(xv). R(y, zv))
  5. Let xv and zv be finite-length tuples of variables. And let \text{argmin}[y \le \phi(xv). R(y, zv)] be the smallest value of y for which y \le \phi(xv) and R(y, zv) is true, or 0 if there is no such value. Then if the function \phi(xv) and the relation R(y, zv) are primitive recursive, so is the function P(xv, zv) = \text{argmin}[y \le \phi(xv). R(y, zv)].

By these definitions, addition, subtraction, multiplication, and integer division are all primitive recursive.
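
The bounded minimization in item 5 is likewise just a bounded loop, and it is what makes a definition like integer division go through. A short sketch in the same style (again an illustration, with names chosen for readability rather than anything from Gödel's paper):

    def bounded_argmin(bound, R, *zs):
        # smallest y <= bound with R(y, zs) true, or 0 if there is no such y
        for y in range(bound + 1):
            if R(y, *zs):
                return y
        return 0

    # integer division a // b as a bounded search: the least q with (q + 1) * b > a
    def idiv(a, b):
        return bounded_argmin(a, lambda q, a, b: (q + 1) * b > a, a, b)

    print(idiv(17, 5), idiv(12, 4))   # 3 3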

Ok. So, now we’ve got all of that out of the way. It’s painful, but it’s important. What we’ve done is come up with a good formal description of what it means for something to be an arithmetic property: if we can write it as a primitive recursive relation or formula, it’s arithmetic.

Tim GowersICM2014 — Jim Arthur plenary lecture

The main other thing I did on day two of the congress was go to a reception in the evening hosted by the French Embassy. It was less formal than that makes it sound, and as I circulated I met a number of people I hadn’t seen for quite a while, as well as others I had got to know at the congress itself. The French ambassador, who was disconcertingly young looking, gave a speech, as did Artur Avila (as you know, Avila, like Ngo four years ago, is French), and one other person, whose name I’ve annoyingly forgotten. One interesting nugget of information to come out of those speeches was that Paris is planning to bid for the 2022 ICM. If that bid is successful, then Avila will have two successive ICMs in his home country. There was plenty of food at the reception, so, as I had hoped, I didn’t need to think about finding supper. When we arrived, we were asked for our business cards. In common with approximately 99.9% of mathematicians, I don’t have a business card, but for cardless people it was sufficient to write our names on little bits of paper. This, it turned out, was to be entered in a draw for a bottle of wine. When the time came, it was Avila who drew out the pieces of paper. Apparently, this is a Korean custom. There were in fact two bottles going, so two chances to win, which sort of became three chances when the person who should have won the first bottle wasn’t there to claim it. And so, for the first time in my life … not really. I have never won anything in a raffle and that puzzling sequence continued.

The next morning kicked off (after breakfast at the place on the corner opposite my hotel, which served decent espressos) with Jim Arthur, who gave a talk about the Langlands programme and his role in it. He told us at the beginning that he was under strict instructions to make his talk comprehensible — which is what you are supposed to do as a plenary lecturer, but this time it was taken more seriously, which resulted in a higher than average standard. Ingrid Daubechies deserves a lot of credit for that. He explained that in response to that instruction, he was going to spend about two thirds of his lecture giving a gentle introduction to the Langlands programme and about one third talking about his own work. In the event he messed up the timing and left only about five minutes for his contribution, but for everybody except him that was just fine: we all knew he was there because he had done wonderful work, and most of us stood to learn a lot more from hearing about the background than from hearing about the work itself.

I’ve made a few attempts to understand the Langlands programme — not by actually studying it, you understand, but by attending general-audience talks or reading general-audience articles. It’s a bit of a two-steps-forward (during the talk) and one-step-back (during the weeks and months after the talk) process, but this was a very good lecture and I really felt I learned things from it. Some of them I immediately forgot, but have in my notes, and perhaps I’ll fix them slightly better in my brain by writing about them here.

For example, if you had asked me what the central problem in algebraic number theory is, I would never have thought of saying this. Given a fixed polynomial P and a prime p, we can factorize P into irreducibles over the field \mathbb{F}_p. It turns out to be inconvenient if any of these irreducible factors occurs with multiplicity greater than 1, so an initial assumption is that P has distinct roots over \mathbb{C} (or at least I think that’s the assumption). [Insert: looking at my notes, I realize that a better thing to write is \mathbb{E}, the splitting field of P, rather than \mathbb{C}, though I presume that that gives the same answer.] But even then, it may be that over some primes p there are repeated irreducible factors. The word “ramified”, which I had always been slightly scared of, comes in here. I can’t remember what ramifies over what, or which way round is ramified and which unramified, so let me quickly look that up. Hmm, that was harder than I expected, because the proper definition is to do with rings, extension fields and the like. But it appears that “ramified” refers to the case where you have multiplicity greater than 1 somewhere. For the purposes of this post, let’s say that a prime p is ramified (I’ll take the polynomial P as given) if P has an irreducible factor over \mathbb{F}_p with multiplicity greater than 1. The main point to remember is that the set S of ramified primes is small. I think Arthur said that it was always finite.

So what is the fundamental problem of algebraic number theory? Well, when you decompose a polynomial into irreducible factors, those factors have degrees. If the degree of P is d, then the degrees of the irreducible factors form a partition of d: that is, a collection of positive integers that add up to d. The question is this: which (unramified) primes give rise to which partitions of d?

How on earth is that the fundamental problem of algebraic number theory? What’s interesting about it? Aren’t number theorists supposed (after a long and circuitous route) to be solving Diophantine equations and things like that?

Arthur gave us a pretty convincing partial answer to these questions by discussing the example P(x)=x^2+1. The splitting field is \mathbb{Q}(\sqrt{-1}) — that is, rational linear combinations of 1 and i — and the only ramified prime is 2. (The reason 2 is ramified is that over \mathbb{F}_2 we have x^2+1=(x+1)^2.)

Since the degree of x^2+1 is 2, the two partitions of the degree are (2) and (1,1). The first occurs if and only if x^2+1 cannot be factorized over \mathbb{F}_p, which is the same as saying that -1 is not a quadratic residue. So in this case, the question becomes, “For which odd primes p is -1 a quadratic residue?” to which the answer is, famously, all primes congruent to 1 mod 4. So Arthur’s big grown-up question is a generalization of a familiar classical result of number theory.
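
This is easy to check experimentally. A throwaway Python loop over the first few odd primes (an illustration of the claim, nothing from the lecture) shows the partition tracking p mod 4 exactly as described:

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

    for p in (q for q in range(3, 60) if is_prime(q)):
        # does x^2 + 1 have a root mod p, i.e. is -1 a square mod p?
        splits = any((r * r + 1) % p == 0 for r in range(p))
        partition = "(1,1)" if splits else "(2)"
        print(f"p = {p:2d}   partition {partition}   p mod 4 = {p % 4}")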

To answer the question for quadratic polynomials, Gauss’s law of quadratic reciprocity is a massive help. I think it is correct to say that the Langlands programme is all about trying to find vast generalizations of quadratic reciprocity that will address the far more general question about the degrees of irreducible factors of arbitrary polynomials. But perhaps it is more general still — at the time of writing I’m not quite sure.

Actually, I think I am sure. One thing Arthur described was Artin L-functions, which are a way of packaging up the data I’ve just described. Here is the definition he gave. You start with a representation r:\mathrm{Gal}(\mathbb{E}/\mathbb{Q})\to\mathrm{GL}(N,\mathbb{C}) of the Galois group of P. For simplicity he assumed that the Galois group was actually S_n (where n is the degree of P). Then for each unramified prime p the partition of n you get can be thought of as the cycle type of a permutation and thus as a conjugacy class in S_n. The image of this conjugacy class under r is a conjugacy class in \mathrm{GL}(N,\mathbb{C}), which we denote by c_p(r); since conjugate linear maps have the same determinant, the determinant appearing in the formula below does not depend on which representative of the class we pick, so the expression is well defined. The Artin L-function is then defined to be

\displaystyle s\mapsto\prod_{p\notin S}\mathrm{det}(1-c_p(r)p^{-s})^{-1}.

If you expand this out, you get a Dirichlet series, of which this is the Euler product. And Dirichlet series that have Euler products are basically L-functions. Just as the Riemann zeta function packages up lots of important information about the primes, so the Artin L-functions package up lots of important information about the fundamental problem of algebraic number theory discussed earlier.

One interesting thing that Arthur told us was that in order to do research in this area, you have to use results from many different areas. This makes it difficult to get started, so most young researchers start by scouring the textbooks for the key theorems and using them as black boxes, understanding them fully only much later.

For example, certain Riemannian manifolds are particularly important, because automorphic forms come from solutions to differential equations (based on the Laplacian) on those manifolds. Arthur didn’t tell us exactly what these “special Riemannian manifolds” were, but he did say that they corresponded to reductive algebraic groups. (An algebraic group is roughly speaking a group defined using polynomials. For example, \mathrm{SO}(n) is algebraic, because the conditions of being orthogonal and having determinant 1 are expressible as polynomials in the entries of a matrix, and the group operation, matrix multiplication, is also a polynomial operation. What “reductive” means I don’t know.) He then said that many beginners memorize ten key theorems about reductive algebraic groups and don’t bother themselves with the proofs.

Where does Langlands come into all this? He defined some L-functions that have a formula very similar to the formula for Artin L-functions: in fact, all you have to do is replace the r in that formula with a \pi. So a lot depends on what \pi is. Apparently it’s an automorphic representation. I’m not sure what those are.

A big conjecture is that every arithmetic L-function is an automorphic L-function. This would give us a non-Abelian class field theory. (Classical class field theory studies Abelian field extensions, and can tell you things like which numbers are cubic residues mod p.)

This conjecture is a special case of Langlands’s famous principle of functoriality, which Arthur described as the fundamental problem. (OK, I’ve already described something else as the fundamental problem, but this is somehow the real fundamental problem.) I can’t resist stating the problem, because it looks as though it ought to be easy. I can imagine getting hooked on it in a parallel life, because it screams out, “Think about me in the right way and I’ll drop out.” Of course, that’s a very superficial impression, and probably once one actually does think about it, one quickly loses any feeling that it should at some sufficiently deep level be easy.

The principle says this.

Conjecture. Given two groups G and G', an automorphic representation \pi' of G' and an analytic homomorphism between their dual groups

\displaystyle \rho:\hat{G'}\to\hat G

there is an automorphic representation \pi of G such that c(\pi)=\rho(c(\pi')); that is,

c_p(\pi) = \rho(c_p(\pi'))

as conjugacy classes in \hat{G}.

To me it looks like the kind of trivial-but-not-trivially-trivial statement one proves in a basic algebra course, but obviously it is far more than that.

One quite nice thing that Arthur did was to draw an extended analogy with a situation that held in physics a century or so ago. It was observed that the absorption spectra of starlight had black lines where certain frequencies were absent, and these corresponded to the wavelengths emitted by familiar elements. This suggested that the chemistry of stars was similar to the chemistry on earth. Furthermore, because these absorption spectra were red-shifted to various extents, it also suggested that the stars were moving away from us, and ultimately suggested the Big Bang theory. However, exactly why these black lines appeared was a mystery, which was not solved until the formulation of quantum mechanics.

Something like this is how Arthur sees number theory today. Automorphic forms tell us about other number-theoretic worlds. Spectra come from differential equations that are quite similar to the Schrödinger equation — in particular, they are based on Laplacians — that come from the geometry of the special Riemannian manifolds I mentioned above. But exactly how the connection between the number theory and the spectral theory works is still a mystery.

To end on a rather different note, the one other thing I got out of this excellent talk was to see Gerhard “I was at ICM2014” Paseman, of Mathoverflow fame. Later I even got to meet him, and he gave me a Mathoverflow teeshirt. I became aware of him because there were some small technical problems during the talk, and GP offered advice from the audience.


August 26, 2014

Geraint F. LewisSailing under the Magellanic Clouds: A DECam View of the Carina Dwarf


Where did that month go? Winter is almost over and spring will be breaking, and my backlog of papers to comment on is getting longer and longer.

So a quick post this morning on a recent cool paper by PhD student Brendan McMonigal, called "Sailing under the Magellanic Clouds: A DECam View of the Carina Dwarf". The title tells a lot of the story, but it all starts with a telescope with a big camera.

The camera is DECam, the Dark Energy Camera, mounted on the 4m Blanco telescope at the Cerro Tololo Inter-American Observatory (CTIO) in Chile. This is what it looks like:
It's not one CCD, but loads of them butted together, allowing us to image a large chunk of sky. Over the next few years, this amazing camera will carry out the Dark Energy Survey, which will hopefully reveal what is going on in the dark sector of the Universe, an effort in which Australia will play a key role through OzDES.

But one of the cool things is that we can use this superb facility to look at other things, and this is precisely what Brendan did. The target was the Carina Dwarf Galaxy. Want to see this impressive beast? Here it is:
See it? It is there, but it's a dwarf galaxy, and so is quite faint. Still can't see it? Bring on DECam. We pointed DECam at Carina and took a picture. Well, a few. What did we see?
So, as you can see, we took 5 fields (in two colours) centred on the Carina dwarf. And with the superb properties of the camera, the dwarf galaxy nicely pops out.

But science is not simply about taking pictures, so we constructed colour-magnitude diagrams for each of the fields. Here's what we see (and thanks Brendan for constructing the handy key for the features in the bottom-right corner).
All that stuff in the region labelled MW is stars in our own Milky Way, which is blooming contamination getting in our way. The blob at the bottom is where we are hitting the observational limits of the camera and can't really tell faint stars from galaxies.

The other bits labelled Young, Intermediate and Old tell us that Carina has had several bursts of star-formation during its life, some recent, some a little while ago, and some long ago (to express it in scientific terms), while the RGB is the Red Giant Branch, RC is the Red Clump and HB is the Horizontal Branch.
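
(For the curious: building such a colour-magnitude diagram from a two-band catalogue is a short exercise. The sketch below is mine, with made-up column names, and is not Brendan's actual pipeline.)

    import matplotlib.pyplot as plt

    def plot_cmd(g_mag, r_mag):
        """Plot g - r colour against g magnitude, with brighter stars at the top."""
        plt.scatter(g_mag - r_mag, g_mag, s=1, alpha=0.3)
        plt.gca().invert_yaxis()     # smaller magnitude means brighter, so flip the axis
        plt.xlabel("g - r")
        plt.ylabel("g")
        plt.show()

    # plot_cmd(catalogue["g"], catalogue["r"])   # 'catalogue' stands in for the real star table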

We can make maps of each of the Young, Intermediate and Old population stars, and what we see is this:
The Young and Intermediate populations appear to be quite elliptical and smooth, but the Old population appears to be a little ragged. This suggests that, long ago, Carina was shaken up by gravitational shocks as it interacted with the larger galaxies of the Local Group, but the dynamics of these interactions are poorly understood.

But there is more. Look back up there at the Colour-Magnitude Diagram schematic and there is a little yellow wedge labelled LMC, the Large Magellanic Cloud; what's that doing there?

What do we see if we look at just those stars? Here's what we see.
So, they are not all over the place, but are located only in the southern field, overlapping with Carina itself (and making it difficult to separate the Old Carina population from the Magellanic Cloud stars).

But still, what are they doing there? Here's a rough map of the nearby galaxies.
As we can see from our vantage point inside the Milky Way, Carina and the LMC appear (very roughly) in the same patch of sky but are at completely different distances. But it means that the Large Magellanic Cloud must have a large halo of stars surrounding it, possibly puffed up through interactions with the Small Magellanic Cloud as they orbit together, and by their mutual interaction with the Milky Way.

It's a mess, a glorious, horrendous, dynamically complicated mess. Wonderful!

Well done Brendan!

Sailing under the Magellanic Clouds: A DECam View of the Carina Dwarf

We present deep optical photometry from the DECam imager on the 4m Blanco telescope of over 12 deg^2 around the Carina dwarf spheroidal, with complete coverage out to 1 degree and partial coverage extending out to 2.6 degrees. Using a Poisson-based matched filter analysis to identify stars from each of the three main stellar populations, old, intermediate, and young, we confirm the previously identified radial age gradient, distance, tidal radius, stellar radial profiles, relative stellar population sizes, ellipticity, and position angle. We find an angular offset between the three main elliptical populations of Carina, and find only tentative evidence for tidal debris, suggesting that past tidal interactions could not have significantly influenced the Carina dwarf. We detect stars in the vicinity of, but distinct to, the Carina dwarf, and measure their distance to be 46±2 kpc. We determine this population to be part of the halo of the Large Magellanic Cloud at an angular radius of over 20 degrees. Due to overlap in colour-magnitude space with Magellanic stars, previously detected tidal features in the old population of Carina are likely weaker than previously thought.

Scott Aaronson“Could a Quantum Computer Have Subjective Experience?”

Author’s Note: Below is the prepared version of a talk that I gave two weeks ago at the workshop Quantum Foundations of a Classical Universe, which was held at IBM’s TJ Watson Research Center in Yorktown Heights, NY.  My talk is for entertainment purposes only; it should not be taken seriously by anyone.  If you reply in a way that makes clear you did take it seriously (“I’m shocked and outraged that someone who dares to call himself a scientist would … [blah blah]“), I will log your IP address, hunt you down at night, and force you to put forward an account of consciousness and decoherence that deals with all the paradoxes discussed below—and then reply at length to all criticisms of your account.

If you’d like to see titles, abstracts, and slides for all the talks from the workshop—including by Charles Bennett, Sean Carroll, James Hartle, Adrian Kent, Stefan Leichenauer, Ken Olum, Don Page, Jason Pollack, Jess Riedel, Mark Srednicki, Wojciech Zurek, and Michael Zwolak—click here.  You’re also welcome to discuss these other nice talks in the comments section, though I might or might not be able to answer questions about them.  Apparently videos of all the talks will be available before long (Jess Riedel has announced that videos are now available).

(Note that, as is probably true for other talks as well, the video of my talk differs substantially from the prepared version—it mostly just consists of interruptions and my responses to them!  On the other hand, I did try to work some of the more salient points from the discussion into the text below.)

Thanks so much to Charles Bennett and Jess Riedel for organizing the workshop, and to all the participants for great discussions.


I didn’t prepare slides for this talk—given the topic, what slides would I use exactly?  “Spoiler alert”: I don’t have any rigorous results about the possibility of sentient quantum computers, to state and prove on slides.  I thought of giving a technical talk on quantum computing theory, but then I realized that I don’t really have technical results that bear directly on the subject of the workshop, which is how the classical world we experience emerges from the quantum laws of physics.  So, given the choice between a technical talk that doesn’t really address the questions we’re supposed to be discussing, or a handwavy philosophical talk that at least tries to address them, I opted for the latter, so help me God.

Let me start with a story that John Preskill told me years ago.  In the far future, humans have solved not only the problem of building scalable quantum computers, but also the problem of human-level AI.  They’ve built a Turing-Test-passing quantum computer.  The first thing they do, to make sure this is actually a quantum computer, is ask it to use Shor’s algorithm to factor a 10,000-digit number.  So the quantum computer factors the number.  Then they ask it, “while you were factoring that number, what did it feel like?  did you feel yourself branching into lots of parallel copies, which then recohered?  or did you remain a single consciousness—a ‘unitary’ consciousness, as it were?  can you tell us from introspection which interpretation of quantum mechanics is the true one?”  The quantum computer ponders this for a while and then finally says, “you know, I might’ve known before, but now I just … can’t remember.”

I like to tell this story when people ask me whether the interpretation of quantum mechanics has any empirical consequences.

Look, I understand the impulse to say “let’s discuss the measure problem, or the measurement problem, or derivations of the Born rule, or Boltzmann brains, or observer-counting, or whatever, but let’s take consciousness off the table.”  (Compare: “let’s debate this state law in Nebraska that says that, before getting an abortion, a woman has to be shown pictures of cute babies.  But let’s take the question of whether or not fetuses have human consciousness—i.e., the actual thing that’s driving our disagreement about that and every other subsidiary question—off the table, since that one is too hard.”)  The problem, of course, is that even after you’ve taken the elephant off the table (to mix metaphors), it keeps climbing back onto the table, often in disguises.  So, for better or worse, my impulse tends to be the opposite: to confront the elephant directly.

Having said that, I still need to defend the claim that (a) the questions we’re discussing, centered around quantum mechanics, Many Worlds, and decoherence, and (b) the question of which physical systems should be considered “conscious,” have anything to do with each other.  Many people would say that the connection doesn’t go any deeper than: “quantum mechanics is mysterious, consciousness is also mysterious, ergo maybe they’re related somehow.”  But I’m not sure that’s entirely true.  One thing that crystallized my thinking about this was a remark made in a lecture by Peter Byrne, who wrote a biography of Hugh Everett.  Byrne was discussing the question, why did it take so many decades for Everett’s Many-Worlds Interpretation to become popular?  Of course, there are people who deny quantum mechanics itself, or who have basic misunderstandings about it, but let’s leave those people aside.  Why did people like Bohr and Heisenberg dismiss Everett?  More broadly: why wasn’t it just obvious to physicists from the beginning that “branching worlds” is a picture that the math militates toward, probably the simplest, easiest story one can tell around the Schrödinger equation?  Even if early quantum physicists rejected the Many-Worlds picture, why didn’t they at least discuss and debate it?

Here was Byrne’s answer: he said, before you can really be on board with Everett, you first need to be on board with Daniel Dennett (the philosopher).  He meant: you first need to accept that a “mind” is just some particular computational process.  At the bottom of everything is the physical state of the universe, evolving via the equations of physics, and if you want to know where consciousness is, you need to go into that state, and look for where computations are taking place that are sufficiently complicated, or globally-integrated, or self-referential, or … something, and that’s where the consciousness resides.  And crucially, if following the equations tells you that after a decoherence event, one computation splits up into two computations, in different branches of the wavefunction, that thereafter don’t interact—congratulations!  You’ve now got two consciousnesses.

And if everything above strikes you as so obvious as not to be worth stating … well, that’s a sign of how much things changed in the latter half of the 20th century. Before then, many thinkers would’ve been more likely to say, with Descartes: no, my starting point is not the physical world. I don’t even know a priori that there is a physical world. My starting point is my own consciousness, which is the one thing besides math that I can be certain about. And the point of a scientific theory is to explain features of my experience—ultimately, if you like, to predict the probability that I’m going to see X or Y if I do A or B. (If I don’t have prescientific knowledge of myself, as a single, unified entity that persists in time, makes choices, and later observes their consequences, then I can’t even get started doing science.) I’m happy to postulate a world external to myself, filled with unseen entities like electrons behaving in arbitrarily unfamiliar ways, if it will help me understand my experience—but postulating other versions of me is, at best, irrelevant metaphysics. This is a viewpoint that could lead you to Copenhagenism, or to its newer variants like quantum Bayesianism.

I’m guessing that many people in this room side with Dennett, and (not coincidentally, I’d say) also with Everett.  I certainly have sympathies in that direction too.  In fact, I spent seven or eight years of my life as a Dennett/Everett hardcore believer.  But, while I don’t want to talk anyone out of the Dennett/Everett view, I’d like to take you on a tour of what I see as some of the extremely interesting questions that that view leaves unanswered.  I’m not talking about “deep questions of meaning,” but about something much more straightforward: what exactly does a computational process have to do to qualify as “conscious”?

Of course, there are already tremendous difficulties here, even if we ignore quantum mechanics entirely. Ken Olum went over much of this ground in his talk yesterday (see here for a relevant paper by Davenport and Olum). You’ve all heard the ones about, would you agree to be painlessly euthanized, provided that a complete description of your brain would be sent to Mars as an email attachment, and a “perfect copy” of you would be reconstituted there? Would you demand that the copy on Mars be up and running before the original was euthanized? But what do we mean by “before”—in whose frame of reference?

Some people say: sure, none of this is a problem!  If I’d been brought up since childhood taking family vacations where we all emailed ourselves to Mars and had our original bodies euthanized, I wouldn’t think anything of it.  But the philosophers of mind are barely getting started.

There’s this old chestnut, what if each person on earth simulated one neuron of your brain, by passing pieces of paper around.  It took them several years just to simulate a single second of your thought processes.  Would that bring your subjectivity into being?  Would you accept it as a replacement for your current body?  If so, then what if your brain were simulated, not neuron-by-neuron, but by a gigantic lookup table?  That is, what if there were a huge database, much larger than the observable universe (but let’s not worry about that), that hardwired what your brain’s response was to every sequence of stimuli that your sense-organs could possibly receive.  Would that bring about your consciousness?  Let’s keep pushing: if it would, would it make a difference if anyone actually consulted the lookup table?  Why can’t it bring about your consciousness just by sitting there doing nothing?

To these standard thought experiments, we can add more.  Let’s suppose that, purely for error-correction purposes, the computer that’s simulating your brain runs the code three times, and takes the majority vote of the outcomes.  Would that bring three “copies” of your consciousness into being?  Does it make a difference if the three copies are widely separated in space or time—say, on different planets, or in different centuries?  Is it possible that the massive redundancy taking place in your brain right now is bringing multiple copies of you into being?

Maybe my favorite thought experiment along these lines was invented by my former student Andy Drucker.  In the past five years, there’s been a revolution in theoretical cryptography, around something called Fully Homomorphic Encryption (FHE), which was first discovered by Craig Gentry.  What FHE lets you do is to perform arbitrary computations on encrypted data, without ever decrypting the data at any point.  So, to someone with the decryption key, you could be proving theorems, simulating planetary motions, etc.  But to someone without the key, it looks for all the world like you’re just shuffling random strings and producing other random strings as output.

You can probably see where this is going.  What if we homomorphically encrypted a simulation of your brain?  And what if we hid the only copy of the decryption key, let’s say in another galaxy?  Would this computation—which looks to anyone in our galaxy like a reshuffling of gobbledygook—be silently producing your consciousness?

When we consider the possibility of a conscious quantum computer, in some sense we inherit all the previous puzzles about conscious classical computers, but then also add a few new ones. So, let’s say I run a quantum subroutine that simulates your brain, by applying some unitary transformation U. But then, of course, I want to “uncompute” to get rid of garbage (and thereby enable interference between different branches), so I apply U^{-1}. Question: when I apply U^{-1}, does your simulated brain experience the same thoughts and feelings a second time? Is the second experience “the same as” the first, or does it differ somehow, by virtue of being reversed in time? Or, since U^{-1}U is just a convoluted implementation of the identity function, are there no experiences at all here?
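
(A trivial numerical aside, not part of the talk: “uncomputing” really does undo everything, in the sense that U^{-1}U acts as the identity on any state, which is part of what makes the question above so slippery.)

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 8                                           # a toy 3-qubit "brain", purely illustrative
    # Build a random unitary U from the QR decomposition of a random complex matrix.
    U, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    psi /= np.linalg.norm(psi)

    print(np.allclose(U.conj().T @ (U @ psi), psi))   # True: applying U and then U^{-1} changes nothing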

Here’s a better one: many of you have heard of the Vaidman bomb. This is a famous thought experiment in quantum mechanics where there’s a package, and we’d like to “query” it to find out whether it contains a bomb—but if we query it and there is a bomb, it will explode, killing everyone in the room. What’s the solution? Well, suppose we could go into a superposition of querying the bomb and not querying it, with only ε amplitude on querying the bomb, and √(1-ε^2) amplitude on not querying it. And suppose we repeat this over and over—each time, moving ε amplitude onto the “query the bomb” state if there’s no bomb there, but moving ε^2 probability onto the “query the bomb” state if there is a bomb (since the explosion decoheres the superposition). Then after 1/ε repetitions, we’ll have order 1 probability of being in the “query the bomb” state if there’s no bomb. By contrast, if there is a bomb, then the total probability we’ve ever entered that state is (1/ε)×ε^2 = ε. So, either way, we learn whether there’s a bomb, and the probability that we set the bomb off can be made arbitrarily small. (Incidentally, this is extremely closely related to how Grover’s algorithm works.)
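
(For the numerically inclined, here is a toy simulation, not part of the talk, that treats the package’s “query / don’t query” degree of freedom as a single qubit; the constants match the argument above only up to unimportant factors.)

    import numpy as np

    def vaidman_test(bomb_present, eps, rng):
        """Rotate a 'query the bomb?' qubit by a small angle eps per step; a bomb acts as a measurement."""
        n_steps = int(np.pi / (2 * eps))              # enough steps to rotate 'don't query' into 'query'
        rotation = np.array([[np.cos(eps), -np.sin(eps)],
                             [np.sin(eps),  np.cos(eps)]])
        state = np.array([1.0, 0.0])                  # start in 'don't query'
        for _ in range(n_steps):
            state = rotation @ state
            if bomb_present:
                if rng.random() < state[1] ** 2:      # the bomb measures the 'query' amplitude...
                    return "boom"
                state = np.array([1.0, 0.0])          # ...or collapses us back to 'don't query'
        # Final measurement: ending up in 'query' means the package was empty.
        return "empty" if rng.random() < state[1] ** 2 else "bomb detected safely"

    rng = np.random.default_rng(0)
    runs = [vaidman_test(True, 0.01, rng) for _ in range(2000)]
    print(runs.count("boom") / 2000)                  # of order eps: the bomb almost never goes off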

OK, now how about the Vaidman brain?  We’ve got a quantum subroutine simulating your brain, and we want to ask it a yes-or-no question.  We do so by querying that subroutine with ε amplitude 1/ε times, in such a way that if your answer is “yes,” then we’ve only ever activated the subroutine with total probability ε.  Yet you still manage to communicate your “yes” answer to the outside world.  So, should we say that you were conscious only in the ε fraction of the wavefunction where the simulation happened, or that the entire system was conscious?  (The answer could matter a lot for anthropic purposes.)

You might say, sure, maybe these questions are puzzling, but what’s the alternative?  Either we have to say that consciousness is a byproduct of any computation of the right complexity, or integration, or recursiveness (or something) happening anywhere in the wavefunction of the universe, or else we’re back to saying that beings like us are conscious, and all these other things aren’t, because God gave the souls to us, so na-na-na.  Or I suppose we could say, like the philosopher John Searle, that we’re conscious, and the lookup table and homomorphically-encrypted brain and Vaidman brain and all these other apparitions aren’t, because we alone have “biological causal powers.”  And what do those causal powers consist of?  Hey, you’re not supposed to ask that!  Just accept that we have them.  Or we could say, like Roger Penrose, that we’re conscious and the other things aren’t because we alone have microtubules that are sensitive to uncomputable effects from quantum gravity.  But neither of those two options ever struck me as much of an improvement.

Yet I submit to you that, between these extremes, there’s another position we can stake out—one that I certainly don’t know to be correct, but that would solve so many different puzzles if it were correct that, for that reason alone, it seems to me to merit more attention than it usually receives.  (In an effort to give the view that attention, a couple years ago I wrote an 85-page essay called The Ghost in the Quantum Turing Machine, which one or two people told me they actually read all the way through.)  If, after a lifetime of worrying (on weekends) about stuff like whether a giant lookup table would be conscious, I now seem to be arguing for this particular view, it’s less out of conviction in its truth than out of a sense of intellectual obligation: to whatever extent people care about these slippery questions at all, to whatever extent they think various alternative views deserve a hearing, I believe this one does as well.

The intermediate position that I’d like to explore says the following.  Yes, consciousness is a property of any suitably-organized chunk of matter.  But, in addition to performing complex computations, or passing the Turing Test, or other information-theoretic conditions that I don’t know (and don’t claim to know), there’s at least one crucial further thing that a chunk of matter has to do before we should consider it conscious.  Namely, it has to participate fully in the Arrow of Time.  More specifically, it has to produce irreversible decoherence as an intrinsic part of its operation.  It has to be continually taking microscopic fluctuations, and irreversibly amplifying them into stable, copyable, macroscopic classical records.

Before I go further, let me be extremely clear about what this view is not saying.  Firstly, it’s not saying that the brain is a quantum computer, in any interesting sense—let alone a quantum-gravitational computer, like Roger Penrose wants!  Indeed, I see no evidence, from neuroscience or any other field, that the cognitive information processing done by the brain is anything but classical.  The view I’m discussing doesn’t challenge conventional neuroscience on that account.

Secondly, this view doesn’t say that consciousness is in any sense necessary for decoherence, or for the emergence of a classical world.  I’ve never understood how one could hold such a belief, while still being a scientific realist.  After all, there are trillions of decoherence events happening every second in stars and asteroids and uninhabited planets.  Do those events not “count as real” until a human registers them?  (Or at least a frog, or an AI?)  The view I’m discussing only asserts the converse: that decoherence is necessary for consciousness.  (By analogy, presumably everyone agrees that some amount of computation is necessary for an interesting consciousness, but that doesn’t mean consciousness is necessary for computation.)

Thirdly, the view I’m discussing doesn’t say that “quantum magic” is the explanation for consciousness.  It’s silent on the explanation for consciousness (to whatever extent that question makes sense); it seeks only to draw a defensible line between the systems we want to regard as conscious and the systems we don’t—to address what I recently called the Pretty-Hard Problem.  And the (partial) answer it suggests doesn’t seem any more “magical” to me than any other proposed answer to the same question.  For example, if one said that consciousness arises from any computation that’s sufficiently “integrated” (or something), I could reply: what’s the “magical force” that imbues those particular computations with consciousness, and not other computations I can specify?  Or if one said (like Searle) that consciousness arises from the biology of the brain, I could reply: so what’s the “magic” of carbon-based biology, that could never be replicated in silicon?  Or even if one threw up one’s hands and said everything was conscious, I could reply: what’s the magical power that imbues my stapler with a mind?  Each of these views, along with the view that stresses the importance of decoherence and the arrow of time, is worth considering.  In my opinion, each should be judged according to how well it holds up under the most grueling battery of paradigm-cases, thought experiments, and reductios ad absurdum we can devise.

So, why might one conjecture that decoherence, and participation in the arrow of time, were necessary conditions for consciousness?  I suppose I could offer some argument about our subjective experience of the passage of time being a crucial component of our consciousness, and the passage of time being bound up with the Second Law.  Truthfully, though, I don’t have any a-priori argument that I find convincing.  All I can do is show you how many apparent paradoxes get resolved if you make this one speculative leap.

For starters, if you think about exactly how our chunk of matter is going to amplify microscopic fluctuations, it could depend on details like the precise spin orientations of various subatomic particles in the chunk.  But that has an interesting consequence: if you’re an outside observer who doesn’t know the chunk’s quantum state, it might be difficult or impossible for you to predict what the chunk is going to do next—even just to give decent statistical predictions, like you can for a hydrogen atom.  And of course, you can’t in general perform a measurement that will tell you the chunk’s quantum state, without violating the No-Cloning Theorem.  For the same reason, there’s in general no physical procedure that you can apply to the chunk to duplicate it exactly: that is, to produce a second chunk that you can be confident will behave identically (or almost identically) to the first, even just in a statistical sense.  (Again, this isn’t assuming any long-range quantum coherence in the chunk: only microscopic coherence that then gets amplified.)

It might be objected that there are all sorts of physical systems that “amplify microscopic fluctuations,” but that aren’t anything like what I described, at least not in any interesting sense: for example, a Geiger counter, or a photodetector, or any sort of quantum-mechanical random-number generator.  You can make, if not an exact copy of a Geiger counter, surely one that’s close enough for practical purposes.  And, even though the two counters will record different sequences of clicks when pointed at identical sources, the statistical distribution of clicks will be the same (and precisely calculable), and surely that’s all that matters.  So, what separates these examples from the sorts of examples I want to discuss?

What separates them is the undisputed existence of what I’ll call a clean digital abstraction layer.  By that, I mean a macroscopic approximation to a physical system that an external observer can produce, in principle, without destroying the system; that can be used to predict what the system will do to excellent accuracy (given knowledge of the environment); and that “sees” quantum-mechanical uncertainty—to whatever extent it does—as just a well-characterized source of random noise.  If a system has such an abstraction layer, then we can regard any quantum noise as simply part of the “environment” that the system observes, rather than part of the system itself.  I’ll take it as clear that such clean abstraction layers exist for a Geiger counter, a photodetector, or a computer with a quantum random number generator.  By contrast, for (say) an animal brain, I regard it as currently an open question whether such an abstraction layer exists or not.  If, someday, it becomes routine for nanobots to swarm through people’s brains and make exact copies of them—after which the “original” brains can be superbly predicted in all circumstances, except for some niggling differences that are traceable back to different quantum-mechanical dice rolls—at that point, perhaps educated opinion will have shifted to the point where we all agree the brain does have a clean digital abstraction layer.  But from where we stand today, it seems entirely possible to agree that the brain is a physical system obeying the laws of physics, while doubting that the nanobots would work as advertised.  It seems possible that—as speculated by Bohr, Compton, Eddington, and even Alan Turing—if you want to get it right you’ll need more than just the neural wiring graph, the synaptic strengths, and the approximate neurotransmitter levels.  Maybe you also need (e.g.) the internal states of the neurons, the configurations of sodium-ion channels, or other data that you simply can’t get without irreparably damaging the original brain—not only as a contingent matter of technology but as a fundamental matter of physics.

(As a side note, I should stress that obviously, even without invasive nanobots, our brains are constantly changing, but we normally don’t say as a result that we become completely different people at each instant!  To my way of thinking, though, this transtemporal identity is fundamentally different from a hypothetical identity between different “copies” of you, in the sense we’re talking about.  For one thing, all your transtemporal doppelgängers are connected by a single, linear chain of causation.  For another, outside movies like Bill and Ted’s Excellent Adventure, you can’t meet your transtemporal doppelgängers and have a conversation with them, nor can scientists do experiments on some of them, then apply what they learned to others that remained unaffected by their experiments.)

So, on this view, a conscious chunk of matter would be one that not only acts irreversibly, but that might well be unclonable for fundamental physical reasons.  If so, that would neatly resolve many of the puzzles that I discussed before.  So for example, there’s now a straightforward reason why you shouldn’t consent to being killed, while your copy gets recreated on Mars from an email attachment.  Namely, that copy will have a microstate with no direct causal link to your “original” microstate—so while it might behave similarly to you in many ways, you shouldn’t expect that your consciousness will “transfer” to it.  If you wanted to get your exact microstate to Mars, you could do that in principle using quantum teleportation—but as we all know, quantum teleportation inherently destroys the original copy, so there’s no longer any philosophical problem!  (Or, of course, you could just get on a spaceship bound for Mars: from a philosophical standpoint, it amounts to the same thing.)

Similarly, in the case where the simulation of your brain was run three times for error-correcting purposes: that could bring about three consciousnesses if, and only if, the three simulations were tied to different sets of decoherence events.  The giant lookup table and the Earth-sized brain simulation wouldn’t bring about any consciousness, unless they were implemented in such a way that they no longer had a clean digital abstraction layer.  What about the homomorphically-encrypted brain simulation?  That might no longer work, simply because we can’t assume that the microscopic fluctuations that get amplified are homomorphically encrypted.  Those are “in the clear,” which inevitably leaks information.  As for the quantum computer that simulates your thought processes and then perfectly reverses the simulation, or that queries you like a Vaidman bomb—in order to implement such things, we’d of course need to use quantum fault-tolerance, so that the simulation of you stayed in an encoded subspace and didn’t decohere.  But under our assumption, that would mean the simulation wasn’t conscious.

Now, it might seem to some of you like I’m suggesting something deeply immoral.  After all, the view I’m considering implies that, even if a system passed the Turing Test, and behaved identically to a human, even if it eloquently pleaded for its life, if it wasn’t irreversibly decohering microscopic events then it wouldn’t be conscious, so it would be fine to kill it, torture it, whatever you want.

But wait a minute: if a system isn’t doing anything irreversible, then what exactly does it mean to “kill” it?  If it’s a classical computation, then at least in principle, you could always just restore from backup.  You could even rewind and not only erase the memories of, but “uncompute” (“untorture”?) whatever tortures you had performed.  If it’s a quantum computation, you could always invert the unitary transformation U that corresponded to killing the thing (then reapply U and invert it again for good measure, if you wanted).  Only for irreversible systems are there moral acts with irreversible consequences.

This is related to something that’s bothered me for years in quantum foundations. When people discuss Schrödinger’s cat, they always—always—insert some joke about, “obviously, this experiment wouldn’t pass the Ethical Review Board. Nowadays, we try to avoid animal cruelty in our quantum gedankenexperiments.” But actually, I claim that there’s no animal cruelty at all in the Schrödinger’s cat experiment. And here’s why: in order to prove that the cat was ever in a coherent superposition of |Alive〉 and |Dead〉, you need to be able to measure it in a basis like {|Alive〉+|Dead〉,|Alive〉-|Dead〉}. But if you can do that, you must have such precise control over all the cat’s degrees of freedom that you can also rotate unitarily between the |Alive〉 and |Dead〉 states. (To see this, let U be the unitary that you applied to the |Alive〉 branch, and V the unitary that you applied to the |Dead〉 branch, to bring them into coherence with each other; then consider applying U^{-1}V.) But if you can do that, then in what sense should we say that the cat in the |Dead〉 state was ever “dead” at all? Normally, when we speak of “killing,” we mean doing something irreversible—not rotating to some point in a Hilbert space that we could just as easily rotate away from.

(There followed discussion among some audience members about the question of whether, if you destroyed all records of some terrible atrocity, like the Holocaust, everywhere in the physical world, you would thereby cause the atrocity “never to have happened.”  Many people seemed surprised by my willingness to accept that implication of what I was saying.  By way of explaining, I tried to stress just how far our everyday, intuitive notion of “destroying all records of something” falls short of what would actually be involved here: when we think of “destroying records,” we think about burning books, destroying the artifacts in museums, silencing witnesses, etc.  But even if all those things were done and many others, still the exact configurations of the air, the soil, and photons heading away from the earth at the speed of light would retain their silent testimony to the Holocaust’s reality.  “Erasing all records” in the physics sense would be something almost unimaginably more extreme: it would mean inverting the entire physical evolution in the vicinity of the earth, stopping time’s arrow and running history itself backwards.  Such ‘unhappening’ of what’s happened is something that we lack any experience of, at least outside of certain quantum interference experiments—though in the case of the Holocaust, one could be forgiven for wishing it were possible.)

OK, so much for philosophy of mind and morality; what about the interpretation of quantum mechanics?  If we think about consciousness in the way I’ve suggested, then who’s right: the Copenhagenists or the Many-Worlders?  You could make a case for either.  The Many-Worlders would be right that we could always, if we chose, think of decoherence events as “splitting” our universe into multiple branches, each with different versions of ourselves, that thereafter don’t interact.  On the other hand, the Copenhagenists would be right that, even in principle, we could never do any experiment where this “splitting” of our minds would have any empirical consequence.  On this view, if you can control a system well enough that you can actually observe interference between the different branches, then it follows that you shouldn’t regard the system as conscious, because it’s not doing anything irreversible.

In my essay, the implication that concerned me the most was the one for “free will.”  If being conscious entails amplifying microscopic events in an irreversible and unclonable way, then someone looking at a conscious system from the outside might not, in general, be able to predict what it’s going to do next, not even probabilistically.  In other words, its decisions might be subject to at least some “Knightian uncertainty”: uncertainty that we can’t even quantify in a mutually-agreed way using probabilities, in the same sense that we can quantify our uncertainty about (say) the time of a radioactive decay.  And personally, this is actually the sort of “freedom” that interests me the most.  I don’t really care if my choices are predictable by God, or by a hypothetical Laplace demon: that is, if they would be predictable (at least probabilistically), given complete knowledge of the microstate of the universe.  By definition, there’s essentially no way for my choices not to be predictable in that weak and unempirical sense!  On the other hand, I’d prefer that my choices not be completely predictable by other people.  If someone could put some sheets of paper into a sealed envelope, then I spoke extemporaneously for an hour, and then the person opened the envelope to reveal an exact transcript of everything I said, that’s the sort of thing that really would cause me to doubt in what sense “I” existed as a locus of thought.  But you’d have to actually do the experiment (or convince me that it could be done): it doesn’t count just to talk about it, or to extrapolate from fMRI experiments that predict which of two buttons a subject is going to press with 60% accuracy a few seconds in advance.

But since we’ve got some cosmologists in the house, let me now turn to discussing the implications of this view for Boltzmann brains.

(For those tuning in from home: a Boltzmann brain is a hypothetical chance fluctuation in the late universe, which would include a conscious observer with all the perceptions that a human being—say, you—is having right now, right down to false memories and false beliefs of having arisen via Darwinian evolution.  On statistical grounds, the overwhelming majority of Boltzmann brains last just long enough to have a single thought—like, say, the one you’re having right now—before they encounter the vacuum and freeze to death.  If you measured some part of the vacuum state toward which our universe seems to be heading, asking “is there a Boltzmann brain here?,” quantum mechanics predicts that the probability would be ridiculously astronomically small, but nonzero.  But, so the argument goes, if the vacuum lasts for infinite time, then as long as the probability is nonzero, it doesn’t matter how tiny it is: you’ll still get infinitely many Boltzmann brains indistinguishable from any given observer; and for that reason, any observer should consider herself infinitely likelier to be a Boltzmann brain than to be the “real,” original version.  For the record, even among the strange people at the IBM workshop, no one actually worried about being a Boltzmann brain.  The question, rather, is whether, if a cosmological model predicts Boltzmann brains, then that’s reason enough to reject the model, or whether we can live with such a prediction, since we have independent grounds for knowing that we can’t be Boltzmann brains.)

At this point, you can probably guess where this is going.  If decoherence, entropy production, full participation in the arrow of time are necessary conditions for consciousness, then it would follow, in particular, that a Boltzmann brain is not conscious.  So we certainly wouldn’t be Boltzmann brains, even under a cosmological model that predicts infinitely more of them than of us.  We can wipe our hands; the problem is solved!

I find it extremely interesting that, in their recent work, Kim Boddy, Sean Carroll, and Jason Pollack reached a similar conclusion, but from a completely different starting point.  They said: look, under reasonable assumptions, the late universe is just going to stay forever in an energy eigenstate—just sitting there doing nothing.  It’s true that, if someone came along and measured the energy eigenstate, asking “is there a Boltzmann brain here?,” then with a tiny but nonzero probability the answer would be yes.  But since no one is there measuring, what licenses us to interpret the nonzero overlap in amplitude with the Boltzmann brain state, as a nonzero probability of there being a Boltzmann brain?  I think they, too, are implicitly suggesting: if there’s no decoherence, no arrow of time, then we’re not authorized to say that anything is happening that “counts” for anthropic purposes.

Let me now mention an obvious objection.  (In fact, when I gave the talk, this objection was raised much earlier.)  You might say, “look, if you really think irreversible decoherence is a necessary condition for consciousness, then you might find yourself forced to say that there’s no consciousness, because there might not be any such thing as irreversible decoherence!  Imagine that our entire solar system were enclosed in an anti de Sitter (AdS) boundary, like in Greg Egan’s science-fiction novel Quarantine.  Inside the box, there would just be unitary evolution in some Hilbert space: maybe even a finite-dimensional Hilbert space.  In which case, all these ‘irreversible amplifications’ that you lay so much stress on wouldn’t be irreversible at all: eventually all the Everett branches would recohere; in fact they’d decohere and recohere infinitely many times.  So by your lights, how could anything be conscious inside the box?”

My response to this involves one last speculation.  I speculate that the fact that we don’t appear to live in AdS space—that we appear to live in (something evolving toward) a de Sitter space, with a positive cosmological constant—might be deep and important and relevant.  I speculate that, in our universe, “irreversible decoherence” means: the records of what you did are now heading toward our de Sitter horizon at the speed of light, and for that reason alone—even if for no others—you can’t put Humpty Dumpty back together again.  (Here I should point out, as several workshop attendees did to me, that Bousso and Susskind explored something similar in their paper The Multiverse Interpretation of Quantum Mechanics.)

Does this mean that, if cosmologists discover tomorrow that the cosmological constant is negative, or will become negative, then it will turn out that none of us were ever conscious?  No, that’s stupid.  What it would suggest is that the attempt I’m now making on the Pretty-Hard Problem had smacked into a wall (an AdS wall?), so that I, and anyone else who stressed in-principle irreversibility, should go back to the drawing board.  (By analogy, if some prescription for getting rid of Boltzmann brains fails, that doesn’t mean we are Boltzmann brains; it just means we need a new prescription.  Tempting as it is to skewer our opponents’ positions with these sorts of strawman inferences, I hope we can give each other the courtesy of presuming a bare minimum of sense.)

Another question: am I saying that, in order to be absolutely certain of whether some entity satisfied the postulated precondition for consciousness, one might, in general, need to look billions of years into the future, to see whether the “decoherence” produced by the entity was really irreversible?  Yes (pause to gulp bullet).  I am saying that.  On the other hand, I don’t think it’s nearly as bad as it sounds.  After all, the category of “consciousness” might be morally relevant, or relevant for anthropic reasoning, but presumably we all agree that it’s unlikely to play any causal role in the fundamental laws of physics.  So it’s not as if we’ve introduced any teleology into the laws of physics by this move.

Let me end by pointing out what I’ll call the “Tegmarkian slippery slope.”  It feels scientific and rational—from the perspective of many of us, even banal—to say that, if we’re conscious, then any sufficiently-accurate computer simulation of us would also be.  But I tried to convince you that this view depends, for its aura of obviousness, on our agreeing not to probe too closely exactly what would count as a “sufficiently-accurate” simulation.  E.g., does it count if the simulation is done in heavily-encrypted form, or encoded as a giant lookup table?  Does it matter if anyone actually runs the simulation, or consults the lookup table?  Now, all the way at the bottom of the slope is Max Tegmark, who asks: to produce consciousness, what does it matter if the simulation is physically instantiated at all?  Why isn’t it enough for the simulation to “exist” mathematically?  Or, better yet: if you’re worried about your infinitely-many Boltzmann brain copies, then why not worry equally about the infinitely many descriptions of your life history that are presumably encoded in the decimal expansion of π?  Why not hold workshops about how to avoid the prediction that we’re infinitely likelier to be “living in π” than to be our “real” selves?

From this extreme, even most scientific rationalists recoil.  They say, no, even if we don’t yet know exactly what’s meant by “physical instantiation,” we agree that you only get consciousness if the computer program is physically instantiated somehow.  But now I have the opening I want.  I can say: once we agree that physical existence is a prerequisite for consciousness, why not participation in the Arrow of Time?  After all, our ordinary ways of talking about sentient beings—outside of quantum mechanics, cosmology, and maybe theology—don’t even distinguish between the concepts “exists” and “exists and participates in the Arrow of Time.”  And to say we have no experience of reversible, clonable, coherently-executable, atemporal consciousnesses is a massive understatement.

Of course, we should avoid the sort of arbitrary prejudice that Turing warned against in Computing Machinery and Intelligence.  Just because we lack experience with extraterrestrial consciousnesses, doesn’t mean it would be OK to murder an intelligent extraterrestrial if we met one tomorrow.  In just the same way, just because we lack experience with clonable, atemporal consciousnesses, doesn’t mean it would be OK to … wait!  As we said before, clonability, and aloofness from time’s arrow, call severely into question what it even means to “murder” something.  So maybe this case isn’t as straightforward as the extraterrestrials after all.

At this point, I’ve probably laid out enough craziness, so let me stop and open things up for discussion.

David Hoggdata science, data-driven models

Today was my last day in Heidelberg for 2014. Sad day! I spent most of my research time working with Ness on our data-driven models of stars. We got the code to go from first-order (linear) linear fitting to second-order (quadratic) linear fitting, and vowed that we would never go to third order! We will use a Gaussian Process before we do that. We discussed outliers—both outlier stars and outlier pixels—and how to deal with them while still retaining computational tractability, good probability of convergence to a good optimum (or good sampling prospects), and the likelihood principle. We don't have a good solution yet, although I have ideas in my old document on fitting a line. We also discussed next steps on the path to paper 1, including especially making figures that demonstrate the properties and qualities of the individual-wavelength least-square fits we are doing.
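
(In case "second-order (quadratic) linear fitting" sounds paradoxical: the flux at each wavelength is modelled as a quadratic function of the stellar labels, but the model is still linear in its unknown coefficients, so each pixel is an ordinary weighted least-squares problem. Here is a bare-bones sketch of that idea, with invented array names; it is a paraphrase, not the actual code.)

    import numpy as np

    def quadratic_design_matrix(labels):
        """labels: (n_stars, n_labels) array -> columns for 1, each label, and each pairwise product."""
        n_stars, n_labels = labels.shape
        cols = [np.ones(n_stars)]
        cols += [labels[:, i] for i in range(n_labels)]
        cols += [labels[:, i] * labels[:, j]
                 for i in range(n_labels) for j in range(i, n_labels)]
        return np.vstack(cols).T

    def fit_pixel(design, flux, ivar):
        """Weighted least squares for one wavelength pixel (flux and ivar have length n_stars)."""
        w = np.sqrt(ivar)
        coeffs, *_ = np.linalg.lstsq(design * w[:, None], flux * w, rcond=None)
        return coeffs

    # e.g. design = quadratic_design_matrix(np.column_stack([teff, logg, feh])),
    # then loop fit_pixel over the wavelength grid.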

At the end of the day, I was noticing that this project is a perfect example of the new discipline of "Data Science", about which I am supposed to have opinions: It was an idea we both had, but I didn't know enough about the spectroscopy, and Ness didn't know enough about the methods available. Neither of us could have done this project without the collaboration (nor without significant amounts of computation).

Quantum DiariesDo we live in a 2-D hologram?

This Fermilab press release was published on Aug. 26, 2014.

A Fermilab scientist works on the laser beams at the heart of the Holometer experiment. The Holometer will use twin laser interferometers to test whether the universe is a 2-D hologram. Photo: Fermilab

A unique experiment at the U.S. Department of Energy’s Fermi National Accelerator Laboratory called the Holometer has started collecting data that will answer some mind-bending questions about our universe – including whether we live in a hologram.

Much like characters on a television show would not know that their seemingly 3-D world exists only on a 2-D screen, we could be clueless that our 3-D space is just an illusion. The information about everything in our universe could actually be encoded in tiny packets in two dimensions.

Get close enough to your TV screen and you’ll see pixels, small points of data that make a seamless image if you stand back. Scientists think that the universe’s information may be contained in the same way and that the natural “pixel size” of space is roughly 10 trillion trillion times smaller than an atom, a distance that physicists refer to as the Planck scale.

“We want to find out whether space-time is a quantum system just like matter is,” said Craig Hogan, director of Fermilab’s Center for Particle Astrophysics and the developer of the holographic noise theory. “If we see something, it will completely change ideas about space we’ve used for thousands of years.”

Quantum theory suggests that it is impossible to know both the exact location and the exact speed of subatomic particles. If space comes in 2-D bits with limited information about the precise location of objects, then space itself would fall under the same theory of uncertainty. The same way that matter continues to jiggle (as quantum waves) even when cooled to absolute zero, this digitized space should have built-in vibrations even in its lowest energy state.

Essentially, the experiment probes the limits of the universe’s ability to store information. If there is a set number of bits that tell you where something is, it eventually becomes impossible to find more specific information about the location – even in principle. The instrument testing these limits is Fermilab’s Holometer, or holographic interferometer, the most sensitive device ever created to measure the quantum jitter of space itself.

Now operating at full power, the Holometer uses a pair of interferometers placed close to one another. Each one sends a one-kilowatt laser beam (the equivalent of 200,000 laser pointers) at a beam splitter and down two perpendicular 40-meter arms. The light is then reflected back to the beam splitter where the two beams recombine, creating fluctuations in brightness if there is motion. Researchers analyze these fluctuations in the returning light to see if the beam splitter is moving in a certain way – being carried along on a jitter of space itself.
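
(To make the "fluctuations in brightness" concrete: in a Michelson-type interferometer the power at the output port varies as the cosine squared of half the phase difference accumulated in the two arms, so a shift of the beam splitter much smaller than a wavelength still modulates the detected brightness. The sketch below is purely illustrative and uses assumed numbers, not Holometer specifications.)

    import numpy as np

    wavelength = 1064e-9                             # metres; a typical infrared laser line (assumed)
    delta_L = np.linspace(0, wavelength, 9)          # difference in arm lengths
    phase = 4 * np.pi * delta_L / wavelength         # light traverses each arm out and back
    power_fraction = np.cos(phase / 2) ** 2          # fraction of power at the bright output port
    for d, pf in zip(delta_L, power_fraction):
        print(f"arm-length difference {d:.2e} m -> relative brightness {pf:.3f}")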

“Holographic noise” is expected to be present at all frequencies, but the scientists’ challenge is not to be fooled by other sources of vibrations. The Holometer is testing a frequency so high – millions of cycles per second – that motions of normal matter are not likely to cause problems. Rather, the dominant background noise is more often due to radio waves emitted by nearby electronics. The Holometer experiment is designed to identify and eliminate noise from such conventional sources.

“If we find a noise we can’t get rid of, we might be detecting something fundamental about nature – a noise that is intrinsic to space-time,” said Fermilab physicist Aaron Chou, lead scientist and project manager for the Holometer. “It’s an exciting moment for physics. A positive result will open a whole new avenue of questioning about how space works.”

The Holometer experiment, funded by the U.S. Department of Energy Office of Science and other sources, is expected to gather data over the coming year.

The Holometer team comprises 21 scientists and students from Fermilab, the Massachusetts Institute of Technology, the University of Chicago and the University of Michigan. For more information about the experiment, visit http://holometer.fnal.gov/.

Fermilab is America’s premier national laboratory for particle physics and accelerator research. A U.S. Department of Energy Office of Science laboratory, Fermilab is located near Chicago, Illinois, and operated under contract by the Fermi Research Alliance, LLC. Visit Fermilab’s website at www.fnal.gov and follow us on Twitter at @FermilabToday.

The DOE Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

Tim GowersICM2014 — Barak, Guralnick, Brown

Here’s a little puzzle to get this post started. Of the fourteen 21st-century Fields medallists (if you count Perelman), seven — Lafforgue, Voevodsky, Tao, Werner, Smirnov, Avila and Mirzakhani — have something interesting in common that the others lack. What is this property?

That’s a fairly easy question, so let’s follow it up with another one: how surprised should we be about this? Is there unconscious bias towards mathematicians with this property? Of this year’s 21 plenary lecturers, the only one with the property was Mirzakhani, and out of the 20 plenary lecturers in 2010, the only one with the property was Avila. What is going on?

On to more serious matters. After Candès’s lecture I had a solitary lunch in the subterranean mall (Korean food of some description, but I’ve forgotten exactly what) and went to hear Martin Hairer deliver his Fields medal lecture, which I’m not going to report on because I don’t have much more to say about his work than I’ve already said.

By and large, the organization of the congress was notably good — for example, I almost never had to queue for anything, and never for any length of time — but there was a little lapse this afternoon, in that Hairer’s lecture was scheduled to finish at 3pm, exactly the time that the afternoon’s parallel sessions started. In some places that might have been OK, but not in the vast COEX Seoul conference centre. I had to get from the main hall to a room at the other end of the centre where theoretical computer science talks were taking place, which was probably about as far as walking from my house in Cambridge to the railway station. (OK, I live close to the station, but even so.)

Inevitably, therefore, I arrived late to Boaz Barak’s talk, but he welcomed me, and a few others in my position, with the reassuring words that everything he had said up to now was bullshit and we didn’t need to worry about it. (He was quoting a juggler he had seen in Washington Square.)

I always like it when little themes recur at ICMs in different contexts. I’ve already mentioned the theme of looking at big spaces of objects in order to understand typical objects. Another one I mentioned when describing Candès’s lecture: that one should not necessarily be afraid of NP-complete problems, a theme which was present in Barak’s talk as well. I’m particularly fond of it because I’ve spent a lot of time in the last few years thinking about the well-known NP-complete problem where the input is a mathematical statement and the task (in the decision version) is to say whether there is a proof of that statement of length at most n — in some appropriate formal system. The fact that this problem is NP-complete does not deter mathematicians from spending their lives solving instances of it. What explains this apparent success? I dream that there might be a very nice answer to this question, rather than just a hand-wavy one that says that the instances studied by mathematicians are far from general.

Barak was talking about something a little different, however. He too has a dream, which is to obtain a very precise understanding of why certain problems are hard (in the complexity sense of not being soluble with efficient algorithms) and others easy. He is not satisfied with mere lists of easy and hard problems, with algorithms for the former and reductions to NP-complete or other “known hard” problems for the latter. He wants a theory that will say which problems are hard and which easy, or at least do that for large classes of problems. And the way he wants to do it is to find “meta-algorithms” — which roughly speaking means very general algorithmic approaches with the property that if they work then the problem is easy and if they fail then it’s hard.

Why is there the slightest reason to think that that can be done? Isn’t there a wide variety of algorithms, each of which requires a lot of ingenuity to find? If one approach fails, might there not be some clever alternative approach that nobody had thought of?

These are all perfectly reasonable objections, but the message, or at least a message, of Barak’s talk is that it is not completely outlandish to think that it really is the case that there is what one might call a “best possible” meta-algorithm, in the sense that if it fails, then nothing else can succeed. Again, I stress that this would be for large and interesting classes of algorithmic problems (e.g. certain optimization problems) and not for every single describable Boolean function. One reason to hold out hope is that if you delve a little more deeply into the algorithms we know about, you find that actually many of them are based on just a few ideas, such as linear and semidefinite programming, solving simultaneous linear equations, and so on. Of course, that could just reflect our lack of imagination, but it could be an indication that something deeper is going on.

Another reason for optimism is that he has a candidate: the sum-of-squares algorithm. This is connected with Hilbert’s 17th problem, which asked whether every multivariate polynomial that takes only non-negative values can be written as a sum of squares of rational functions. (It turns out that they can’t necessarily be written as a sum of squares of polynomials: a counterexample is z^6+x^4y^2+x^2y^4-3x^2y^2z^2.) An interesting algorithmic problem is to write a polynomial as a sum of squares when it can be so written. One of the reasons this problem interests Barak is that many other problems can be reduced to it. Another, which I don’t properly understand but think I would understand if I watched his talk again (it is here, by the way), is that if the unique games conjecture is false (recall that it too is sort of saying that a certain algorithm is best possible), then the sum-of-squares algorithm is waiting in the wings to take over as the new candidate that will do the job.
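Incidentally, the non-negativity of that counterexample (it is essentially Motzkin's polynomial) follows from a one-line application of the AM-GM inequality to its three non-negative monomials; the harder part is showing that it is not a sum of squares of polynomials:

\[
\frac{x^4y^2 + x^2y^4 + z^6}{3} \;\ge\; \sqrt[3]{x^4y^2\cdot x^2y^4\cdot z^6} \;=\; x^2y^2z^2,
\qquad\text{so}\qquad
z^6+x^4y^2+x^2y^4-3x^2y^2z^2 \;\ge\; 0 .
\]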

An unfortunate aspect of going to Barak’s talk was that I missed Harald Helfgott’s. However, the sacrifice was rewarded, and I can always watch Harald on YouTube.

After another longish walk, but with a non-zero amount of time for it, I arrived at my next talk of the afternoon, given by Bob Guralnick. This was another very nice talk, just what an ICM invited lecture should be like. (By that I mean that it should be aimed principally at non-experts, while at the same time conveying what has been going on recently in the field. In other words, it should be more of a colloquium style talk than a research seminar.)

Guralnick’s title was Applications of the Classification of Finite Simple Groups. One thing he did was talk about the theorem itself, how a proof was announced in 1983 but not actually completed for another twenty years, and how there are now — or will be soon — “second-generation” proofs that are shorter, though still long, and use new ideas. He also mentioned a few statements that can be proved with the classification theorem and are seemingly completely out of reach without it. Here are a few of them.

1. Every finite simple group is generated by two elements.

2. The probability that a random pair of elements generates a finite simple group tends to 1 as the size of the group tends to infinity.

3. For every non-identity element x of a finite simple group, there exists an element y such that x and y generate the group.

4. For every finite simple group there exist conjugacy classes C and D such that for every c\in C and every d\in D the elements c and d generate the group.

Why does the classification of finite simple groups help with these problems? Because it means that instead of having to give an abstract proof that somehow uses the condition of having no proper normal subgroups, you have the option of doing a proof that involves calculations in concrete groups. Because the list of (families of) groups you have to consider is finite, this is a feasible approach. Actually, it’s not just that there are only finitely many families, but also that the families themselves are very nice, especially the various families of Lie type. As far as I can tell from the relevant Wikipedia article, there isn’t a formal definition of “group of Lie type”, but basically it means a group that’s like a Lie group but defined over a finite field instead of over $\mathbb{R}$ or $\mathbb{C}$. So things like PSL$(2,q)$ are finite simple groups of Lie type.

Just as the geometrization theorem didn’t kill off research in 3-manifolds, the classification of finite simple groups didn’t kill off group theory, even though in the past many mathematicians have thought that it would. It’s easy to see how that perception might have arisen: the project of classifying finite simple groups became such a major focus for group theorists that once it was done, a huge chunk of what they were engaged in was no longer available.

So what’s left? One answer, one might imagine, is that not all groups are simple. That is not a completely satisfactory answer, because groups can be put together from simple groups in such a way that for many problems it is enough to solve them just for simple groups (just as in number theory one can often prove a result for primes and prove that the product of two numbers that satisfy the result also satisfies the result). But it is part of the answer. For example p-groups (that is, groups of prime power order) are built out of copies of a cyclic group of prime order, but that doesn’t begin to answer all the questions people have about p-groups.

Another answer, which is closer to the reason that 3-manifold theory survived Perelman, is that proving results even for specific families of groups is often far from easy. For example, have a go at proving that a random pair of (equivalence classes of) matrices generates PSL$(2,q)$ with high probability when q is large: it’s a genuine theorem rather than simply a verification.

I want to mention a very nice result that I think is due to Guralnick and his co-authors, though he didn’t quite explicitly say so. Let f\in\mathbb{F}_q[x] be a polynomial of degree n, with n coprime to q. Then for every a, either f is bijective on the field \mathbb{F}_{q^a} or the set of values it takes has size at most (5/6)q^a+O(q^{a/2}).

What’s so nice about that? Well, the result is interesting, but even more interesting (at least to me) is the fact that the proof involved the classification of finite simple groups, and Guralnick described it (or more accurately, a different result just before it but I think the same remark applies) as untouchable without CFSG, even though the statement is about polynomials rather than groups.

Here is the video of Guralnick’s lecture.

The third invited lecture I went to was given by Francis Brown. Although I was expecting to understand very little, I wanted to go to it out of curiosity, because I knew Francis Brown when he was an undergraduate at Trinity — I think I taught him once or twice. After leaving Trinity he went to France, where he remained until very recently, when he took up a (highly prestigious) professorial fellowship at All Souls in Oxford. It was natural for him to go to France, because his mother is French and he is bilingual — another aspect that interests me since two of my children are in the same position. I heard nothing of him for a long time, but then in the last few years he suddenly popped up again as the person who has proved some important results concerning motivic zeta functions.

The word “motivic” scares me, and I’m not going to try to say what it means, because I can’t. I first heard of motives about twenty years ago, when the message I got was that they were objects that people studied even though they didn’t know how to define them. That may be a caricature, but my best guess as to the correct story is that even though people don’t know the right definition, they do know what properties this definition should have. In other words, there is a highly desirable theory that would do lots of nice things, if only one could find the objects that got you started.

However, what Brown was doing appeared not to be based on layers of conjecture, so I suppose it must be that “motivic versions” of certain objects have been shown to exist.

This was a talk in which I did not take notes. To do a decent job describing it, I’d need to watch it again, but rather than do that, I’ll just describe the traces it left in my memory.

One was that he mentioned the famous problem of the irrationality of \zeta(n) for odd n\geq 5, and more generally the problem of whether the vector space over the rationals generated by \zeta(3),\zeta(5),\dots,\zeta(2n+1) has dimension n. (It has been shown by Ball and Rivoal to have dimension that tends to infinity with n, which was a major result when it was proved.)

Another was that he defined multiple zeta values, which are zeta-like functions of more than one integer variable, which come up naturally when one takes two zeta values, multiplies them together, and expands out the result. They were defined by Euler.
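For instance, with the usual convention \zeta(s_1,s_2)=\sum_{m>n\geq 1} m^{-s_1}n^{-s_2}, splitting the double sum for a product of two zeta values according to whether m>n, m<n or m=n gives Euler's relation:

\[
\zeta(s_1)\,\zeta(s_2) \;=\; \zeta(s_1,s_2) + \zeta(s_2,s_1) + \zeta(s_1+s_2).
\]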

He also talked about periods, a very interesting concept defined (I think for the first time) by Kontsevich and Zagier. I highly recommend looking at their paper, available here in preprint form. At least the beginning of it is accessible to non-experts, and contains a very interesting open problem. Roughly speaking, a period is anything you can define using reasonably nice integrals. For example, \pi is a period because it is the area of the unit disc, which is cut out by the nice polynomial inequality x^2+y^2\leq 1. The nice problem is to prove that an explicit number is not a period. There are only countably many periods, so such numbers exist in abundance. If you want a specific number to try, then you can begin with e. Best of luck.
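To make "reasonably nice integrals" slightly more concrete in the case of \pi:

\[
\pi \;=\; \iint_{x^2+y^2\le 1} dx\,dy \;=\; 2\int_{-1}^{1}\sqrt{1-x^2}\,dx .
\]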

While discussing what motivic zeta values are, he said that there were two approaches one could use, one involving Betti numbers and the other involving de Rham cohomology. He preferred the de Rham approach. “Betti” and “de Rham” became a sort of chorus throughout the talk, and even now I have ringing in my head phrases like “\lambda-Betti or \lambda-de Rham”.

If I understood correctly, linear dependences between motivic zeta values (which are much fancier objects that still depend on tuples of integers) imply the corresponding dependences between standard zeta values. (I’m talking about both single and multiple zeta values here.) That’s not much help if you are trying to prove independence of standard zeta values, but it does do two things for you. One is that it provides a closely related context in which the world seems to be a tidier place. As I understand it, all the conjectures one wants to be true for standard zeta values are true for their motivic cousins. But it has also enabled Brown to discover unexpected dependences between standard zeta values: for instance every multiple zeta value is a linear combination of multiple zeta values where every argument is 2 or 3. (I suppose multiple must mean “genuinely multiple” here.) Actually, looking very briefly at the relevant part of the talk, which is round about the 27th minute, I see that this was proving something called the Hoffman conjecture, so perhaps it is wrong to call it unexpected. But it is still a very interesting result, given that the proof was highly non-trivial and went via motivic zeta values.

My remaining memory trace is that the things Brown was talking about were related to a lot of other important parts of mathematics, and even theoretical physics. I’d love to understand this kind of thing better.

So although a lot of this talk (which is here) went over my head, enough of it didn’t that my attention was engaged throughout. Given the type of material, that was far from obviously going to be the case, so this was another very good talk, to round off a pretty amazing day.


Tommaso DorigoThe Quote Of The Week - Neutrino Mass Hierarchy and Matter Effects

Interaction with matter changes the neutrino mixing and effective mass splitting in a way that depends on the mass hierarchy. Consequently, results of oscillations and flavor conversion are different for the two hierarchies.
Sensitivity to the mass hierarchy appears whenever the matter effect on the 1-3 mixing and mass splitting becomes substantial. This happens in supernovae in large energy range, and in the matter of the Earth.
The Earth density profile is a multi-layer medium where the resonance enhancement of oscillations as well as the parametric enhancement of oscillations occur. The enhancement is realized in neutrino (antineutrino) channels for normal (inverted) mass hierarchy.


BackreactionName that Þing

[Image credits: RIA Novosti]
As a teenager I switched between the fantasy and science fiction aisles of the local library, but in the end it was science fiction that won me over.

The main difference between the genres seemed the extent to which authors bothered to come up with explanations. The science fiction authors, they bent and broke the laws of Nature but did so consistently, or at least tried to. Fantasy writers on the other hand were just too lazy to work out the rules to begin with.

You could convert Harry Potter into a science fiction novel easily enough. Leaving aside gimmicks such as moving photos that are really yesterday’s future, call the floo network a transmitter, the truth serum a nanobot liquid, and the invisibility cloak a shield. Add some electric buzz, quantum vocabulary, and alien species to it. Make that wooden wand a light saber and that broom an X-wing starfighter, and the rest is a fairly standard story of the Other World, the Secret Clan, and the Chosen One learning the rules of the game and the laws of the trade, of good and evil, of friendship and love.

The one thing that most of the fantasy literature has which science fiction doesn’t have, and which has always fascinated me, is the idea of an Old Language, the idea that there is a true name for every thing and every place, and if you know the true name you have power over it. Speaking in the Old Language always tells the truth. If you speak the Old Language, you make it real.

This idea of the Old Language almost certainly goes back to our ancestors’ fights with an often hostile and unpredictable nature threatening their survival. The names, the stories, the gods and godzillas, they were their way of understanding and managing the environment. They were also the precursor to what would become science. And don’t we in physics today still try to find the true name of some thing so we have power over it?

Aren’t we still looking for the right words and the right language? Aren’t we still looking for the names to speak truth to power, to command that what threatens us and frightens us, to understand where we belong, where we came from, and where we go to? We call it dark energy and we call it dark matter, but these are not their true names. We call them waves and we call them particles, but these are not their true names. Some call the thing a string, some call it a graph, some call it a bit, but as Lee Smolin put it so nicely, none of these words quite has a “ring of truth” to it. These are not the real names.

Neil Gaiman’s recent fantasy novel “The Ocean at the End of the Lane” also draws on the idea of an Old Language, of a truth below the surface, a theory of everything which the average human cannot fathom because they do not speak the right words. In Michael Ende’s “Neverending Story” that what does not have a true name dies and decays to nothing. (And of course Ende has a Chosen One saving the world from that no-thing.) It all starts and it all ends with our ability to name that what we are part of.

You don’t get a universe from nothing of course. You can get a universe from math, but the mathematical universe doesn’t come from nothing either, it comes from Max Tegmark, that is to say some human (for all I can tell) trying to find the right words to describe, well, everything - no point trying to be modest about it. Tegmark, incidentally, also seems to speak at least ten different languages or so, maybe that’s not a coincidence.

The evolution of language has long fascinated historians and neurologists alike. Language is more than assigning a sound to things and things you do with things. Language is a way to organize thought patterns and to classify relations, if in a way that is frequently inconsistent and often confusing. But the oldest language of all is neither Sindarin nor Old Norse; it is, for all we can tell, the language of math in which the universe was written. You can call it temperature anisotropy, or tropospheric ozone precursors, you can call it neurofibrillary tangle or reverse transcriptase, you can call them Bárðarbunga or Eyjafjallajökull - in the end their true names were written in math.

August 25, 2014

David Hoggdebugging

After a few days of vacation, I got back to working on data-driven stellar models with Ness. We spent some time today debugging issues with the tagging of new objects; behavior is mysterious.

Terence Tao(Matilde Lalin) Attending conferences with small children

[This guest post is authored by Matilde Lalin, an Associate Professor in the Département de mathématiques et de statistique at the Université de Montréal.  I have lightly edited the text, mostly by adding some HTML formatting. -T.]

Mathematicians (and likely other academics!) with small children face some unique challenges when traveling to conferences and workshops. The goal of this post is to reflect on these, and to start a constructive discussion about what institutions and event organizers could do to improve the experiences of such participants.

The first necessary step is to recognize that different families have different needs. While it is hard to completely address everybody’s needs, there are some general measures that have a good chance of helping most of the people traveling with young children. In this post, I will mostly focus on nursing mothers with infants (\leq 24 months old) because that is my personal experience. Many of the suggestions will apply to other cases such as non-nursing babies, children of single parents, children of couples of mathematicians who are interested in attending the same conference, etc.

The mother of a nursing infant who wishes to attend a conference has three options:

  1. Bring the infant and a relative/friend to help care for the infant. The main challenge in this case is to fund the trip expenses of the relative. This involves trip costs, lodging, and food. The family may need a hotel room with some special amenities such as a crib, fridge, microwave, etc. Location is also important, with easy access to facilities such as a grocery store, pharmacy, etc. The mother will need to take regular breaks from the conference in order to nurse the baby (this could be as often as every three hours or so). Depending on personal preferences, she may need to nurse privately. It is convenient, therefore, to make a private room available, located as close to the conference venue as possible. The relative may need a place to stay with the baby near the conference, such as a playground or a room with toys, particularly if the hotel room is far away.
  2. Bring the infant and hire someone local (a nanny) to help care for the infant. There are two main challenges in this case: finding the caregiver and paying for such services. Finding a caregiver in a place where one does not live is hard, as it is difficult to conduct interviews or get references. There are agencies that can do this for a (quite expensive) fee: they will find a professional caregiver with background checks, CPR certification, many references, etc. It may be worth it, though, as professional caregivers tend to provide high-quality services and peace of mind is priceless for the mother mathematician attending a conference. As in the previous case, the mother may have particular needs regarding the hotel room, location, and all the other facilities mentioned for Option 1.
  3. Travel without the infant and pump milk regularly. This can be very challenging for the mother, the baby, and the person who stays behind taking care of the baby, but the costs of this arrangement are much lower than in Option 1 or 2 (I am ignoring the possibility that the family needs to hire help at home, which is necessary in some cases). A nursing mother away from her baby has no option but to pump her milk to prevent pain and serious health complications. This mother may have to pump milk very often. Pumping is less efficient than nursing, so she will be gone for longer in each break or she will have more breaks compared to a mother who travels with her baby. For pumping, people need a room which should ideally be private, with a sink, and located as close to the conference venue as possible. It is often impossible for these three conditions to be met at the same time, so different mothers give priority to different features. Some people pump milk in washrooms, to have easy access to water. Other people might prefer to pump in a more comfortable setting, such as an office, and go to the washroom to wash the breast pump accessories afterwards. If the mother expects that the baby will drink breastmilk while she is away, then she will also have to pump milk in advance of her trip. This requires some careful planning.
     
    Many pumping mothers try to store the pumped milk and bring it back home. In this case the mother needs a hotel room with a fridge which (ideally, but hard to find) has a freezer. In a perfect world there would also be a fridge in the place where she pumps/where the conference is held.

It is important to keep in mind that each option has its own set of challenges (even when expenses and facilities are all covered) and that different families may be restricted in their choice of options for a variety of reasons. It is therefore important that all these three options be facilitated.

As for the effect these choices have on the conference experience for the mother, Option 1 means that she has to balance her time between the conference and spending time with her relative/friend. This pressure disappears when we consider Option 2, so this option may lead to more participation in the conference’s activities. In Option 3, the mother is in principle free to participate in all the conference activities, but the frequent breaks may limit the type of activity. A mother may choose different options depending on the nature of the conference.

I want to stress, for the three options, that having to make choices about what to miss in the conference is very hard. While talks are important, so are the opportunities to meet people and discuss mathematics that happen during breaks and social events. It is very difficult to balance all of this. This is particularly difficult for the pumping mother in Option 3: because she travels without her baby, she is not perceived to be in a special situation or in need of accommodation. However, this mother is probably choosing between going to the last lecture in the morning or having lunch alone, because if she goes to pump right after the last lecture, by the time she is back, everybody has left for lunch.

Here is the Hall of Fame for those organizations that are already supporting nursing mothers’ travels in mathematics:

  • The Natural Sciences and Engineering Research Council of Canada (NSERC) (search for “child care”) allows the costs of child care under Option 2 to be reimbursed out of the mother’s grants. They will also reimburse the travel expenses of a relative under Option 1, up to the amount it would cost to hire a local caregiver.
  • The ENFANT/ELEFANT conference (co-organized by Lillian Pierce and Damaris Schindler) provided a good model to follow regarding accommodation for parents with children during conferences. It included funding to cover the travel costs of accompanying caretakers (provided by the Deutsche Forschungsgemeinschaft), and lactation rooms and play rooms near the conference venue (provided by the Hausdorff Center for Mathematics).
     
    Additional information (where to go with kids, etc) was provided on site by the organizers and was made available to all participants all the time, by means of a display board that was left standing during the whole week of the conference.
  • The American Institute of Mathematics (AIM) reimburses up to 500 dollars on childcare for visitors and they have some online resources that assist in finding childcare and nannies.

[UPDATED] Added a few more things to the Hall of Fame

In closing, here is a (possibly incomplete) list of resources that institutes, funding agencies, and conferences could consider providing for nursing mother mathematicians:

    1. Funding (for costs associated with child care, whether professional or by an accompanying relative).
    2. List of childcare resources (nannies, nanny agencies, drop-in childcare centre, etc).
    3. Nursing rooms and playrooms near the conference venue. Nearby fridge.
    4. Breaks of at least 20 minutes every 2-3 hours.
    5. Information about transportation with infants. More specifically: taxi and/or shuttle companies that provide infant car seats, and information regarding the law on infant seats in taxis and other public transportation.
    6. Accessibility for strollers.
    7. [UPDATED] A nearby playground location. (comment from Peter).

I also find it important that these resources be listed publicly on the institute/conference website. This serves a double purpose: first, it helps those in need of the resources to access them easily, and second, it contributes to making these accommodations normal, setting a good model for, and inspiring the organizers of, future events.

Finally, I am pretty sure that the options and solutions I described do not cover all cases. I would like to finish this note by inviting readers to make suggestions, share experiences, and/or pose questions about this topic.



Tim GowersICM2014 — Emmanuel Candès plenary lecture

If you are a mathematical researcher, do you ever stop to ask yourself what the point is of all your research? Do you worry that the world could get along just fine without it?

One person who doesn’t lose any sleep over doubts like this is Emmanuel Candès, who gave the second plenary lecture I went to. He began by talking a little about the motivation for the kinds of problems he was going to discuss, which one could summarize as follows: his research is worthwhile because it helps save the lives of children. More precisely, it used to be the case that if a child had an illness that was sufficiently serious to warrant an MRI scan, then doctors faced the following dilemma. In order for the image to be useful, the child would have to keep completely still for two minutes. The only way to achieve that was to stop the child’s breathing for those two minutes. But depriving a child’s brain (or indeed any brain, I’d imagine) of oxygen for two minutes is not without risk, to put it mildly.

Now, thanks to the famous work of Candès and others on compressed sensing, one can reconstruct the image using many fewer samples, which reduces the time the child must keep still to 15 seconds. Depriving the brain of oxygen for 15 seconds is not risky at all. Candès told us about a specific boy who had something seriously wrong with his liver (I’ve forgotten the details) who benefited from this. If you want a ready answer for when people ask you about the point of doing maths, and if you’re sick of the Hardy-said-number-theory-useless-ha-ha-but-what-about-public-key-cryptography-internet-security-blah-blah example, then I recommend watching at least some of Candès’s lecture, which is available here, and using that instead. Then you’ll really have seized the moral high ground.

Actually, I recommend watching it anyway, because it was a fascinating lecture from start to finish. In that case, you may like to regard this post as something like a film review with spoilers: if you mind spoilers, then you’d better stop reading here.

I have to admit that as I started this post, I realized that there was something fairly crucial that I didn’t understand, that meant I couldn’t give a satisfactory account of what I wanted to describe. I didn’t take many notes during the talk, because I just wanted to sit back and enjoy it, and it felt as though I would remember everything easily, but there was one important mathematical point that I missed. I’ll come back to it in a moment.

Anyhow, the basic mathematical problem that the MRI scan leads to is this. A full scan basically presents you with the Fourier transform of the image you want, so to reconstruct the image you simply invert the Fourier transform. But if you are sampling in only two percent of directions and you take an inverse Fourier transform (it’s easy to make sense of that, but I won’t bother here), then you get a distorted image with all sorts of strange lines all over it — Candès showed us a picture — and it is useless for diagnostic purposes.

So, in a moment that Candès described as one of the luckiest in his life, a radiologist approached him and asked if there was any way of getting the right image from the much smaller set of samples. On the face of it, the answer might seem to be no, since the dimension of the space of possible outputs has become much smaller, so there must be many distinctions between inputs that are not detectable any more. However, in practice the answer is yes, for reasons that I’ll discuss after I’ve mentioned Candès’s second example.

The second example was related to things like the Netflix challenge, which was to find a good way of predicting which films somebody would like, given the preferences of other people and at least some of the preferences of the person in question. If we make the reasonable hypothesis that people’s preferences depend by and large on a fairly small number of variables (describing properties of the people and properties of the films), then we might expect that a matrix where the ijth entry represents the strength of preference of person i for film j would have fairly small rank. Or more reasonably, one might expect it to be a small perturbation of a matrix with small rank.

And thus we arrive at the following problem: you are given a few scattered entries of a matrix, and you want to find a low-rank matrix that agrees pretty well with the entries you observe. Also, you want the low-rank matrix to be unique (up to a small perturbation), since otherwise you can’t use it for prediction.

As Candès pointed out, simple examples show that the uniqueness condition cannot always be obtained. For example, suppose you have 99 people with very similar preferences and one person whose preferences are completely different. Then the underlying matrix that describes their preferences has rank 2 — basically, one row for describing the preferences of the 99 and one for describing the preference of the one eccentric outlier. If all you have is a few entries for the outlier’s preferences, then there is nothing you can do to guess anything else about those preferences.

However, there is a natural assumption you can make, which I’ve now forgotten, that rules out this kind of example, and if a matrix satisfies this assumption then it can be reconstructed exactly.

Writing this, I realize that Candès was actually discussing a slight idealization of the problem I’ve described, in that he didn’t have perturbations. In other words, the problem was to reconstruct a low-rank matrix exactly from a few entries. An obvious necessary condition is that the number of samples should exceed the number of degrees of freedom of the set of low-rank matrices. But there are other conditions such as the one I’ve mentioned, and also things like that every row and every column should have a few samples. But given those conditions (or perhaps the sampling is done at random — I can’t remember) it turns out to be possible to reconstruct the matrix exactly.

The MRI problem boils down to something like this. You have a set of linear equations to solve (because you want to invert a Fourier transform) but the number of unknowns is significantly larger than the number of equations (because you have a sparse set of samples of the Fourier transform you want to invert). This is an impossible problem unless you make some assumption about the solution, and the assumption Candès makes is that it should be a sparse vector, meaning that it has only a few non-zero entries. This reduces the number of degrees of freedom considerably, but the resulting problem is no longer pure linear algebra.

The point that I missed was what sparse vectors have to do with MRI scans, since the image you want to reconstruct doesn’t appear to be a sparse vector. But looking back at the video I see that Candès addressed this point as follows: although the image is not sparse, the gradient of the image is sparse. Roughly speaking, you get quite a lot of patches of fairly constant colour, and if you assume that that is the case, then the number of degrees of freedom in the solution goes right down and you have a chance of reconstructing the image.

Going back to the more general problem, there is another condition that is needed in order to make it soluble, which is that the matrix of equations should not have too many sparse rows, since typically a sparse row acting on a sparse vector will give you zero, which doesn’t help you to work out what the sparse vector was.

I don’t want to say too much more, but there was one point that particularly appealed to me. If you try to solve these problems in the obvious way, then you might try to find algorithms for solving the following problems.

1. Given a system of underdetermined linear equations, find the sparsest solution.

2. Given a set of entries of a matrix, find the lowest rank matrix consistent with those entries.

Unfortunately, no efficient algorithms are known for these problems, and I think in the second case it’s even NP-complete. However, what Candès and his collaborators did was consider convex relaxations of these problems.

1. Given a system of underdetermined linear equations, find the solution with smallest \ell_1 norm.

2. Given a set of entries of a matrix, find the matrix with smallest nuclear norm consistent with those entries.

If you don’t know what the nuclear norm is, it’s simple to define. Whereas the rank of a matrix A is the smallest number of rank-1 matrices such that A is a linear combination of those matrices, the nuclear norm of A is the minimum \sum_i|\lambda_i| such that you can write A=\sum_i\lambda_iB_i with each B_i a rank-1 matrix of norm 1. So it’s more like a quantitative notion of rank.
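To make the first relaxation concrete, here is a minimal sketch (my own toy example, not code from the talk; the Gaussian measurement matrix, problem sizes and solver choice are just illustrative assumptions) of recovering a sparse vector by minimizing its \ell_1 norm subject to the linear constraints, rewritten as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 60, 30, 4                      # ambient dimension, number of equations, sparsity

A = rng.standard_normal((m, n))          # random measurement matrix (toy choice)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Basis pursuit:  minimize ||x||_1  subject to  A x = b.
# Standard LP reformulation with auxiliary variables t >= |x|:
#   minimize sum(t)  subject to  x - t <= 0,  -x - t <= 0,  A x = b.
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])
bounds = [(None, None)] * n + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
x_hat = res.x[:n]
print(np.max(np.abs(x_hat - x_true)))    # typically tiny: exact recovery up to solver tolerance
```

On random instances like this, with enough measurements relative to the sparsity, the \ell_1 minimizer typically coincides with the sparse vector you started from, which is exactly the phenomenon described in the lecture.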

It’s a standard fact that convex relaxations of problems tend to be much easier than the problems themselves. But usually that comes at a significant cost: the solutions you get out are not solutions of the form you originally wanted, but more like convex combinations of such solutions. (For example, if you relax the graph-colouring problem, you can solve the relaxation but you get something called a fractional colouring of your graph, where the total amount of each colour at two adjacent vertices is at most 1, and that can’t easily be converted into a genuine colouring.)

However, in the cases that Candès was telling us about, it turns out that if you solve the convex relaxations, you get exactly correct solutions to the original problems. So you have the following very nice situation: a problem is NP-complete, but if you nevertheless go ahead and try to solve it using an algorithm that is doomed to fail in general, the algorithm still works in a wide range of interesting cases.

At first this seems miraculous, but Candès spent the rest of the talk explaining to us why it isn’t. It boiled down to a very geometrical picture: you have a convex body and a plane through one of its extreme points, and if the plane is tangent to the body then the algorithm will work. It is this geometrical condition that underlies the necessary conditions I mentioned earlier.

For me this lecture was one of the highlights of the ICM, and I met many other people who greatly enjoyed it too.


Clifford JohnsonCoral Forest

Given that you read here at this blog, you may well like to keep your boundaries between art and science nicely blurred, in which case you might like to learn more about the coral reef forests made of crochet spearheaded by Margaret and Christine Wertheim. The pieces mix crochet (a hand-craft I know and love well from my childhood - I got to explore my love for symmetry, patterns, and problem-solving by making doilies) with mathematics - hyperbolic geometry in particular - as well as biology (mimicking and celebrating the forms of corals - and drawing attention to their destruction in the wild). You can read much more about the projects here. I've mentioned the work here before on the blog, but the other day I went along to see a new set [...]

John PreskillHow Physics for Poets can boost your Physics GRE score

As summer fades to autumn, prospective grad students are eyeing applications. Professors are suggesting courses, and seniors are preparing for Graduate Record Exams (GREs). American physics programs  (and a few programs abroad) require or encourage applicants to take the Physics GRE. If you’ve sighted physics grad school in your crosshairs, consider adding Physics for Poets (PFP) to your course list. Many colleges develop this light-on-the-math tour of physics history for non-majors. But Physics for Poets can boost your Physics GRE score while reinforcing the knowledge gained from your physics major.

My senior spring in college, PFP manifested as “PHYS 001/002: Understanding the Universe: From Atoms to the Big Bang.” The tallness of this order failed to daunt Marcelo Gleiser, a cosmologist whose lectures swayed with a rhythm like a Foucault pendulum. From creation myths and Greek philosophers, we proceeded via Copernicus and Kepler to Newton and the Enlightenment, Maxwell and electromagnetism, the Industrial Revolution and thermodynamics, Einstein’s relativity, and WWII and the first quantum revolution, ending with particle physics and cosmology. The course provided a history credit I needed. It offered a breather from problem sets, sandwiching my biophysics and quantum-computation lectures. Pragmatism aside, PHYS 2 showcased the grandness of the physicist’s legacy—of my legacy. PHYS 2 sharpened my determination to do that legacy justice. To do it justice, most of us must pass tests. Here’s how PFP can help.

(1) A Foucault pendulum, invented during the 19th century to demonstrate that the Earth rotates. (2) An excuse to review noninertial reference frames. [Image source: http://ask.dartmouth.edu/categories/misc/55.html]

Reviewing basic physics can improve GRE scores. If thermodynamics has faded from memory, good luck calculating a Carnot engine’s efficiency. Several guides (and I) recommend reviewing notes and textbooks, working practice problems, simulating exams, and discussing material with peers.

Taking PFP, you will review for the GRE. A list of the topics on the Physics GRE appears here. Of the 14 mechanics topics, eight surfaced in PHYS 2. Taking PFP will force you to review GRE topics that lack of time (or that procrastination) might force you to skip. Earning credit for GRE prep, you won’t have to shoehorn that prep into your schedule at the expense of research. The Physics GRE covers basic physics developed during the past 500 years; so do many PFP courses.

The GRE, you might protest, involves more math than PFP. But GRE questions probe less deeply, and PFP can galvanize more reviews of math, than you might expect. According to Stanford’s Society of Physics Students, “Each [Physics GRE] question shouldn’t require much more than a minute’s worth of thought and computation.” Expect to use the Wave Equation and Maxwell’s Equations, not to derive the former from the latter. Some PFP instructors require students to memorize formulae needed on the GRE. PFP can verify whether your memory has interchanged the exponents in Kepler’s Third Law, or dropped the negative sign from the kinetic-energy term in Schrödinger’s Equation.

Even formulae excluded from PFP exams appear in PFP classes. In a PHYS 2 Powerpoint, Marcelo included Planck’s blackbody formula. Though he never asked me to regurgitate it, his review benefited me. Given such a Powerpoint, you can re-memorize the formula. Derive the Stefan-Boltzmann and Wien Displacement Laws. Do you remember how a blackbody’s energy density varies with frequency in the low-energy limit? PFP can catalyze your review of math used on the GRE.
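For instance, that last question falls straight out of Planck's formula for the spectral energy density, which in the low-frequency (Rayleigh-Jeans) limit reduces to a quadratic dependence on frequency:

\[
u(\nu,T)\,d\nu \;=\; \frac{8\pi h\nu^{3}}{c^{3}}\,\frac{d\nu}{e^{h\nu/k_{B}T}-1},
\qquad
u(\nu,T) \;\approx\; \frac{8\pi \nu^{2}k_{B}T}{c^{3}} \quad\text{for } h\nu \ll k_{B}T .
\]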

Physics for Poets: It can work for applicants to physics grad programs. [Comic: xkcd, “Science”]

While recapitulating basic physics, PFP can introduce “specialized” GRE topics. Examples include particle physics, astrophysics, and nuclear physics. Covered in advanced classes, these subjects might have evaded mention in your courses. Signing up for biophysics, I had to drop particle theory. PHYS 2 helped compensate for the drop. I learned enough about neutrinos and quarks to answer GRE questions about them. In addition to improving your score, surveying advanced topics in PFP can enhance your understanding of physics seminars and conversations. The tour can help you identify which research you should undertake. If you’ve tried condensed-matter and atmospheric research without finding your niche, tasting cosmology in PFP might point toward your next project. Sampling advanced topics in PFP, you can not only prepare for the GRE, but also enrich your research.

I am not encouraging you to replace advanced physics courses with PFP. I encourage you to complement advanced courses with PFP. If particle physics suits your schedule and intrigues you, enjoy. If you need to fulfill a history or social-sciences distribution requirement, check whether PFP can count. Consider PFP if you’ve committed to a thesis and three problem-set courses, you haven’t signed up for the minimum number of courses required by your college, and more problem sets would strangle you. Sleep deprivation improves neither exam scores nor understanding. Not that I sailed through PHYS 2 without working. I worked my rear off—fortunately for my science. Switching mindsets—pausing frustrating calculations to study Kepler—can refresh us. Stretch your calculational toolkit in advanced courses, and reinforce that toolkit with PFP.

In addition to reviewing basic physics and surveying specialized topics, you can seek study help from experts in PFP. When your questions about GRE topics overlap with PFP material, ask your instructor and TA. They’ll probably enjoy answering: Imagine teaching physics with little math, with one hand tied behind your back. Some students take your class not because they want to, but because they need science credits. Wouldn’t you enjoy directing a student who cares? While seeking answers, you can get to know your professor or TA. You can learn about his or her research. Maybe PFP will lead you to join that research. PFP may not only connect you to experts able to answer questions as no study guide can; it also offers opportunities to enhance a course as few non-physics students can and to develop relationships with role models.


Instructors’ expertise has benefited science students throughout much of history.

Those relationships illustrate the benefits that PFP extends beyond GREs. As mentioned earlier, surveying advanced topics can diversify the research conversations you understand. The survey can direct you toward research you’ll pursue. Further exposure to research can follow from discussions with instructors. Intermissions from problem sets can promote efficiency. Other benefits of PFP include enhancement of explanatory skills and a bird’s-eye view of the scientific process to which you’re pledging several years. What a privilege we enjoy, PFP shows. We physicists explore questions asked for millennia. We wear mantles donned by Faraday, Bernoulli, and Pauli. When integrals pour out our ears and experiments break down, PFP can remind us why we bother. And that we’re in fine company.

Physics for Poets can improve your Physics GRE score and reinforce your physics major. Surveying basic physics, PFP will force you to review GRE topics. PFP may introduce specialized GRE topics absent from most physics majors. Opportunities abound to re-memorize equations and to complement lectures with math. Questioning instructors, you can deepen your understanding as with no study guide. Beyond boosting your GRE score, PFP can broaden your research repertoire, energize your calculations, improve your explanatory skills, and inspire.

Good luck with the academic year, and see you in grad school!


n-Category Café Math and Mass Surveillance: A Roundup

The Notices of the AMS has just published the second in its series “Mathematicians discuss the Snowden revelations”. (The first was here.) The introduction to the second article cites this blog for “a discussion of these issues”, but I realized that the relevant posts might be hard for visitors to find, scattered as they are over the last eight months.

So here, especially for Notices readers, is a roundup of all the posts and discussions we’ve had on the subject. In reverse chronological order (and updated after the original appearance of this post):

August 24, 2014

Chad OrzelPubs in London

While I kill time waiting for it to be a reasonable time to call Kate and the kids back in the US, a list of most of the pubs I’ve visited during this trip to London. Because why not?

In more or less chronological order:

The Victoria in Lancaster Gate. Or maybe Paddington, going from the URL. The naming of London neighborhoods is an enduring mystery to me. Anyway: given the name and location (two blocks from Hyde Park), I expected to be hip-deep in American tourists, but it was mostly a local crowd. Lots of Queen Victoria pictures, which mostly made me think of Eddie Izzard calling her “One of our frumpier Queens…” Good pie.

The Grapes in Limehouse. Tiny little pub with a microscopic back deck overlooking the Thames. Apparently mentioned in a Dickens novel. Co-owned by Sir Ian McKellan, but the only indicator of this is a Gandalf staff bolted to the wall next to the till. Ducked out of Worldcon for a bit to go down here with another guy, and we were cheerfully abused by the barmaid, he for ordering a half pint (“I want to try several beers.” “You try them in pints!”), me for having an American credit card that required a signature (“No, this doesn’t look like the same signature. I’m going to need to see your passport…”). (This is why I didn’t take a photo of the Gandalf staff, for the record…) Pub-wise, probably the highlight of the trip.

The Cheshire Cheese, in the Blackfriars sort of area. There’s some sort of construction going on outside, which is probably why I was one of three people in the entire place on a Saturday at lunchtime. The special of the day was lasagna, which in an only-the-British sort of touch came with a side of chips. I kept picturing Tony Shalhoub’s character in Big Night asking if his crass American customers would like a side of mashed potatoes to go with the spaghetti they demanded with their risotto. Very pleasant, nice time chatting with the other patrons (who were on some sort of pub crawl vacation from the north of England).

The Mitre in Lancaster Gate. Three-level pub with each floor its own thing– downstairs is a cocktail bar, upstairs a fancy dining room. The pie here was an actual pie with crust all the way around, not a bowl of stew with a croissant on top, so they get full marks for that.

The Wells in Hampstead. Spent a bunch of time tromping around Hampstead Heath, because I didn’t feel like dealing with crowds and queues. Thought about going to the Spaniards, but that was a wee bit pricey, so I went here instead. Which was also pricey, being a “gastro-pub,” which means serving pub chow with fancier ingredients. Had the roast beef and Yorkshire pudding, because it seemed like a Sunday afternoon thing to do. Excellent food.

The Minories. Needed to touch base with Kate back in the US about a souvenir purchase at the Tower of London, so grabbed a pint in here while waiting on her reply. Surprisingly large place, probably because it’s located under a railroad bridge and nobody else wanted to be there. Watched a bit of an uninspiring soccer game between, um, Stoke City and Hull? Something like that.

The Monkey Puzzle. Features a wide range of mustelid-themed beers; thought about ordering a “Fursty Ferret” just to see what that could possibly be, but decided that one more might’ve made the walk back to the hotel a challenge, let alone packing for Stockholm. Relatively modern, so not a huge amount of atmosphere, but good beers and good food.

If that sounds like I’ve been drinking a lot of beer the last few days, well, I have. But then, my next stop is in Sweden, where I’ve been warned I might need to take out a bank loan to buy booze, so I figured I ought to take advantage of the plentiful pubs in London while I still have a credit rating…

(We also hit the Porterhouse Central in Dublin, which is more of a modern brewpub sort of thing, and a fairly undistinguished place near Christ Church whose name escapes me. Pubs aren’t really Kate’s kind of thing, though, what with the whole not drinking beer thing…)

Steinn SigurðssonX-Volcanoes – Bárðarbunga

Bárðarbunga is arguably the scariest of the 30 or so active volcanoes in Iceland.

Extreme volcanoes don’t always have extreme eruptions, but they are scary because they have the capability for extreme events, uniquely so.

Bárðarbunga – under the ice cap at the top left – from Google maps

It is not the most active, it is not the tallest, it may possibly be the biggest in some sense, but it is the volcano which gave us the largest eruption on Earth since modern humans started trying to get organized: the Þjórsárhraun eruption about 8,500 years ago.

Þjórsárhraun – upper part; the lava flows all the way to the sea to the south west. The dark lava in the lower right hand corner towards the lake is the path (it is actually younger lava on top of the old flow, from the same general system of volcanoes) – from Google maps.

That eruption pumped out about 30 km3 of lava.

Bárðarbunga has erupted a number of times in historical times, and, given very limited data, seems to undergo a sequence of eruptions roughly every few centuries.
It is not a simple volcano, but rather a complex around a central crater on the North Atlantic ridge as it runs across Iceland.

The central crater is a 2,000 m peak, the second tallest on the island.
It has a ~50-100 km2 oval caldera, which is about 700 m deep and covered with an ice cap.
The caldera seems to have last erupted in the 18th century; it is hard to tell, since large volcanoes can have small eruptions, and if the lava does not breach the ice, it would not necessarily have been clear which volcano was erupting, or even that there was an eruption, before modern seismographs etc.

In addition to the main caldera, there is a system of fissures to the northeast, along Dyngjuháls, where there have been relatively frequent, relatively modest fissure eruptions, and to the southwest, towards Veiðivötn, where the very largest historical eruptions have taken place, most recently in 870 and 1480 AD.
The latter are the bad, “everybody dies” kinda eruptions…

Starting a week ago, a lot of small earthquakes began popping off under the ice cap at Bárðarbunga – about one per minute, still going, mostly magnitude 1-2, with occasional 3-4 shakers. These are mostly moderately deep earthquakes, 5-10 km below ground, and well localized; if you look at the maps in the links below you see they are in two groups and move in time.
One set of earthquakes is under the main caldera and seems to move around the center of the caldera; the other is drifting northeast and up to the surface, forming about a 25 km line towards the edge of the glacier.
The earthquakes are accompanied by moderately strong harmonic tremors, roughly 1 Hz vibrations. There was also a small rise in conductivity in Jökulsá á Fjöllum, the main river running from the glacier to the north.
The volcano is also bulging: the whole area, maybe 100 km across, has moved about 5 cm over the week, as measured by GPS sensors [strictly speaking the area didn't so much rise as separate - it moved laterally - and the net movement is now more like 20 cm!].

What is happening is an injection of magma, molten rock, from below the volcano into the chamber under it. There is not enough room for it, so it is cracking the rock and forcing it open. The caldera is blocked by a “cork” of solidified lava and the weight of ice, so some magma is trying to force its way up around the edge of the caldera, while other magma has flowed off through fissures in the rock to the north east, forming a 25 km subsurface dyke.

The lava from this system is generally fairly fluid as Icelandic lava goes and very gas rich, and usually particularly loaded with nasty toxic gases, including, notoriously, hydrogen fluoride.

The system is under-instrumented – it has not been studied long, it is hard to get to, and it is big. The amount of magma in this injection is 100-200 million cubic meters, or 0.1-0.2 km³. Not a lot on historical scales, but a lot compared to some recent eruptions in other volcanoes, and the total magma in the chamber is easily 10-100 times more than that.

Now what?

Well, it could just stop.
The new lava cools, and we wait for the next injection.

Or, it could erupt.
It could be a small eruption, even so small it does not even breach the ice, or it could be big.

An eruption would probably be immediately preceded by several magnitude 4-5 earthquakes as the lava cracked the final few hundred meters of crust and forced its way to the surface.

A reduction in the number of small earthquakes may be worrying if there is an increase in the number of bigger earthquakes and a change in the pattern of location of the quakes.
There’ll be a lot of geologists peering at the 3D rotations of time slices of those earthquakes (see below) trying to figure out where this thing is going.

It could be an explosive eruption, from a caldera, with ice vapourizing and a lot of lava and fine ash flung high into the stratosphere, or it could be a long fountaining fissure eruption.
Or both.

It could be over in a few hours or days, or it could go on for months or years.

The dangers, in order:

1) a flash flood from the ice melting.
This is most likely, and most likely to go to the north.
Warning time is hours, it could be spectacular or it could be catastrophic and landscape transforming, depending on how much ice melts and how quickly.
I helped build a couple of the hydrology stations (V289 is go!) along Jökulsá á Fjöllum – those would be toast in a flood… :-(

If the flood goes south and west, the hydroelectric power stations and dams are at risk as well as farms and roads on the more densely populated south coast.
Bárðarbunga can go either way, though the trend is looking north (except for a recent small turn to the west in the caldera tonight).

2) explosive eruption with ash spreading high and far.
This will happen if lava breaches the caldera or a new caldera forms along the dyke under the ice and a lot of lava contacts the ice above rapidly.
It could be as bad as, or worse than, Eyjafjallajökull, though aviation authorities claim to understand the ash transport and risks better now and would not restrict flights as broadly as back then.
Ash can lead to temporary cooling and crop damage, and can be transported on continental scales.
Current winds would carry ash mostly south and east, over the British Isles, France, Germany and either into eastern Europe or up into Scandinavia, though that can change.

3) prolonged fissure eruption – the main dangers there are extensive lava flows and, more seriously, very large volumes of nasty gases.
These can lead to multi-year temporary cooling on continental scales, with associated disruption of weather patterns, direct damage to crops and even acute toxicity in extreme events.

It is mildly reassuring that the tremors in this episode have been moving north east, suggesting any fissure eruption might be in the generally less extreme group of cracks on that side. It is hard to know though, since anecdotal accounts of historic eruptions suggest small eruptions in one direction can suddenly become large eruptions in different parts of the system with some lag time – we definitely see triggering in modern eruptions as earthquakes and pressure changes let other parts of adjacent systems release.

The bigger, more extreme events are less likely in any given episode, but one reason Bárðarbunga is one of the volcanoes Iceland keeps a very close eye on is that it is one of the few that is big enough to be extreme.

Interesting times.
Let's hope it fizzles, or just blows off a bit of steam with a nice little “tourist eruption” with some photogenic plumes and fountains of fire.

live update of earthquakes in the last 24 hours in Iceland

Bárðarbunga – eldgos.is

Bárðarbungu eldgos

Who was Bárður?

Harmonic tremors – click on the little red dots…

Vulkan blog – Haraldur Sigurðsson

Iceland Volcano blog – Dave McGarvie @subglacial

Icelandic volcano shakes ominously – Nature “news explainer” by Alexandra Witze

Iceland geology – jónfr – intense amateur volcanology blog

3D plot of earthquakes under Bárðarbunga – play with time slices and spatial rotation for perspective

The sounds of earthquakes…

webcam – Kverkfjöll

webcam – Grímsfjall

Steinn Sigurðssonsay it like it is

The good thing about Icelandic is that it is phonetic – almost all the words are pronounced the way they are spelled, including Bárðarbunga

the extra few letters are just what they look like.
Fortunately Biggi Lögga is there to set you straight.

Mark Chu-CarrollGödel part 3: Meta-Logic with Arithmetic

On to the next part of Gödel’s proof of incompleteness. To refresh your memory, here’s a sketch of the proof:

  1. Take a simple logic. We’ve been using a variant of the Principia Mathematica’s logic, because that’s what Gödel used.
  2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering.
  3. Show that you can express meta-mathematical properties of logical statements in terms of arithmetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
  4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.

What we’ve done so far is the first two steps, and part of the third. In this post, we saw the form of the Principia’s logic that we’re using, and how to numerically encode it as a Gödel numbering. We’ve start started on the third point in this post, by figuring out just what it means to say that things are encoded arithmetically. Now we can get to the part where we see how to encode meta-mathematical properties in terms of arithmetic properties of the Gödel numbering. In this post, we’re going to build up everything we need to express syntactic correctness, logical validity, and provability in terms of arithmetical properties of Gödel numbers. (And, as a reminder, I’ve been using this translation on Gödel’s original paper on incompleteness.)

This is the most complex part of the incompleteness proof. The basic concept of what we’re doing is simple, but the mechanics are very difficult. What we want to do is define a set of predicates about logical statements, but we want those predicates to be expressed as arithmetic properties of the numerical representations of the logical statements.

The point of this is that we're showing that, done in the right way, arithmetic is logic – that doing arithmetic on the Gödel numbers is doing logical inference. So what we need to do is build up a toolkit that shows us how to understand and manipulate logic-as-numbers using arithmetic. As we saw in the last post, primitive recursion is equivalent to arithmetic – so if we can show that all of the properties/predicates that we define are primitive recursive, then they're arithmetic.

This process involves a lot of steps, each of which is building the platform for the steps that follow it. I struggled quite a bit figuring out how to present these things in a comprehensible way. What I ended up with is writing them out as code in a pseudo-computer language. Before inventing this language, I tried writing actual executable code, first in Python and then in Haskell, but I wasn’t happy with the clarity of either one.

Doing it in an unimplemented language isn't as big a problem as you might think. Even if this was all executable, you're not going to be able to actually run any of it on anything real – at least not before your hair turns good and gray. The way that this stuff is put together is not what any sane person would call efficient. But the point isn't to be efficient: it's to show that this is possible. This code is really all about searching; if we wanted to be efficient, this could all be done in a different representation, with a different search method that was a lot faster – but that would be harder to understand.

So, in the end, I threw together a simple language that’s easy to read. This language, if it were implemented, wouldn’t really even be Turing complete – it’s a primitive recursive language.

Basics

We’ll start off with simple numeric properties that have no obvious connection to the kinds of meta-mathematical statements that we want to talk about, but we’ll use those to define progressively more and more complex and profound properties, until we finally get to our goal.

# divides n x == True if n divides x without remainder.
pred divides(n, x) = x mod n == 0

pred isPrime(0) = False
pred isPrime(1) = False
pred isPrime(2) = True
pred isPrime(n) = {
  all i in 2 to n - 1 {
    not divides(i, n)
  }
}

fun fact(0) = 1
fun fact(n) = n * fact(n - 1)
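
For readers who'd like to run something, here's a rough Python rendering of these basics. It's only a sketch of mine, not part of Gödel's construction; the names divides, is_prime and fact are arbitrary, and the trial division in is_prime deliberately stops at n - 1:

def divides(n, x):
    # True if n divides x without remainder.
    return n != 0 and x % n == 0

def is_prime(n):
    # Brute-force trial division: prime if there is no divisor in 2..n-1.
    if n < 2:
        return False
    return all(not divides(i, n) for i in range(2, n))

def fact(n):
    # Factorial by primitive recursion.
    return 1 if n == 0 else n * fact(n - 1)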

Almost everything we’re going to do here is built on a common idiom. For anything we want to do arithmetically, we’re going to find a bound – a maximum numeric value for it. Then we’re going to iterate over all of the values smaller than that bound, searching for our target.

For example, what’s the nth prime factor of x? Obviously, it’s got to be smaller than x, so we’ll use x as our bound. (A better bound would be the square root of x, but it doesn’t matter. We don’t care about efficiency!) To find the nth prime factor, we’ll iterate over all of the numbers smaller than our bound x, and search for the smallest number which is prime, which divides x, and which is larger than the n-1th prime factor of x. We’ll translate that into pseudo-code:

fun prFactor(0, x) = 0
fun prFactor(n, x) = {
  first y in 1 to x {
    isPrime(y) and divides(y, x) and prFactor(n - 1, x) < y
  }
}

Similarly, for extracting values from strings, we need to be able to ask, in general, what's the nth prime number? This is nearly identical to prFactor above. The only difference is that we need a different bound. Fortunately, we know that the nth prime number can't be larger than the factorial of the previous prime plus 1.

fun nthPrime(0) = 0
fun nthPrime(n) = {
  first y in 1 to fact(nthPrime(n - 1)) + 1  {
    isPrime(y) and y > nthPrime(n - 1)
  }
}
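
The bounded-search idiom translates directly into Python. Again, this is my own sketch, reusing divides, is_prime and fact from the previous snippet; first_in_range is a made-up helper that returns the smallest value up to a bound satisfying a predicate, or 0 if there isn't one:

def first_in_range(bound, pred):
    # Smallest y in 1..bound satisfying pred, or 0 if none does.
    for y in range(1, bound + 1):
        if pred(y):
            return y
    return 0

def pr_factor(n, x):
    # The nth distinct prime factor of x, in increasing order.
    if n == 0:
        return 0
    return first_in_range(x, lambda y: is_prime(y) and divides(y, x)
                                       and pr_factor(n - 1, x) < y)

def nth_prime(n):
    # The nth prime, searched for below the factorial-based bound.
    if n == 0:
        return 0
    prev = nth_prime(n - 1)
    return first_in_range(fact(prev) + 1,
                          lambda y: is_prime(y) and y > prev)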

In composing strings of Gödel numbers, we use exponentiation. Given integers n and i, we can obviously compute n^i via primitive recursion. I'll define it below, but in the rest of this post, I'll write it as the ** operator in the language:

fun pow(n, 0) = 1
fun pow(n, i) = n * pow(n, i - 1)

String Composition and Decomposition

With those preliminaries out of the way, we can get to the point of defining something that's actually about one of the strings encoded in these Gödel numbers. Given a number x encoding a string, item(n, x) is the value of the nth character of x. (This is slow. This is really slow! We're getting to the limit of what a very powerful computer can do in a reasonable amount of time. But this doesn't matter. The point isn't that this is a good way of doing these things: it's that these things are possible. To give you an idea of just how slow this is, I started off writing the stuff in this post in Haskell. Compiled with GHC, which is a very good compiler, using item to extract the 6th character of an 8 character string took around 10 minutes on a 2.4GHz laptop. In the stuff that follows, we'll be using this to extract characters from strings that could be hundreds of characters long!)

fun item(n, x) = {
  first y in 1 to x {
    divides(prFactor(n, x) ** y, x) and
      not divides(prFactor(n, x)**(y+1), x)
  }
}

Given a string, we want to be able to ask how long it is; and given two strings, we want to be able to concatenate them.

fun length(x) = {
  first y in 1 to x {
    prFactor(y, x) > 0 and prFactor(y + 1, x) == 0
  }
}

fun concat(x, y) = {
  val lx = length(x)
  val ly = length(y)

  first z in 1 to nthPrime(lx + ly)**(x + y) {
    (all n in 1 to lx {
        item(n, z) == item(n, x)
     }) and (all n in 1 to ly {
        item(n + lx, z) == item(n, y)
      })
  }
}

fun concatl([]) = 0
fun concatl(xs) = {
  concat(head(xs), concatl(tail(xs)))
}

fun seq(x) = 2**x
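
Here's an equivalent Python sketch of these string operations. Rather than transcribing the bounded searches, it decodes a Gödel number into its list of symbol codes (the exponents of successive primes) and re-encodes; for genuine string encodings, where every symbol code is at least 1, this gives the same answers as the definitions above. All of the names (encode, decode, next_prime, and so on) are my own:

def is_prime(n):
    return n >= 2 and all(n % i != 0 for i in range(2, n))

def next_prime(p):
    # The smallest prime strictly greater than p.
    q = p + 1
    while not is_prime(q):
        q += 1
    return q

def decode(x):
    # Read off the exponents of 2, 3, 5, ... to recover the symbol codes.
    codes, p = [], 2
    while x % p == 0:
        e = 0
        while x % p == 0:
            x, e = x // p, e + 1
        codes.append(e)
        p = next_prime(p)
    return codes

def encode(codes):
    # Inverse of decode: the product of p_i ** code_i over the ith primes.
    x, p = 1, 2
    for c in codes:
        x *= p ** c
        p = next_prime(p)
    return x

def item(n, x):
    # The nth symbol (1-indexed) of the string encoded by x.
    return decode(x)[n - 1]

def length(x):
    return len(decode(x))

def concat(x, y):
    return encode(decode(x) + decode(y))

def concatl(xs):
    # Concatenate a (non-empty) list of encoded strings.
    result = xs[0]
    for x in xs[1:]:
        result = concat(result, x)
    return result

def seq(c):
    # A one-symbol string containing just the symbol code c.
    return 2 ** c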

We want to be able to build statements represented as numbers from other statements represented as numbers. We'll define a set of functions that compose new strings from other strings, and predicates that check whether a particular string is a particular kind of syntactic element.

# x is a variable of type n.
pred vtype(n, x) = {
  some z in 17 to x {
    isPrime(z) and x == z**n
  }
}

# x is a variable
pred isVar(x) = {
  some n in 1 to x {
    vtype(n, x)
  }
}

fun paren(x) =
  concatl([seq(11), x, seq(13)])

# given the Gödel number for a statement x, find
# the Gödel number for not x.
fun gnot(x) =
  concat(seq(5), paren(x))

# Create the number for x or y.
fun gor(x, y) =
  concatl([paren(x), seq(7), paren(y)])

# Create the number for 'forall x(y)'.
fun gforall(x, y) =
  concatl([seq(9), seq(x), paren(y)])

# Create the number for x with n invocations of the primitive
# successor function.
fun succn(0, x) = x
fun succn(n, x) = concat(seq(3), succn(n - 1, x))

# Create the number n using successor and 0.
fun gnumber(n) = succn(n, seq(1))
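
Continuing the Python sketch (and reusing seq, concat and concatl from the snippet above), the constructors look like this; the constants just name the symbol codes 1, 3, 5, 7, 9, 11, 13 used for 0, succ, ¬, ∨, ∀ and the parentheses:

ZERO, SUCC, NOT, OR, FORALL = 1, 3, 5, 7, 9
LPAREN, RPAREN = 11, 13

def paren(x):
    return concatl([seq(LPAREN), x, seq(RPAREN)])

def gnot(x):
    return concat(seq(NOT), paren(x))

def gor(x, y):
    return concatl([paren(x), seq(OR), paren(y)])

def gforall(v, y):
    # v is the symbol code of a variable, e.g. 17 for x1.
    return concatl([seq(FORALL), seq(v), paren(y)])

def succn(n, x):
    # Apply the successor symbol n times in front of x.
    return x if n == 0 else concat(seq(SUCC), succn(n - 1, x))

def gnumber(n):
    # The numeral for n: succ(succ(...(0)...)).
    return succn(n, seq(ZERO))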

# Check if a statement is type-1.
pred stype_one(x) = {
  some m in 1 to x {
    some n in 0 to x {
      (m == 1 or vtype(1, m)) and x == succn(n, seq(m))
    }
  }
}

# Check if x is a sign of type n.
pred fstype(1, x) = stype_one(x)
pred fstype(n, x) = {
  some v in 1 to x {
    vtype(n, v) and x == seq(v)
  }
}

The translation of Gödel that I'm using writes that last clause as R(v) without defining R. In Gödel's original paper, R(v) is his notation for the sequence consisting of the single symbol v – that is, 2^v – which is exactly seq(v) in our notation, so that's what I've written above.

Formulae

Using what we've defined so far, we're now ready to start defining formulae in the basic Principia logic. Formulae are strings, but they're strings with a constrained syntax.

pred elFm(x) = {
  some y in 1 to x {
    some z in 1 to x {
      some n in 1 to x {
        fstype(n, y) and fstype(n+1, z) and x == concat(z, paren(y))
      }
    }
  }
}

All this is doing is expressing the grammar rule in arithmetic form: an elementary formula is a predicate application P(x), where x is a variable of level n, and P is a variable of level n + 1.
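
As a concrete check of that rule, here's how the elementary formula x2(x1) comes out in the Python sketch, using 17 as the code for the type-1 variable x1 and 17**2 = 289 for the type-2 variable x2 (a prime p > 13 encodes a type-1 variable, p² a type-2 variable, and so on):

# x1 is the type-1 variable with prime code 17; x2 is the type-2
# variable coded as 17**2 = 289.
x1 = seq(17)
x2 = seq(17 ** 2)

# The elementary formula x2(x1) is the symbol string [289, 11, 17, 13].
elementary = concat(x2, paren(x1))
print(decode(elementary))   # -> [289, 11, 17, 13]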

The next grammar rule that we encode this way says how we can combine elementary formulae using operators. There are three operators: negation, conjunction, and universal quantification.

pred op(x, y, z) = {
  x == gnot(y) or
  x == gor(y, z) or
  (some v in 1 to x { isVar(v) and x == gforall(v, y) })
}

And now we can start getting complex. We're going to define the idea of a valid sequence of formulae. x is a valid sequence of formulae when it's formed from a collection of formulae, each of which is either an elementary formula, or is produced from the formulae which occurred before it in the sequence using either negation, logical-or, or universal quantification.

In terms of a more modern way of talking about it, the syntax of the logic is a grammar. A formula sequence, in this system, is another way of writing the parse-tree of a statement: the sequence is the parse-tree of the last statement in the sequence.

pred fmSeq(x) = {
  all n in 0 to length(x) {
    elFm(item(n, x)) or
      some p in 0 to (n - 1) {
        some q in 0 to (n - 1) {
          op(item(n, x), item(p, x), item(q, x))
        }
      }
  }
}

The next one bugs me, because it seems wrong, but it isn't really! It's a way of encoding the fact that a formula is the result of a well-defined sequence of formulae. In order to ensure that everything stays primitive recursive, we're always thinking about sequences of formulae, where the later formulae are produced from the earlier ones. The goal of the sequence of formulae is to produce the last formula in the sequence. What this predicate is really saying is that a formula is a valid formula if there is some sequence of formulae where this is the last one in the sequence.

Rephrasing that in grammatical terms, a string is a formula if there is valid parse tree for the grammar that produces the string.

pred isFm(x) = {
  some n in 1 to nthPrime(length(x)**2)**(x*length(x)**2) {
    fmSeq(n) and x == item(length(n), n)
  }
}

So, now, can we say that a statement is valid because it's parsed according to the grammar? Not quite. It's actually a familiar problem for people who write compilers. When you parse a program in some language, the grammar doesn't usually specify that variables must be declared before they're used. It's too hard to get that into the grammar. In this logic, we've got almost the same problem: the grammar hasn't restricted us to only use bound variables. So we need to have ways to check whether a variable is bound in a Gödel-encoded formula, and then use that to check the validity of the formula.

# The variable v is bound in formula x at position n.
pred bound(v, n, x) = {
  isVar(v) and isFm(x) and
  (some a in 1 to x {
    some b in 1 to x {
      some c in 1 to x {
        x == concatl([a, gforall(v, b), c]) and
        isFm(b) and
        length(a) + 1 ≤ n ≤ length(a) + length(gforall(v, b))
      }
    }
  })
}

# The variable v is free in formula x at position n
pred free(v, n, x) = {
  isVar(v) and isFm(x) and
  (some a in 1 to x {
    some b in 1 to x {
      some c in 1 to x {
        v == item(n, x) and n ≤ length(x) and not bound(v, n, x)
      }
    }
  })
}

pred free(v, x) = {
  some n in 1 to length(x) {
    free(v, n, x)
  }
}

To do logical inference, we need to be able to do things like replace a variable with a specific inferred value. We'll define how to do that:

# replace the item at position n in x with y.
fun insert(x, n, y) = {
  first z in 1 to nthPrime(length(x) + length(y))**(x+y) {
    some u in 1 to x {
      some v in 1 to x {
        x == concatl([u, seq(item(n, x)), v]) and
        z == concatl([u, y, v]) and
        n == length(u) + 1
      }
    }
  }
}
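
In the decode/encode-based Python sketch, the same operation is much more direct (again, my own shortcut rather than a transcription of the search above):

def insert(x, n, y):
    # Replace the symbol at 1-indexed position n of x with the string y.
    codes = decode(x)
    return encode(codes[:n - 1] + decode(y) + codes[n:])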

There are inference operations and validity checks that we can only do if we know whether a particular variable is free at a particular position.

# freePlace(k, v, x) is the k+1st place in x (counting from the end)
# where v is free.
fun freePlace(0, v, x) = {
  first n in 1 to length(x) {
    free(v, n, x) and
    not some p in n + 1 to length(x) {
      free(v, p, x)
    }
  }
}

fun freePlace(k, v, x) = {
  first n in 1 to freePlace(k - 1, v, x) {
    free(v, n, x) and
    not some p in n + 1 to freePlace(k - 1, v, x) {
      free(v, p, x)
    }
  }
}

# number of places where v is free in x
fun nFreePlaces(v, x) = {
  first n in 1 to length(x) {
    freePlace(n, v, x) == 0
  }
}

In the original logic, some inference rules are defined in terms of a primitive substitution operator, which we wrote as subst[v/c](a) to mean substitute the value c for the variable v in the statement a. We'll build that up in a couple of steps, using the freePlace function that we just defined.

# subst1(k, x, v, y) replaces the first k free occurrences of v in x with y.
fun subst1(0, x, v, y) = x
fun subst1(k, x, v, y) =
  insert(subst1(k - 1, x, v, y), freePlace(k - 1, v, x), y)

# subst replaces all free occurrences of v with y
fun subst(x, v, y) = subst1(nFreePlaces(v, x), x, v, y)

The next thing we're going to do isn't, strictly speaking, absolutely necessary. Some of the harder stuff we want to do will be easier to write using things like implication, which aren't built-in primitives of the Principia logic. To write those as clearly as possible, we'll define the full suite of usual logical operators in terms of the primitives.

# implication
fun gimp(x, y) = gor(gnot(x), y)

# logical and
fun gand(x, y) = gnot(gor(gnot(x), gnot(y)))

# if/f
fun gequiv(x, y) = gand(gimp(x, y), gimp(y, x))

# existential quantification
fun gexists(v, y) = gnot(gforall(v, gnot(y)))
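
In the Python sketch, these are one-liners on top of the constructors defined earlier:

def gimp(x, y):
    # x implies y, encoded as (not x) or y.
    return gor(gnot(x), y)

def gand(x, y):
    # x and y, encoded as not((not x) or (not y)).
    return gnot(gor(gnot(x), gnot(y)))

def gequiv(x, y):
    # x if and only if y.
    return gand(gimp(x, y), gimp(y, x))

def gexists(v, y):
    # exists v. y, encoded as not(forall v. not y).
    return gnot(gforall(v, gnot(y)))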

Axioms

The Peano axioms are valid logical statements, so they have Gödel numbers in this system. We could compute their value, but why bother? We know that they exist, so we'll just give them names, and define a predicate to check if a value matches them.

The form of the Peano axioms used in the incompleteness proof is:

  1. Zero: ¬(succ(x1) = 0)
  2. Uniqueness: succ(x1) = succ(y1) ⇒ x1 = y1
  3. Induction: (x2(0) ∧ ∀x1(x2(x1) ⇒ x2(succ(x1)))) ⇒ ∀x1(x2(x1))
const pa1 = ...
const pa2 = ...
const pa3 = ...

pred peanoAxiom(x) =
  (x == pa1) or (x == pa2) or (x == pa3)

Similarly, we know that the propositional axioms must have numbers. The propositional
axioms are:

  1. p \lor p \Rightarrow p
  2. p \Rightarrow p \lor q
  3. p \lor q \Rightarrow q \lor p
  4. (p \Rightarrow q) \Rightarrow (r \lor p \Rightarrow r \lor q)

I'll show the translation of the first - the rest follow the same pattern.

# Check if x is a statement that is a form of propositional
# axiom 1: y or y => y
pred prop1Axiom(x) = {
  some y in 1 to x {
    isFm(y) and x == gimp(gor(y, y), y)
  }
}

pred prop2Axiom(x) = ...
pred prop3Axiom(x) = ...
pred prop4Axiom(x) = ...
pred propAxiom(x) = prop1Axiom(x) or prop2Axiom(x) or
    prop3Axiom(x) or prop4Axiom(x)

Similarly, all of the other axioms are written out in the same way, and we add a predicate isAxiom to check if something is an axiom. Next come the quantifier axioms, which are complicated, so I'll only write out one of them - the other follows the same basic scheme.

The two quantifier axioms are:

  1. \forall v(a) \Rightarrow \text{subst}[v/c](a)
  2. \forall v(b \lor a) \Rightarrow (b \lor \forall v(a))
pred quantifier_axiom1_condition(z, y, v) = {
  not some n in 1 to length(y) {
    some m in 1 to length(z) {
      some w in 1 to z {
         w == item(m, z) and bound(w, n, y) and free(v, n, y)
      }
    }
  }
}

pred quantifier1Axiom(x) = {
  some v in 1 to x {
    some y in 1 to x {
      some z in 1 to x {
        some n in 1 to x {
          vtype(n, v) and fstype(n, z) and
          isFm(y) and
          quantifier_axiom1_condition(z, y, v) and
          x == gimp(gforall(v, y), subst(y, v, z))
        }
      }
    }
  }
}

pred quantifier2Axiom(x) = ...
pred isQuantifierAxiom(x) = quantifier1Axiom(x) or quantifier2Axiom(x)

We need to define a predicate for the reducibility axiom (basically, the Principia's version of the ZFC axiom of comprehension). The reducibility axiom is a schema: for any predicate a, \exists u (\forall v (u(v) \Leftrightarrow a)). In our primitive recursive system, we can check if something is an instance of the reducibility axiom schema with:

pred reduAxiom(x) = {
  some u in 1 to x {
    some v in 1 to x {
      some y in 1 to x {
        some n in 1 to x {
          vtype(n, v) and
          vtype(n+1, u) and
          not free(u, y) and
          isFm(y) and
          x == gexists(u, gforall(v, gequiv(concat(seq(u), paren(seq(v))), y)))
        }
      }
    }
  }
}

Now, the set axiom. In the logic we're using, this is the axiom that defines set equality. It's written as \forall x_1 (x_2(x_1) \Leftrightarrow y_2(x_1)) \Rightarrow x_2 = y_2. Set equality is defined for all types of sets, so we need to have one version of the axiom for each level. We do that using type-lifting: we say that the axiom is true for type-1 sets, and that any type-lift of the level-1 set axiom is also a version of the set axiom.

fun typeLift(n, x) = {
  first y in 1 to x**(x**n) {
    all k in 1 to length(x) {
      (item(k, x) ≤ 13 and item(k, y) == item(k, x)) or
      (item(k, x) > 13 and item(k, y) == item(k, x) * prFactor(1, item(k, x))**n)
    }
  }
}

We haven't defined the type-1 set axiom. But we just saw the axiom above, and it's obviously a simple logical statement. That means that it's got a Gödel number. Instead of computing it, we'll just say that that number is called sa1. Now we can define a predicate to check if something is a set axiom:

val sa1 = ...
pred setAxiom(x) = {
  some n in 1 to x {
    x == typeLift(n, sa1)
  }
}

We've now defined all of the axioms of the logic, so we can now create a general predicate to see if a statement fits into any of the axiom categories:

pred isAxiom(x) =
  peanoAxiom(x) or propAxiom(x) or isQuantifierAxiom(x) or
  reduAxiom(x) or setAxiom(x)

Proofs and Provability!

With all of the axioms expressible in primitive recursive terms, we can start on what it means for something to be provable. First, we'll define what it means for some statement x to be an immediate consequence of some statements y and z. (Back when we talked about the Principia's logic, we said that x is an immediate consequence of y and z if either y is the formula z ⇒ x, or x is the formula ∀v(y).)

pred immConseq(x, y, z) = {
  y == gimp(z, x) or
  some v in 1 to x {
    isVar(v) and x == gforall(v, y)
  }
}

Now, we can use our definition of an immediate consequence to specify when a sequence of formulae is a proof figure. A proof figure is a sequence of statements where each statement in it is either an axiom, or an immediate consequence of two of the statements that preceded it.

pred isProofFigure(x) = {
  (all n in 0 to length(x) {
    isAxiom(item(n, x)) or
    some p in 0 to n {
      some q in 0 to n {
        immConseq(item(n, x), item(p, x), item(q, x))
      }
    }
  }) and
  length(x) > 0
}

We can say that x is a proof of y if x is a proof figure, and the last statement in x is y.

pred proofFor(x, y) =
  isProofFigure(x) and
  item(length(x), x) == y
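
To see the shape of these two definitions without the Gödel-number machinery, here's a small Python sketch that works directly on a list of statements; is_axiom and imm_conseq are stand-ins for the predicates defined above, passed in as parameters:

def is_proof_figure(stmts, is_axiom, imm_conseq):
    # Every statement must be an axiom or an immediate consequence
    # of two statements that appear earlier in the sequence.
    if not stmts:
        return False
    for n, s in enumerate(stmts):
        ok = is_axiom(s) or any(
            imm_conseq(s, stmts[p], stmts[q])
            for p in range(n) for q in range(n))
        if not ok:
            return False
    return True

def proof_for(stmts, y, is_axiom, imm_conseq):
    # stmts is a proof of y if it is a proof figure ending in y.
    return is_proof_figure(stmts, is_axiom, imm_conseq) and stmts[-1] == y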

Finally, we can get to the most important thing! We can define what it means for something to be provable! It's provable if there's a proof for it!

pred provable(x) =
  some y {
    proofFor(y, x)
  }

Note that this last one is not primitive recursive! There's no way that we can create a bound for this: a proof can be any length.

At last, we're done with these definitions. What we've done here is really amazing: now, every logical statement can be encoded as a number. Every proof in the logic can be encoded as a sequence of numbers: if something is provable in the Principia logic, we can encode that proof as a string of numbers, and check the proof for correctness using nothing but (a whole heck of a lot of) arithmetic!

Next post, we'll finally get to the most important part of what Gödel did. We've been able to define what it means for a statement to be provable - we'll use that to show that there's a way of creating a number encoding the statement that something is not provable. And we'll show how that means that there is a true statement in the Principia's logic which isn't provable using the Principia's logic, which means that the logic isn't complete.

In fact, the proof that we'll do shows a bit more than that. It doesn't just show that the Principia's logic is incomplete. It shows that any consistent formal system like the Principia, any system which is powerful enough to encode Peano arithmetic, must be incomplete.

Mark Chu-CarrollGodel (Reposts)

I’m going to be on vacation this week, which means that I won’t have time to write new posts. But my friend Dr. SkySkull was just talking about Gödel on twitter, and chatting with him, I realized that this would be a good time to repost some stuff that I wrote about Gödel’s incompleteness proof.

Incompleteness is one of the most beautiful and profound proofs that I’ve ever seen. If you’re at all interested in mathematics, it’s something that’s worth taking the effort to understand. But it’s also pretty on-topic for what I’ve been writing about. The original incompleteness proof is written for a dialect of math based on ST type theory!

It takes a fair bit of effort to work through the incompleteness proof, so it'll be a week's worth of reposts. What I'm going to do is work with this translation of the original paper where Gödel published his first incompleteness proof. Before we can get to the actual proof, we need to learn a bit about the particular kind of logic that he used in his proof.

It goes right back to the roots of type theory. Set theory was on the rocks, due to Russell's paradox. What Russell's paradox did was show that there was a foundational problem in math. You could develop what appeared to be a valid mathematical structure and theory, only to later discover that all the work you did was garbage, because there was some non-obvious fundamental inconsistency in how you defined it. But the way that foundations were treated simply wasn't strong or formal enough to be able to detect, right up front, whether you'd hard-wired an inconsistency into your theory. So foundations had to change, to prevent another incident like the disaster of early set theory.

In the middle of this, along came two mathematicians, Russell and Whitehead, who wanted to solve the problem once and for all. They created an amazing piece of work called the Principia Mathematica. The Principia was supposed to be an ideal, perfect mathematical foundation. It was designed to have two key properties: it was supposed to be consistent, and it was supposed to be complete.

  • Consistent meant that the system would not allow any inconsistencies of any kind. If you used the logic and the foundations of the Principia, you couldn't even say anything like Russell's paradox: you couldn't even write it as a valid statement.
  • Complete meant that every true statement was provably true, every false statement was provably false, and every statement was either true or false.

A big part of how it did this was by creating a very strict stratification of logic – the stratification that we discussed in ST type theory. The principia’s type-theoretic logic could reason about a specific number, using level-0 statements. You could reason about sets of numbers using level-1 statements. And you could reason about sets of sets of numbers using level-2 statements. In a level-1 statement, you could make meta-statements about level-0 properties, but you couldn’t reason about level-1. In level-2, you could reason about level-1 and level-0, but not about level-2. This meant that you couldn’t make a set like the set of all sets that don’t contain themselves – because that set is formed using a predicate – and when you’re forming a set like “the set of all sets that don’t contain themselves”, the sets you can reason about are a level below you. You can form a level-1 set using a level-1 statement about level-0 objects. But you can’t make a level-1 statement about level-1 objects! So self-referential systems would be completely impossible in the Principia’s logic. You just can’t even state Russell’s paradox. It’s inexpressible.

As a modern student of math, it's hard to understand what a profound thing they were trying to do. We've grown up learning math long after incompleteness became a well-known fact of life. (I read “Gödel Escher Bach” when I was a college freshman – well before I took any particularly deep math classes – so I knew about incompleteness before I knew enough to really understand what completeness would have meant!) The principia would have been the perfection of math, a final ultimate perfect system. There would have been nothing that we couldn't prove, nothing in math that we couldn't know!

What Gödel did was show, using the Principia's own system and its own syntax, that not only was the Principia itself flawed, but that any possible effort like the Principia would inevitably be flawed!

With the incompleteness proof, Gödel showed that even in the Principia, even with all of the effort that it made to strictly separate the levels of reasoning, that he could form self-referential statements, and that those self-referential statements were both true and unprovable.

The way that he did it was simply brilliant. The proof was a sequence of steps.

  1. He showed that using Peano arithmetic – that is, the basic definition of natural numbers and natural number arithmetic – you could take any Principia-logic statement and uniquely encode it as a number, so that every logical statement corresponded to a unique number and could be recovered from it.
  2. Then using that basic mechanic, he showed how you could take any property defined by a predicate in the Principia's logic, and encode it as an arithmetic property of the numbers. So a number encoded a statement, and a property of that statement could be encoded arithmetically. A number, then, could be both a statement, and a definition of an arithmetic property of a statement, and a logical description of a logical property of a statement – all at once!
  3. Using that encoding, then – which can be formed for any logic that can express Peano arithmetic – he showed that you could form a self-referential statement: a number that was a statement about numbers, including the number that was the statement itself. And more, it could encode a meta-property of the statement in a way that was both true and unprovable: he showed how to create a logical property “There is no proof of this statement”, which applied to its own numeric encoding. So the statement said, about itself, that it couldn't be proven.

The existence of that statement meant that the Principia – and any similar system! – was incomplete. Completeness means that every true statement is provably true within the system. But the statement encodes the fact that it cannot be proven. If you could prove it, the system would be inconsistent. If you can’t, it’s consistent, but incomplete.

We’re going to go through all of that in detail.

Mark Chu-CarrollGodel Numbering

The first step in Gödel’s incompleteness proof was finding a way of taking logical statements and encoding them numerically. Looking at this today, it seems sort-of obvious. I mean, I’m writing this stuff down in a text file – that text file is a stream of numbers, and it’s trivial to convert that stream of numbers into a single number. But when Gödel was doing it, it wasn’t so obvious. So he created a really clever mechanism for numerical encoding. The advantage of Gödel’s encoding is that it makes it much easier to express properties of the encoded statements arithmetically. (Arithmetically means something very specific here; we’ll see what in a later post.

Before we can look at how Gödel encoded his logic into numbers, we need to look at the logic that he used. Gödel worked with the specific logic variant used by the Principia Mathematica. The Principia logic is minimal and a bit cryptic, but it was built for a specific purpose: to have a minimal syntax, and a complete but minimal set of axioms.

The whole idea of the Principia logic is to be purely syntactic. The logic is expected to have a valid model, but you shouldn’t need to know anything about the model to use the logic. Russell and Whitehead were deliberately building a pure logic where you didn’t need to know what anything meant to use it. I’d really like to use Gödel’s exact syntax – I think it’s an interestingly different way of writing logic – but I’m working from a translation, and the translator updated the syntax. I’m afraid that switching between the older Gödel syntax, and the more modern syntax from the translation would just lead to errors and confusion. So I’m going to stick with the translation’s modernization of the syntax.

The basic building blocks of the logic are variables. Already this is a bit different from what you’re probably used to in a logic. When we think of logic, we usually consider predicates to be a fundamental thing. In this logic, they’re not. A predicate is just a shorthand for a set, and a set is represented by a variable.

Variables are stratified. Again, it helps to remember where Russell and Whitehead were coming from when they were writing the Principia. One of their basic motivations was avoiding self-referential statements like Russell's paradox. In order to prevent that, they thought that they could create a stratified logic, where on each level, you could only reason about objects from the level below. A first-order predicate would be a second-level object that could only reason about first-level objects. A second-order predicate would be a third-level object which could reason about second-level objects. No predicate could ever reason about itself or anything on its own level. This leveling property is a fundamental property built into their logic. The way the levels work is:

  • Type one variables, which range over simple atomic values, like specific single natural numbers. Type-1 variables are written as a_1, b_1.
  • Type two variables, which range over sets of atomic values, like sets of natural numbers. A predicate, like IsOdd, about specific natural numbers would be represented as a type-2 variable. Type-2 variables are written a_2, b_2, …
  • Type three variables range over sets of sets of atomic values. The individual ordered pairs that make up a function could be represented as type-3 objects: in set theoretic terms, an ordered pair can be represented as a set of sets – for example, the ordered pair (1, 4) would be represented by the set { {1}, {1, 4} }. A function – a set of ordered pairs, which is a set of sets of sets of values – would in turn be represented by a type-4 variable.

Using variables, we can form simple atomic expressions, which in Gödel’s terminology are called signs. As with variables, the signs are divided into stratified levels:

  • Type-1 signs are variables, and successor expressions. Successor expressions are just Peano numbers written with “succ”: 0, succ(0), succ(succ(0)), succ(a1), etc.
  • Signs of any type greater than 1 are just variables of that type/level.

Once you have signs, you can assemble the basic signs into formulae. Gödel explained how to build formulae in classic logician's form, which I think is hard to follow, so I've converted it into a grammar:

 elementary_formula → sign_(n+1)(sign_n)
 formula → elementary_formula
 formula → ¬(formula)
 formula → (formula) ∨ (formula)
 formula → ∀ sign_n (formula)

That’s the entire logic! It’s tiny, but it’s enough. Everything else from predicate logic can be defined in terms of combinations of these basic formulae. For example, you can define logical “and” in terms of negation and logical “or”: (a ∧ b) ⇔ ¬ (¬ a ∨ ¬ b).

With the syntax of the system set, the next thing we need is the basic axioms of logical inference in the system. In the way I think of logic, these axioms include both “true” axioms and the inference rules defining how the logic works. There are five families of axioms.

  • First, there’s the Peano axioms, which define the natural numbers.
    1. \lnot (\text{succ}(x_1) = 0): 0 is a natural number, and it's not the successor of anything.
    2. \text{succ}(x_1) = \text{succ}(y_1) \Rightarrow x_1 = y_1: Successors are unique.
    3. (x_2(0) \land \forall x_1 (x_2(x_1) \Rightarrow x_2(\text{succ}(x_1)))) \Rightarrow \forall x_1 (x_2(x_1)): induction works on the natural numbers.
  • Next, we’ve got a set of basic inference rules about simple propositions. These are defined as axiom schemata, which can be instantiated for any set of formalae p, q, and r.
    1. p \lor p \Rightarrow p
    2. p \Rightarrow p \lor q
    3. p \lor q \Rightarrow q \lor p
    4. (p \Rightarrow q) \Rightarrow (p \lor r \Rightarrow q \lor r)
  • Axioms that define inference over quantification. v is a variable, a is any formula, b is any formula where v is not a free variable, and c is a sign of the same level as v, and which doesn’t have any free variables that would be bound if it were inserted as a replacement for v.
    1. \forall v(a) \Rightarrow \text{subst}[v/c](a): if formula a is true for all values of v, then you can substitute any specific value c for v in a, and a must still be true.
    2. (\forall v (b \lor a)) \Rightarrow (b \lor \forall v(a))
  • The Principia’s version of the set theory axiom of comprehension:
    \exists u (\forall v (u(v) \Leftrightarrow a))
  • And last but not least, an axiom defining set equivalence:
    \forall x_i (x_{i+1}(x_i) \Leftrightarrow y_{i+1}(x_i)) \Rightarrow x_{i+1} = y_{i+1}

So, now, finally, we can get to the numbering. This is quite clever. We’re going to use the simplest encoding: for every possible string of symbols in the logic, we’re going to define a representation as a number. So in this representation, we are not going to get the property that every natural number is a valid formula: lots of natural numbers won’t be. They’ll be strings of nonsense symbols. (If we wanted to, we could make every number be a valid formula, by using a parse-tree based numbering, but it’s much easier to just let the numbers be strings of symbols, and then define a predicate over the numbers to identify the ones that are valid formulae.)

We start off by assigning numbers to the constant symbols:

Symbol    Numeric Representation
0         1
succ      3
¬         5
∨         7
∀         9
(         11
)         13

Variables will be represented by powers of prime numbers greater than 13. For a prime number p, p will represent a type-1 variable, p^2 will represent a type-2 variable, p^3 will represent a type-3 variable, etc.

Using those symbol encodings, we can take a formula written as a string of symbols x1 x2 x3 … xn, and encode it numerically as the product 2^x1 · 3^x2 · 5^x3 · … · pn^xn, where pn is the nth prime number.

For example, suppose I wanted to encode the formula: ∀ x1 (y2(x1) ∨ x2(x1)).

First, I’d need to encode each symbol:

  1. “∀” would be 9.
  2. “x1” = 17
  3. “(” = 11
  4. “y2” = 19² = 361
  5. “(” = 11
  6. “x1” = 17
  7. “)” = 13
  8. “∨” = 7
  9. “x2” = 17² = 289
  10. “(” = 11
  11. “x1” = 17
  12. “)” = 13
  13. “)” = 13

The formula would thus be turned into the sequence: [9, 17, 11, 361, 11, 17, 13, 7, 289, 11, 17, 13, 13]. That sequence would then get turned into the single number 2^9 · 3^17 · 5^11 · 7^361 · 11^11 · 13^17 · 17^13 · 19^7 · 23^289 · 29^11 · 31^17 · 37^13 · 41^13, which according to Hugs is the number (warning: you need to scroll to see it. a lot!):

1,821,987,637,902,623,701,225,904,240,019,813,969,080,617,900,348,538,321,073,935,587,788,506,071,830,879,280,904,480,021,357,988,798,547,730,537,019,170,876,649,747,729,076,171,560,080,529,593,160,658,600,674,198,729,426,618,685,737,248,773,404,008,081,519,218,775,692,945,684,706,455,129,294,628,924,575,925,909,585,830,321,614,047,772,585,327,805,405,377,809,182,961,310,697,978,238,941,231,290,173,227,390,750,547,696,657,645,077,277,386,815,869,624,389,931,352,799,230,949,892,054,634,638,136,137,995,254,523,486,502,753,268,687,845,320,296,600,343,871,556,546,425,114,643,587,820,633,133,232,999,109,544,763,520,253,057,252,248,982,267,078,202,089,525,667,161,691,850,572,084,153,306,622,226,987,931,223,193,578,450,852,038,578,983,945,920,534,096,055,419,823,281,786,399,855,459,394,948,921,598,228,615,703,317,657,117,593,084,977,371,635,801,831,244,944,526,230,994,115,900,720,026,901,352,169,637,434,441,791,307,175,579,916,097,240,141,893,510,281,613,711,253,660,054,258,685,889,469,896,461,087,297,563,191,813,037,946,176,250,108,137,458,848,099,487,488,503,799,293,003,562,875,320,575,790,915,778,093,569,309,011,025,000,000,000.
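
For what it's worth, that product is easy to reproduce in Python; this is just my own sketch of the computation, and I haven't re-checked the result digit-for-digit against the number above:

def is_prime(n):
    return n >= 2 and all(n % i != 0 for i in range(2, n))

def nth_prime(n):
    # The nth prime: 2, 3, 5, 7, ...
    count, m = 0, 1
    while count < n:
        m += 1
        if is_prime(m):
            count += 1
    return m

def godel_number(codes):
    # Multiply out nth_prime(i) ** code_i for the symbol codes in order.
    x = 1
    for i, c in enumerate(codes, start=1):
        x *= nth_prime(i) ** c
    return x

codes = [9, 17, 11, 361, 11, 17, 13, 7, 289, 11, 17, 13, 13]
print(godel_number(codes))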

Next, we’re going to look at how you can express interesting mathematical properties in terms of numbers. Gödel used a property called primitive recursion as an example, so we’ll walk through a definition of primitive recursion, and show how Gödel expressed primitive recursion numerically.

August 23, 2014

Tommaso DorigoWill Do Peer Review - For Money

Preparing the documents needed for an exam for a career advancement is, to a scientist like me, something like putting order in a messy garage. Never mind my desk, which is indeed in a horrifically messy state - papers stratified and thrown around with absolutely no ordering criterion, mixed with books I forgot I own and important documents I'd rather have reissued than search for myself. No, I am rather talking about my own scientific production - published articles that need to be put in ordered lists, conference talks that I forgot I had given and that need to be cited in the curriculum vitae, refereeing work I also long forgot I'd done, internal documents of the collaborations I worked in, students I tutored, courses I gave.


August 22, 2014

Steinn SigurðssonThe Science of Inequality

Science had a very interesting special section this spring: The Science of Inequality – basically doing a summary and review of issues related to the stuff in Piketty’s book Capital in the Twenty-First Century

The section has a series of very interesting articles on a range of related topics:

and a bunch of others.
The first and last are heavy duty reviews, the others are “News and Views” articles; there are a number of other pieces in the section as well, all worth reading.

The whole section makes for interesting, and somewhat depressing reading.
The basic argument about the role of capital vs earned income and long term structure is fairly robust; the main failure is, I think, the inability of the models to deal with external and internal shocks, destructive terms in macroeconomics, and gross stupidity by economic actors leading to negative sum games – which probably suffices to explain why some Roman family does not yet own the whole Solar System through simple application of compound interest.

The analogy to some relatively simple physical systems in the work of Yakovenko and others is quite interesting, and explains the essential features of the long tail of the income distribution at the high end, its robustness, and the role of unearned income.

Definitely recommended reading.

BackreactionHello from Iceland

So here I am on an island in the middle of the Atlantic ocean that's working on its next volcano eruption.


In case you missed yesterday's Google Hangout, FQXi just announced the winners of this year's essay contest and - awesomeliness alert! - my essay "How to save the world in five simple steps" took first prize!

I'm happy of course about the money, but what touches me much more is that this is vivid documentation that I'm not the only one who thinks the topics I addressed in my essay are relevant. If you've been following this blog for a while then you know of course that I've been thinking back and forth about the problem of emerging social dynamics, in the scientific communities as well as in society at large, and our inability to foresee and react to the consequences of our actions.

Ten years ago I started out thinking the problem is the modeling of these systems, but over the years, as more and more research and data on these trends became available, I've become convinced the problem isn't understanding the system dynamics to begin with, but that nobody is paying attention to what we've learned.

I see this every time I sit in a committee meeting and try to tell them something about research dedicated to intelligent decision making in groups, cognitive biases, or the sociology of science. They'll not listen. They might be polite and let me finish, but it's not information they will take into account in their decision making. And the reason is basically that it takes them too much time and too much effort. They'll just continue the way it's always been done; they'll continue making the same mistakes over again. There's no feedback in this system, and no learning by trial and error.

The briefest of brief summaries of my essay is that we'll only be able to meet the challenges mankind is facing if our social systems are organized so that we can react to complex and emerging problems caused by our own interaction and that with our environment. That will only be possible if we have the relevant information and use it. And we'll only use this information if it's cheap, in the sense of it being simple, fast, and intuitive to use.

Most attempts to solve the problems that we are facing are based on an unrealistic and utopian image of the average human, the well-educated, intellectual and concerned citizen who will process all available information and come to smart decisions. That is never going to happen, and that's the issue I'm taking on in my essay.

I'll be happy to answer questions about my essay. I would prefer to do this here rather than at the FQXi forum. Note though that I'll be stuck in transit for the next day. If that volcano lets me off this island that is.

Tim GowersICM2014 — Ian Agol plenary lecture

On the second day of the congress I hauled myself out of bed in time, I hoped, to have a shower and find some breakfast before the first plenary lecture of the congress started at 9am. The previous day in the evening I had chanced upon a large underground shopping mall directly underneath the conference centre, so I thought I’d see if I could find some kind of café there. However, at 8:30 in the morning it was more or less deserted, and I found myself wandering down very long empty passages, constantly looking at my watch and worrying that I wouldn’t have time to retrace my steps, find somewhere I could have breakfast, have breakfast, and walk the surprisingly long distance it would be to the main hall, all by 9am.

Eventually I just made it, by going back to a place that was semi-above ground (meaning that it was below ground but you entered it from a sunken area that was not covered by a roof) that I had earlier rejected on the grounds that it didn’t have a satisfactory food option, and just had an espresso. Thus fortified, I made my way to the talk and arrived just in time, which didn’t stop me getting a seat near the front. That was to be the case at all talks — if I marched to the front, I could get a seat. I think part of the reason was that there were “Reserved” stickers on several seats, which had been there for the opening ceremony and not been removed. But maybe it was also because some people like to sit some way back so that they can zone out of the talk if they want to, maybe even getting out their laptops. (However, although wireless was in theory available throughout the conference centre, in practice it was very hard to connect.)

The first talk was by Ian Agol. I was told before the talk that I would be unlikely to understand it — the comment was about Agol rather than about me — and the result of this lowering of my expectations was that I enjoyed the talk. In fact, I might even have enjoyed it without the lowering of expectations. Having said that, I did hear one criticism afterwards that I will try to explain, since it provides a good introduction to the content of the lecture.

When I first heard of Thurston’s famous geometrization conjecture, I thought of it as the ultimate aim of the study of 3-manifolds: what more could you want than a complete classification? However, this view was not correct. Although a proof of the geometrization conjecture would be (and later was) a massive step forward, it wouldn’t by itself answer all the questions that people really wanted to answer about 3-manifolds. But some very important work by Agol and others since Perelman’s breakthrough has, in some sense that I don’t understand, finished off some big programme in the subject. The criticism I heard was that Agol didn’t really explain what this programme was. I hadn’t really noticed that as a problem during the talk — I just took it on trust that the work Agol was describing was considered very important by the experts (and I was well aware of Agol’s reputation) — but perhaps he could have done a little more scene setting.

What he actually did by way of introduction was to mention two questions from a famous 1982 paper of Thurston (Three-dimensional manifolds, Kleinian groups and hyperbolic geometry) in which he asked 24 questions. The ones Agol mentioned were questions 16-18. I’ve just had a look at the Thurston paper, and it’s well worth a browse, as it’s a relatively gentle survey written for the Bulletin of the AMS. It also has lots of nice pictures. I didn’t get a sense from my skim through it that questions 16-18 were significantly more important than the others (apart from the geometrization conjecture), but perhaps the story is that when the dust had settled after Perelman’s work, it was those questions that were still hard. Maybe someone who knows what they’re talking about can give a better explanation in a comment.

One definition I learned from the lecture is this: a 3-manifold is said to have a property P virtually if it has a finite-sheeted cover with property P. I presume that a finite-sheeted cover is another 3-manifold and a suitable surjection to the first one such that each point in the first has k preimages for some finite k (that doesn’t depend on the point).

Thurston’s question 16 asks whether every aspherical 3-manifold (I presume that just means that it isn’t a 3-sphere) is virtually Haken.

A little later in the talk, Agol told us what “Haken” meant, other than being the name of a very well-known mathematician. Here’s the definition he gave, which left me with very little intuitive understanding of the concept. A compact 3-manifold with hyperbolic interior is Haken if it contains an embedded \pi_1-injective surface. An example, if my understanding of my rapidly scrawled notes is correct, is a knot complement, one of the standard ways of constructing interesting 3-manifolds. If you take the complement of a knot in \mathbb{R}^3 you get a 3-manifold, and if you take a tubular neighbourhood of that knot, then its boundary will be your \pi_1-injective surface. (I’m only pretending to know what \pi_1-injective means here.)

Thurston, in the paper mentioned earlier, describes Haken manifolds in a different, and for me more helpful, way. Let me approach the concept in top-down fashion: that is, I’ll define it in terms of other mysterious concepts, then work backwards through Thurston’s paper until everything is defined (to my satisfaction at least).

Thurston writes, “A 3-manifold M^3 is called a Haken manifold if it is prime and it contains a 2-sided incompressible surface (whose boundary, if any, is on \partial M) which is not a 2-sphere.”

Incidentally, one thing I picked up during Agol’s talk is that it seems to be conventional to refer to a 3-manifold as M^3 the first time you mention it and as M thereafter.

Now we need to know what “prime” and “incompressible” mean. The following paragraph of Thurston defines “prime” very nicely.

The decomposition referred to really has two stages. The first stage is the prime decomposition, obtained by repeatedly cutting a 3-manifold M^3 along 2-spheres embedded in M^3 so that they separate the manifold into two parts neither of which is a 3-ball, and then gluing 3-balls to the resulting boundary components, thus obtaining closed 3-manifolds which are “simpler”. Kneser proved that this process terminates after a finite number of steps. The resulting pieces, called the prime summands of M^3 , are uniquely determined by M^3 up to homeomorphism.

Hmm, perhaps the rule is more general: you refer to it as M^3 to start with and after that it’s sort of up to you whether you want to call it M^3 or M.

The equivalent process in two dimensions could be used to simplify a two-holed torus. You first identify a circle that cuts it into two pieces and doesn’t bound a disc: basically what you get if you chop the surface into two with one hole on each side. Then you have two surfaces with circles as boundaries. You fill in those circles with discs and then you have two tori. At this point you can’t chop the surface in two in a non-trivial way, so a torus is prime. Unless my intuition is all wrong, that’s more or less telling us that the prime decomposition of an arbitrary orientable surface (without boundary) is into tori, one for each hole, except that the sphere would be prime.

What about “incompressible”? Thurston offers us this.

A surface N^2 embedded in a 3-manifold M^3 is two-sided if N^2 cuts a regular neighborhood of N^2 into two pieces, i.e., the normal bundle to N^2 is oriented. Since we are assuming that M^3 is oriented, this is equivalent to the condition that N^2 is oriented. A two-sided surface is incompressible if every simple curve on N^2 which bounds a disk in M^3 with interior disjoint from N^2 also bounds a disk on N^2.

I think we can forget the first part there: just assume that everything in sight is oriented. Let’s try to think what it would mean for an embedded surface not to be incompressible. Consider for example a copy of the torus embedded in the 3-sphere. Then a loop that goes round the torus bounds a disc in the 3-sphere with no problem, but it doesn’t bound a disc in the torus. So that torus fails to be incompressible. But suppose we embedded the torus into a 3-dimensional torus in a natural way, by taking the 3D torus to be the quotient of \mathbb{R}^3 by \mathbb{Z}^3 and the 2D torus to be the set of all points with x-coordinate an (equivalence class of an) integer. Then the loops that don’t bound discs in the 2-torus don’t bound discs in the 3-torus either, so that surface is — again if what seems likely to be true actually is true — incompressible. It seems that an incompressible surface sort of spans the 3-manifold in an essential way rather than sitting inside a boring part of the 3-manifold and pretending that it isn’t boring.

OK, that’s what Haken manifolds are, but for the non-expert that’s not enough. We want to know why we should care about them. Thurston gives us an answer to this too. Here is a very useful paragraph about them.

It is hard to say how general the class of Haken manifolds is. There are many closed manifolds which are Haken and many which are not. Haken manifolds can be analyzed by inductive processes, because as Haken proved, a Haken manifold can be cut successively along incompressible surfaces until one is left with a collection of 3-balls. The condition that a 3-manifold has an incompressible surface is useful in proving that it has a hyperbolic structure (when it does), but intuitively it really seems to have little to do with the question of existence of a hyperbolic structure.

To put it more vaguely, Haken manifolds are good because they can be chopped into pieces in a way that makes them easy to understand. So I’d guess that the importance of showing that every aspherical 3-manifold is virtually Haken is that finite-sheeted coverings are sufficiently nice that even knowing that a manifold is virtually Haken means that in some sense you understand it.

One very nice thing Agol did was give us some basic examples of 3-manifolds, by which I mean not things like the 3-sphere, but examples of the kind that one wouldn’t immediately think of and that improve one’s intuition about what a typical 3-manifold looks like.

The first one was a (solid) dodecahedron with opposite faces identified — with a twist. I meant the word “twist” literally, but I suppose you could say that the twist is that there is a twist, meaning that given two opposite faces, you don’t identify each vertex with the one opposite it, but rather you first rotate one of the faces through 2\pi/5 and then identify opposite vertices. (Obviously you’ll have to do that in a consistent way somehow.)

There are some questions here that I can’t answer in my head. For example, if you take a vertex of the dodecahedron, then it belongs to three faces. Each of these faces is identified in a twisty way with the opposite face, so if we want to understand what’s going on near the vertex, then we should glue three more dodecahedra to our original one at those faces, keeping track of the various identifications. Now do the identifications mean that those dodecahedra all join up nicely so that the point is at the intersection of four copies of the dodecahedron? Or do we have to do some more gluing before everything starts to join together? One thing we don’t have to worry about is that there isn’t room for all those dodecahedra, which in a certain sense would be the case if the solid angle at a vertex is greater than 1. (I’m defining, I hope standardly, the solid angle of a cone to be the size of the intersection of that cone with a unit sphere centred at the apex, or whatever one calls it. Since a unit sphere has surface area 4\pi, the largest possible solid angle is 4\pi.)

Anyhow, as I said, this doesn’t matter. Indeed, far from mattering, it is to be positively welcomed, since if the solid angles of the dodecahedra that meet at a point add up to more than 4\pi, then it indicates that the geometry of the resulting manifold will be hyperbolic, which is exactly what we want. I presume that another way of defining the example is to start with a tiling of hyperbolic 3-space by regular dodecahedra and then identify neighbouring dodecahedra using little twists. I’m guessing here, but opposite faces of a dodecahedron are parallel, while not being translates of one another. So maybe as you come out of a face, you give it the smallest (anticlockwise, say) twist you can to make it a translate of the opposite face, which will be a rotation by an angle of \pi/5, and then re-enter the opposite face by the corresponding translated point. But it’s not clear to me that that is a consistent definition. (I haven’t said which dodecahedral tiling I’m even taking. Perhaps the one where all the pentagons have right angles at their vertices.)

The other example was actually a pair of examples. One was a figure-of-eight-knot complement, and the other was the complement of the Whitehead link. Agol showed us drawings of the knot and link: I’ll leave you to Google for them if you are interested.

How does a knot complement give you a 3-manifold? I’m not entirely sure. One thing that’s clear is that it gives you a 3-manifold with boundary, since you can take a tubular neighbourhood of the knot/link and take the complement of that, which will be a 3D region whose boundary is homeomorphic to a torus but sits in \mathbb{R}^3 in a knotted way. I also know (from Thurston, but I’ve seen it before) that you can produce lots of 3-manifolds by defining some non-trivial homeomorphism from a torus to itself, removing a tubular neighbourhood of a knot from \mathbb{R}^3 and gluing it back in again, but only after applying the homeomorphism to the boundary. That is, given your solid knot and your solid-knot-shaped hole, you identify the boundary of the knot with the boundary of the hole, but not in the obvious way. This process is called Dehn surgery, and in fact can be used to create all 3-manifolds.

But I still find myself unable to explain how a knot complement is itself a 3-manifold, unless it is a 3-manifold with boundary, or one compactifies it somehow, or something. So I had the illusion of understanding during the talk but am found out now.

The twisted-dodecahedron example was discovered by Seifert and Weber, and is interesting because it is a non-Haken manifold (a discovery of Burton, Rubinstein and Tillmann) that is virtually Haken.

Going back to the question of why the geometrization conjecture didn’t just finish off the subject, my guess is that it is probably possible to construct lots of complicated 3-manifolds that obviously satisfy the geometrization conjecture because they are already hyperbolic, but that are not by virtue of that fact alone easy to understand. What Agol appeared to say is that the role of the geometrization conjecture is essentially to reduce the whole problem of understanding 3-manifolds to that of understanding hyperbolic 3-manifolds. He also said something that is more or less a compulsory remark in a general lecture on 3-manifolds, namely that although they are topological objects, they are studied by geometrical means. (The corresponding compulsory remark for 4-manifolds is that 4D is the odd dimension out, where lots of weird things happen.)

As I’ve said, Agol discussed two other problems. I think the virtual Haken conjecture was the big one (after all, that was the title of his lecture), but the other two were, as he put it, stronger statements that were easier to think about. Question 17 asks whether every aspherical 3-manifold virtually has positive first Betti number, and question 18 asks whether it virtually fibres over the circle. I’ll pass straight to the second of these questions.

A 3-manifold M^3 fibres over the circle if there is a (suitably nice) map \eta:M^3\to S^1 such that the preimage of every point in S^1 is a surface S (the fibre at that point).
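
One standard picture of this, if I have it right, is as a mapping torus: take the fibre S, cross it with an interval, and glue the two ends together using some homeomorphism \varphi of S, so that

 M^3 \cong S \times [0,1] / \big((x,1)\sim(\varphi(x),0)\big),

and \eta is just the projection onto the interval coordinate, which becomes a circle once the ends are glued.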

Let me state Agol’s main results without saying what they mean. In 2008 he proved that if M^3 is virtually special cubulated, then it is virtually fibred. In 2012 he proved that cubulations with hyperbolic fundamental group are virtually special, answering a 2011 conjecture of Wise. A corollary is that every closed hyperbolic 3-manifold virtually fibres over the circle, which answers questions 16-18.

There appears to be a missing step there, namely to show that every closed hyperbolic 3-manifold has a cubulation with hyperbolic fundamental group. That I think must have been the main message of what he said in a fairly long discussion about cubulations that preceded the statements of these big results, and about which I did not take detailed notes.

What I remember about the discussion was a number of pictures of cube complexes made up of cubes of different dimensions. An important aspect of these complexes was a kind of avoidance of positive curvature, which worked something like this. (I’ll discuss a low-dimensional situation, but it generalizes.) Suppose you have three squares that meet at a vertex just as they do if they are faces of a cube. Then at that vertex you’ve got some positive curvature, which is what you want to avoid. So to avoid it, you’re obliged to fill in the entire cube, and now the positive curvature is rendered harmless because it’s just the surface of some bit of 3D stuff. (This feels a bit like the way we don’t pay attention to embedded surfaces unless they are incompressible.)

I haven’t given the definition because I don’t remember it. The term CAT(0) came up a lot. At the time I felt I was following what was going on reasonably well, helped by the fact that I had seen an excellent talk by my former colleague Vlad Markovic on similar topics. (Markovic was mentioned in Agol’s talk, and was himself an invited speaker at the ICM.) The main message I remember now is that there is some kind of dictionary between cube complexes and 3-manifolds, so you try to find “cubulations” with particular properties that will enable you to prove that your 3-manifolds have corresponding properties. Note that although the manifolds are three-dimensional, the cubes in the corresponding cube complexes are not limited to three dimensions.

That’s about all I can remember, even with the help of notes. In case I have given the wrong impression, let me make clear that I very much enjoyed this lecture and thought it got the “working” part of the congress off to a great start. And it’s clear that the results of Agol and others are a big achievement. If you want to watch the lecture for yourself, it can be found here.


Tim GowersICM2014 — Hairer laudatio

I haven’t kept up anything like the frequency of posts at this ICM that I managed at the last one. There are at least three reasons for this. One is that I was in the middle of writing up a result, so I devoted some of my rare free moments to that. Another is that the schedule was so full of good talks that I hardly skipped any sessions. And the third is that on the last day I was taken ill: I won’t go into too much detail, but let’s say that what I had sort of rhymed with “Korea”, but also left me feeling fairly terrible. So I didn’t much enjoy the conference banquet — at least from the food point of view — and then the next day, which I can’t quite believe was actually yesterday, when I got up at 5am in order to catch the bus from the hotel to the airport in time for my 9:30 flight back to Paris, I felt sufficiently terrible that I wasn’t quite sure how I would get through the 11-hour flight, four-hour stopover in Paris and four-and-a-half-hour train journey from Paris to Béziers.

I was rescued by an extraordinary piece of luck. When I got to the gate with my boarding card, the woman who took it from me tore it up and gave me another one, curtly informing me that I had been upgraded. I have no idea why. I wonder whether it had anything to do with the fact that in order to avoid standing any longer than necessary I waited until almost the end before boarding. But perhaps the decision had been made well before that: I have no idea how these things work. Anyhow, it meant that I could make my seat pretty well horizontal and I slept for quite a lot of the journey. Unfortunately, I wasn’t feeling well enough to make full use of all the perks, one of which was a bar where one could ask for single malt whisky. I didn’t have any alcohol or coffee and only picked at my food. I also didn’t watch a single film or do any work. If I’d been feeling OK, the day would have been very different. However, perhaps the fact that I wasn’t feeling OK meant that the difference it made to me to be in business class was actually greater than it would have been otherwise. I rather like that way of looking at it.

An amusing thing happened when we landed in Paris. We landed out on the tarmac and were met by buses. They let the classy people off first (even we business-class people had to wait for the first-class people, just in case we got above ourselves), so that they wouldn’t have to share a bus with the riff raff. One reason I had been pleased to be travelling business class was that it meant that I had after all got to experience the top floor of an Airbus 380. But when I turned round to look, there was only one row of windows, and then I saw that it had been a Boeing 777. Oh well. It was operated by Air France. I’ve forgotten the right phrase: something like “shared code”. A number of little anomalies resolved themselves, such as that that take-off didn’t feel like the one in Paris, that the slope of the walls didn’t seem quite correct if we were on the top floor, etc.

I thought that as an experiment I would see what I could remember about the laudatio for Martin Hairer without the notes I took, and then after that I would see how much more there was to say with the notes. So here goes. The laudatio was given by Ofer Zeitouni, one of the people on the Fields Medal committee. Early on, he made a link with what Ghys had said about Avila, by saying that Hairer too studied situations where physicists don’t know what the equation is. However, these situations were somewhat different: instead of studying typical dynamical systems, Hairer studied stochastic PDEs. As I understand it, an important class of stochastic PDEs is conventional PDEs with a noise term added, which is often some kind of Brownian motion term.

Unfortunately, Brownian motion can’t be differentiated, but that isn’t by itself a huge problem because it can be differentiated if you allow yourself to work with distributions. However, while distributions are great for many purposes, there are certain things you can’t do with them — notably multiply them together.

Hairer looked at a stochastic PDE that modelled a physical situation that gives rise to a complicated fractal boundary between two regions. I think the phrase “interface dynamics” may have been one of the buzz phrases here. The naive approach to this stochastic PDE led quickly to the need to multiply two distributions together, so it didn’t work. So Hairer added a “mollifier” — that is, he smoothed the noise slightly. Associated with this mollifier was a parameter \epsilon: the smaller \epsilon was, the less smoothing took place. So he then solved the smoothed system, let \epsilon tend to zero, showed that the smoothed solutions tended to a limit, and defined that limit to be the solution of the original equation.

The way I’ve described it, that sounds like a fairly obvious thing to do, so what was so good about it?

A first answer is that in this particular case it was far from obvious that the smoothed solutions really did tend to a limit. In order to show this, it was necessary to do a renormalization (another thematic link with Avila), which involved subtracting a constant C_\epsilon. The only other thing I remember was that the proof also involved something a bit like a Taylor expansion, but that a key insight of Hairer was that instead of expanding with respect to a fixed basis of functions, one should instead let the basis of functions depend on the function one was expanding — or something like that anyway.

I was left with the feeling that a lot of people are very excited about what Hairer has done, because with his new theoretical framework he has managed to go a long way beyond what people thought was possible.

OK, now let me look at the notes and see whether I want to add anything.

My memory seems to have served me quite well. Here are a couple of extra details. An important one is that Zeitouni opened with a brief summary of Hairer’s major contributions, which makes them sound like much more than a clever trick to deal with one particular troublesome stochastic PDE. These were

1. a theory of regularity structures, and

2. a theory of ergodicity for infinite-dimensional systems.

I don’t know how those two relate to the solution of the differential equation, which, by the way, is called the KPZ equation, and is the following.

\partial_t h = \Delta h + (\partial_x h)^2 + \xi.

It models the evolution of interfaces. (So maybe “interface dynamics” was not after all the buzz phrase.)

When I said that the noise was Brownian, I should have said that the noise was completely uncorrelated in time, and therefore makes no sense pointwise, but it integrates to Brownian motion.

The mollifiers are functions \xi_\epsilon that replace the noise term \xi. The constants C_\epsilon I mentioned earlier depend on your choice of mollifier, but the limit doesn’t (which is obviously very important).
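
So, as I understand the scheme, one solves the mollified and renormalized equation and then passes to the limit:

 \partial_t h_\epsilon = \Delta h_\epsilon + (\partial_x h_\epsilon)^2 - C_\epsilon + \xi_\epsilon, \qquad h := \lim_{\epsilon \to 0} h_\epsilon,

the point being that the limit exists and does not depend on which mollifier was chosen.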

What Zeitouni actually said about Taylor expansion was that one should measure smoothness by expansions that are tailored (his word not mine) to the equation, rather than with respect to a universal basis. This was a key insight of Hairer.

One of the major tools introduced by Hairer is a generalization of something called rough-path theory, due to Terry Lyons. Another is his renormalization procedure.

Zeitouni summarized by saying that Hairer had invented new methods for defining solutions to PDEs driven by rough noise, and that these methods were robust with respect to mollification. He also said something about quantitative behaviour of solutions.

If you find that account a little vague and unsatisfactory, bear in mind that my aim here is not to give the clearest possible presentation of Hairer’s work, but rather to discuss what it was like to be at the ICM, and in particular to attend this laudatio. One doesn’t usually expect to come out of a maths talk understanding it so well that one could give the same talk oneself. As I’ve mentioned in another post, there are some very good accounts of the work of all the prizewinners here. (To see them, follow the link and then follow further links to press releases.)

Update: if you want to appreciate some of these ideas more fully, then here is a very nice blog post: it doesn’t say much more about Hairer’s work, but it does a much better job than this post of setting his work in context.


Chad OrzelThrowback Thursday

OK, the photo above is a recent picture of me– yesterday, in fact. But the spiral-carved rock I’m standing next to was carved that way a bit more than five thousand years ago, so that ought to count as a throwback…

We’ve been in Dublin the last few days, and on Thursday we took a bus tour out to Newgrange. This is one of the things I wanted to make sure to see while we were here, as I make reference to it in the forthcoming book, and in at least two classes that I teach. And it is, indeed, spectacular; the reconstructed white wall might be historically dubious, but the interior passage and vault are still perfectly intact (some old graffiti scratched into the stones aside), and still keep the central chamber dry after millennia. In Ireland.

It’s a tight squeeze to get into the central chamber, but worth the effort– our distant ancestors were pretty damn good at science and engineering. Which is, of course, why it appears in the book…

Anyway, we’ve had a great time in Dublin and environs. We’re headed back to London today, and Kate’s headed back to the US tomorrow, while I kick around London for the weekend, then head to Stockholm next week.

Jordan EllenbergThe 1979 Houston Astros hit only 49 home runs

49 home runs! That’s nuts. They hit more triples than home runs. Their home run leader was Jose Cruz, who hit 9. In September they went 20 straight games without hitting a home run, the longest such streak in modern baseball. And that was after they went 15 games without hitting a home run in July!

Must have been a pretty bad team, right? But no! They won 89 games and finished second, just a game and a half behind the Reds. That 15 game homerless streak in July? They went 11-4 in those games.


August 21, 2014

Mark Chu-CarrollTypes and Lambda Calculus

Most programmers are familiar with λ-calculus. It’s one of the most widely used tools in theoretical computer science and logic. It’s the basis of many programming languages, including Lisp, Haskell, ML, and Scala. It’s had a strong influence on many other programming languages that aren’t strictly based on it, including Java, C++, and Javascript.

Motivation: the Problem

Modern programmers love the λ-calculus. A lot of the mathematicians who worked in what became computer science also loved it. But if you go back to its roots, there was a problem.

The λ-calculus in its non-typed form, as it was originally defined, was inconsistent.

Haskell Curry eventually reduced the paradox to something quite simple:

 r = (\lambda x. ((x x) \Rightarrow y))

That is a function which returns y if (x x) is true.
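
Unpacking the definition, a single β-reduction makes the self-reference explicit:

 (r r) = (\lambda x. ((x x) \Rightarrow y)) r \rightarrow_\beta ((r r) \Rightarrow y)

So the statement (r r) asserts precisely that its own truth implies y.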

To see the paradox, we need to change how we think of λ-calculus. Most people today who know λ-calculus are programmers, and we think of it primarily as something like a programming language. But the roots of λ-calculus were laid before computers as we know them existed. At the time, λ-calculus was a tool used by logicians. To them it wasn’t a computing machine, it was a logical model of computation.

When we look at the expression (r r) \Rightarrow y, what we see is executable logic that reads as “If applying r to r returns true, then return y”. And by using computational reasoning, we’d conclude that (r r) is a non-terminating computation, because to get the value of (r r), we need to evaluate (r r), and then apply r to that result. But to logicians like Haskell Curry, it read as “The statement (r r) implies y”. Rendered into simple English, it’s a statement like: “If this statement is true, then my hair is purple”. It’s purely a logical implication, and so even though we can’t actually evaluate (r r), in logical terms, we can still say “This is a well-formed logical statement, so is it true?”.

Is that true? Suppose it is. If it’s true, then that means that it says that whatever y is must be true. Without knowing what y is, we can’t be sure if this is a true statement.

Suppose that it’s false. If (r r) is false, that means that the implication (r r) \Rightarrow y must be true, because FOPL says that if the antecedent of an implication is false, the entire implication is true. But (r r) is that very implication, so it must be true after all, contradicting our assumption.

It’s clearer when you look at the English: “If this sentence is true, then my hair is purple”. My hair isn’t purple, which means that the statement can’t be true. But if the statement isn’t true, then the implication is true, so the statement is true. We’re caught in a self-reference loop, just like what we saw in Russell’s paradox.

This was considered a very bad thing. It’s a bit subtler than the problem with naive set theory. In set theory, we had an unambiguous inconsistency: we had a set that was well-defined under the axioms of set theory, and without any question, that set was inconsistent. Here, we’ve got an expression which might be consistent, and then again, it might not. It depends on what the consequent – the “my hair is purple” part – of the implication is. If it’s something true, then we’re fine. If it’s not, then we’re stuck.

The problem is, no matter what you put into that “my hair is purple” part, you can use this to produce a proof that it’s true, even if it isn’t. And that’s a fundamental inconsistency, which can’t be tolerated.

Curry’s paradox meant that the logicians working with λ-calculus had a problem that they needed to solve. And following the fashion of the time, they solved it using types. Much like ST type theory attempted to preserve as much of set theory as possible while fixing the inconsistency, typed λ-calculus tried to fix the inconsistency of λ-calculus. The basic approach was also similar: the typed λ-calculus introduced a stratification which made it impossible to build the structures that led to inconsistency.

In ST, the solution was to build a logical system in which it was impossible to express self-reference. In λ-calculus, the basic goal was the same: to eliminate self-reference. But in λ-calculus, that restriction is stated differently. What the type system in λ-calculus does is make it impossible to construct a statement that is a fixed-point combinator.

If we look carefully at the paradoxical statement up above, it’s not really pure λ-calculus. It relies on the fact that we’re defining a function named r, and then applying r to itself using the name r. But that’s just syntactic sugar: in fact, r can’t reference r. You need some tool to let you apply an expression to itself: that tool is called a fixed-point combinator. The most common fixed-point combinator in λ-calculus is the Y combinator, which underlies how recursive computations usually work in λ-calculus.
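
For concreteness, the standard form of the Y combinator is

 Y = \lambda f. (\lambda x. f (x x)) (\lambda x. f (x x))

and it satisfies Y g = g (Y g); notice that it achieves this by applying x to itself, which is exactly the kind of self-application that the type system is about to forbid.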

The way that the simply typed λ-calculus gets around the Curry paradox is by making it impossible to build a well-typed fixed-point combinator. Without that, you can’t build the self-referential constructs that cause the inconsistency. The downside is that the simply typed λ-calculus, without a fixed-point combinator, is not Turing complete. The evaluation of every simply typed λ-calculus expression will eventually terminate.

(As an aside: this isn’t really a problem in practice. The self-referential expressions that cause the Curry paradox turn into non-terminating computations. So they don’t produce a paradox; they just don’t produce anything. Logical inconsistencies don’t produce results: they’re still an error; instead of terminating with an inconsistent result, they just never terminate. Again, to the logicians at the time, the idea of non-termination was, itself, a deep problem that needed a solution.)

The Solution: Stratification by Types

The way that the simply typed λ-calculus fixed things was by creating a stratification using types. The type system created a restriction on the set of statements that were valid, well-formed logical statements. That restriction made it impossible to express a fixed point combinator or a general recursive computation of any kind.

It’s helpful to once again refer back to ST. In ST, we started the type of atoms at level 0. To get to level 1, we defined predicates over level-0 objects, and the set of objects that matched the predicate was a level-1 type. Then we could define predicates over level-1 objects, and the set of level-1 types that satisfied the predicate was a level-2 type, and so on. In the simply typed λ-calculus, we do the same thing, but with functions: we can build functions that operate on primitive atoms (also called base values), or on other functions. When we define a function, it must be assigned a type, and that type must be something at a lower level than the function being defined. You can’t ever pass a function to itself, because the function is an object at a higher level than the type of its parameter.

We start with base types. Every simply-typed lambda calculus starts with a collection of primitive atomic values. The set of atomic values is partitioned into a collection of types, which are called the base types. Base types are usually named by single lower-case greek letters: So, for example, we could have a type \sigma which consists of the set of natural numbers; a type \tau which corresponds to boolean true/false values; and a type \gamma which corresponds to strings.

Once we have basic types, then we can talk about the type of a function. A function maps from a value of one type (the type of parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type \gamma, and returns a value of type \delta, we write its type as “\gamma \rightarrow \delta“. The “\rightarrow” is called the function type constructor; it associates to the right, so \gamma \rightarrow \delta \rightarrow \epsilon is equivalent to \gamma \rightarrow (\delta \rightarrow \epsilon).

In every function declaration, we need to specify the type of the parameters and the type of the result. So:

  1. \sigma \rightarrow \sigma is a function which takes a natural number as a parameter, and returns a natural number as a result.
  2. (\sigma \rightarrow \sigma) \rightarrow \sigma is a function which takes a \sigma \rightarrow \sigma function as a parameter, and produces a natural number as a result.

As usual in λ-calculus, we don’t have multi-parameter functions – all functions are curried, so a function like addNatural would be a function that takes a natural number as a parameter, and returns a function that takes a natural number and returns a natural number. So the type of addNatural is \sigma \rightarrow \sigma \rightarrow \sigma.
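
As a quick illustrative sketch in Haskell (one of the λ-calculus-based languages mentioned above; addNatural and addThree here are hypothetical names, not standard library functions), currying means that partial application simply returns another function:

  import Numeric.Natural (Natural)

  -- addNatural has type σ → σ → σ in the notation above: it takes one
  -- natural number and returns a function that expects the second one.
  addNatural :: Natural -> Natural -> Natural
  addNatural x y = x + y

  -- Partially applying it gives a function of type σ → σ.
  addThree :: Natural -> Natural
  addThree = addNatural 3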

How does this get around the self-reference problem? A function like the one in the Curry paradox takes an object of its own type as a parameter. There’s no way to write that in this type system. It’s a significant restriction which makes it impossible to write general recursive expressions – it limits us to something close to primitive recursion, but it avoids the inconsistency. All valid expressions written with this system of types in place are guaranteed to terminate with a consistent result.

Extending λ-calculus with types

Now, it’s time to get at least a little bit formal, to see how we integrate a stratified type system into the lambda calculus. There are two facets to that: the syntactic, and the analytic. The syntactic part shows how we extend λ-calculus to include type declarations, and the analytic part shows how to determine whether or not an expression with type declarations is valid.

The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:

\lambda x: \nu . x + 3
This asserts that the parameter x has type \nu, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a Greek letter “nu” for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type \nu \rightarrow \nu \rightarrow \nu, we can infer that the result type of this function will be \nu.
(\lambda x . x + 3): \nu \rightarrow \nu
This is the same as the previous, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that x : \nu because the function type is \nu \rightarrow \nu, which means that the function parameter has type \nu.
\lambda x: \nu, y:\delta . \text{if}\hspace{1ex} y\hspace{1ex} \text{then}\hspace{1ex} x * x \hspace{1ex}\text{else} \hspace{1ex} x
This is a two parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν. So the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that λ-calculus really works in terms of single parameter functions; multi-parameter functions are a shorthand for currying. So really, this function is: λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda is type δ → ν; the outer lambda is type ν → (δ → ν).
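
That last example can be transcribed into Haskell as a rough sketch (reading ν as Natural and δ as Bool; squareIf is just a made-up name for it):

  import Numeric.Natural (Natural)

  -- ν → δ → ν: really a one-parameter function returning a δ → ν function.
  squareIf :: Natural -> Bool -> Natural
  squareIf x y = if y then x * x else x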

To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for checking the validity of the type declarations. Using those rules, we can verify that the program is type-consistent.

In type analysis, we’ll talk about judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type analysis allows us to use inference and judgements to reason about types in a lambda expression. If any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. If we can show that all of the components of an expression have consistent type judgements, then we can conclude that the expression is well-typed, meaning that it’s valid with respect to the type system.

Type judgements are usually written in a sequent-based notation, which looks like a fraction where the numerator consists of statements that we know to be true; and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase Greek letter. If a type context includes the judgement that x : \alpha, I’ll write that as \Gamma :- x : \alpha.

Rule 1: Type Identity

\frac{\mbox{}}{x : \alpha :- x: \alpha}

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.

Rule 2: Type Invariance

\frac{\Gamma :- x:\alpha, \quad x \neq y}{\Gamma + y:\beta :- x:\alpha}

This rule is a statement of non-interference. If we know that x:\alpha, then inferring a type judgement about any other term cannot change our type judgement for x.

Rule 3: Function Type Inference

\frac{\Gamma + x:\alpha :- y:\beta}{\Gamma :- (\lambda x:\alpha . y):\alpha \rightarrow \beta}

This statement allows us to infer function types given parameter types. If we know the type of the parameter to a function is \alpha; and if, with our knowledge of the parameter type, we know that the type of term that makes up the body of the function is \beta, then we know that the type of the function is \alpha \rightarrow \beta.

Rule 4: Function Application Inference

\frac{\Gamma :- x: \alpha \rightarrow \beta, \Gamma :- y:\alpha}{\Gamma :- (x y): \beta}

This one is easy: if we know that we have a function that takes a parameter of type \alpha and returns a value of type \beta, then if we apply that function to a value of type \alpha, we’ll get a value of type \beta.

These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.
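
To make the rules concrete, here is a minimal, hypothetical Haskell sketch of a checker for this system (the names Type, Term, Context and infer are mine, and only the base type ν is included):

  import qualified Data.Map as Map

  -- Types: the base type ν plus the → constructor.
  data Type = Nu | Arrow Type Type
    deriving (Eq, Show)

  -- Terms: variables, typed abstractions λx:τ. e, and applications (e1 e2).
  data Term
    = Var String
    | Lam String Type Term
    | App Term Term

  -- The context Γ: a map from variable names to types.
  type Context = Map.Map String Type

  infer :: Context -> Term -> Maybe Type
  -- Rule 1 (type identity): a variable has the type the context assigns it.
  infer ctx (Var x) = Map.lookup x ctx
  -- Rule 3 (function type inference): if the body has type β under Γ + x:α,
  -- then the abstraction has type α → β.
  infer ctx (Lam x t body) = Arrow t <$> infer (Map.insert x t ctx) body
  -- Rule 4 (function application): an α → β applied to an α yields a β.
  infer ctx (App f a) = do
    Arrow t1 t2 <- infer ctx f
    t0 <- infer ctx a
    if t0 == t1 then Just t2 else Nothing
  -- Rule 2 (type invariance) is implicit: adding a judgement y:β for some
  -- other variable y never disturbs the judgement already recorded for x.

For example, infer Map.empty (Lam "x" Nu (Var "x")) gives Just (Arrow Nu Nu), while an attempt to apply a term to itself, as in the Curry paradox, fails to type-check and yields Nothing.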

So let’s try taking a look at a simple λ-calculus expression, and seeing how inference works on it.

\lambda x y . y x

Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α,y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x = “3”, and y = “λ a:ν. a*a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.

  • “(λ x y . y x) 3 (λ a:ν . a * a)”: Since 3 is a literal integer, we know its type: 3:ν.
  • By rule 4, we can infer that the type of the expression “a*a” where “a:ν” is “ν”, and *:ν→ν→ν so therefore, by rule 3 the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) (λ a:ν.(a*a):ν) : ν→ν”.
  • So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.
  • So, for this one, both α and β end up being “ν”, the type of natural numbers.

So, now we have a simply typed λ-calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable \rightarrow constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.

Programs are Proofs

Now we can get to the fun part. The mantra of type theory is the program is the proof. Here’s where we get our first glimpse of just what that really means!

Think about the types in the simply typed λ-calculus. Anything which can be formed from the following grammar is a λ-calculus type:

  • type ::= primitive | function | ( type )
  • primitive ::= α | β | δ | …
  • function ::= type→type

The catch with that grammar is that you can create type expressions which, while they are valid type definitions, you can’t write a single, complete, closed expression which will actually have that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type; and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type is a type error.

So what’s the difference between inhabitable type, and an uninhabitable type?

The answer comes from something called the Curry-Howard isomorphism. For a typed λ-calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.

The type inference rules in λ-calculus are, in fact, the same as logical inference rules in intuitionistic logic. A type \alpha \rightarrow \beta can be seen both as a statement that this is a function that maps from a value of type \alpha to a value of type \beta, and as a logical statement that if we’re given a fact \alpha, we could use that to infer the truth of a fact \beta.

If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a type \alpha \rightarrow \alpha, then given an inhabited type \alpha, we know that \alpha \rightarrow \alpha is inhabitable, because if \alpha is a fact, then \alpha \rightarrow \alpha is also a fact.

On the other hand, think of a different case \alpha \rightarrow \beta. That’s not a theorem, unless there’s some other context that proves it. As a function type, that’s the type of a function which, without including any context of any kind, can take a parameter of type α, and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the λ-calculus: you need some kind of context to establish “α→β” as a theorem (in the logic) or as an inhabitable type (in the λ-calculus).

What kind of context would make a type \alpha \rightarrow \beta inhabitable? A definition of a valid function that takes an α, and returns a β. If such a function exists, then that function is a proof of the inhabitability of the type. Literally, the program is the proof.
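
A tiny Haskell illustration of the correspondence (identity and withContext are made-up names for this sketch): the polymorphic identity function inhabits α → α, while the only way to produce an unrelated β from an α is to be handed the missing implication as part of the context:

  -- A closed term of type a -> a: a proof of the tautology α ⇒ α.
  identity :: a -> a
  identity x = x

  -- There is no closed, total term of type a -> b for unrelated a and b;
  -- the extra argument below is the "context" supplying the implication.
  withContext :: (a -> b) -> a -> b
  withContext f x = f x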

Sean CarrollEffective Field Theory MOOC from MIT

Faithful readers are well aware of the importance of effective field theory in modern physics. EFT provides, in a nutshell, the best way we have to think about the fundamental dynamics of the universe, from the physics underlying everyday life to structure formation in the universe.

And now you can learn about the real thing! MIT is one of the many colleges and universities that are doing a great job putting top-quality lecture courses online, such as the introduction to quantum mechanics I recently mentioned. (See the comments of that post for other goodies.) Now they’ve announced a course at a decidedly non-introductory level: a graduate course in effective field theory, taught by Caltech alum Iain Stewart. This is the real enchilada, the same stuff a second-year grad student in particle theory at MIT would be struggling with. If you want to learn how to really think about naturalness, or a good way of organizing what we learn from experiments at the LHC, this would be a great place to start. (Assuming you already know the basics of quantum field theory.)


Classes start Sept. 16. I would love to take it myself, but I have other things on my plate at the moment — anyone who does take it, chime in and let us know how it goes.

David Hoggstars in galaxies; data-driven metallicities

MPIA Milky-Way group meeting was excellent today. Ross Fadely talked about our star–galaxy separation projects and showed some very encouraging signs that a supervised regression system to predict stellarity using the object colors might work. We are training not on the labels (star or galaxy) but on the morphological inputs that are used to make those labels. Nicholas Martin (Strasbourg) showed incredible maps of the outskirts of M31 split by metallicity. He is building the maps with a proper forward model of the data. They are gorgeous. He is also experimenting with showing the results of the MCMC sampling as a noisy movie. Cool. Greg Stinson (MPIA) showed both some incredible movies of simulations, and also some interesting results on the delay time distribution for Type Ia SNe. It appears that the direct measurements of the delay time distribution from stellar populations and the inferences of the delay times from chemical abundances might be in conflict. We argued about possible resolutions.

Late in the day, Ness and I closed the loop on the data-driven APOGEE spectral modeling: We used our data-driven model to "learn" the tags (temperature, logg, and metallicity) for a new star. We cheated—we tagged a star that was used in the training—but (a) we rocked it with accurate tag recovery, and (b) we can do cross-validation as a more conservative test. It was an exciting moment.

Jacques Distler Golem V

For nearly 20 years, Golem has been the machine on my desk. It’s been my mail server, web server, file server, … ; it’s run Mathematica and TeX and compiled software for me. Of course, it hasn’t been the same physical machine all these years. Like Doctor Who, it’s gone through several reincarnations.

Alas, word came down from the Provost that all “servers” must move (physically or virtually) to the University Data Center. And, bewilderingly, the machine on my desk counted as a “server.”

Obviously, a 27” iMac wasn’t going to make such a move. And, equally obviously, it would have been rather difficult to replace/migrate all of the stuff I have running on the current Golem. So we had to go out shopping for Golem V. The iMac stayed on my desk; the machine that moved to the Data Center is a new Mac Mini.

Golem V, the new Mac Mini, all labeled and ready to go:
  • 2.3 GHz quad-core Intel Core i7 (8 logical cores, via hyperthreading)
  • 16 GB RAM
  • 480 GB SSD (main drive)
  • 1 TB HD (Time Machine backup)
  • 1 TB external HD (CCC clone of the main drive)
  • Dual 1 Gigabit Ethernet Adapters, bonded via LACP

In addition to the dual network interface, it (along with, I gather, a rack full of other Mac Minis) is plugged into an ATS, to take advantage of the dual redundant power supply at the Data Center.

Not as convenient, for me, as having it on my desk, but I’m sure the new Golem will enjoy the austere hum of the Data Center much better than the messy cacophony of my office.


I did get a tour of the Data Center out of the deal. Two things stood out for me.

  1. Most UPSs involve large banks of lead-acid batteries. The UPSs at the University Data Center use flywheels. They comprise a long row of refrigerator-sized cabinets which give off a persistent hum due to the humongous flywheels rotating in vacuum within.
  2. The server cabinets are painted the standard generic white. But, for the networking cabinets, the University went to some expense to get them custom-painted … burnt orange.
Custom paint job on the networking cabinets.

August 20, 2014

n-Category Café Holy Crap, Do You Know What A Compact Ring Is?

You know how sometimes someone tells you a theorem, and it’s obviously false, and you reach for one of the many easy counterexamples only to realize that it’s not a counterexample after all, then you reach for another one and another one and find that they fail too, and you begin to concede the possibility that the theorem might not actually be false after all, and you feel your world start to shift on its axis, and you think to yourself: “Why did no one tell me this before?”

That’s what happened to me today, when my PhD student Barry Devlin — who’s currently writing what promises to be a rather nice thesis on codensity monads and topological algebras — showed me this theorem:

Every compact Hausdorff ring is totally disconnected.

I don’t know who it’s due to; Barry found it in the book Profinite Groups by Ribes and Zalesskii. And in fact, there’s also a result for rings analogous to a well-known one for groups: a ring is compact, Hausdorff and totally disconnected if and only if it can be expressed as a limit of finite discrete rings. Every compact Hausdorff ring is therefore “profinite”, that is, expressible as a limit of finite rings.

So the situation for compact rings is completely unlike the situation for compact groups. There are loads of compact groups (the circle, the torus, SO(n), U(n), E_8, …) and there’s a very substantial theory of them, from Haar measure through Lie theory and onwards. But compact rings are relatively few: it’s just the profinite ones.

I only laid eyes on the proof for five seconds, which was just long enough to see that it used Pontryagin duality. But how should I think about this theorem? How can I alter my worldview in such a way that it seems natural or even obvious?

Jordan EllenbergBenson Farb’s ICM talk

One of the things I’ve been spending a lot of time on mathematically is problems around representation stability and “FI-modules,” joint with Tom Church, Benson Farb, and Rohit Nagpal.  Benson just talked about this stuff at the ICM, and here it is:

In the latest stable representation theory news, Andy Putman and (new Wisconsin assistant professor!) Steven Sam have just posted an exciting new preprint about the theory of representations of GL_n(F_p) as n goes to infinity; this is kind of like the linear group version of what FI-modules does for symmetric groups.  (Or, if you like, our thing is their thing over the field with one element….!)  This is something we had hoped to understand but got very confused about, so I’m looking forward to delving into what Andy and Steven did here — expect more blogging!  In particular, they prove the Artinian conjecture of Lionel Schwartz.  Like I said, more on this later.


Doug NatelsonScience and engineering research infrastructure - quo vadis?

I've returned from the NSF's workshop regarding the successor program to the NNIN.  While there, I learned a few interesting things, and I want to point out a serious issue facing science and engineering education and research (at least in the US).
  • The NNIN has been (since 2010) essentially level-funded at $16M/yr for the whole program, and there are no indications that this will change in the foreseeable future.  (Inflation erodes the value of that sum as well over time.)  The NNIN serves approximately 6000 users per year (with turnover of about 2200 users/yr).  For perspective, a truly cutting edge transmission electron microscope, one instrument, costs about $8M.  The idea that the NNIN program can directly create bleeding edge shared research hardware across the nation is misguided.
  • For comparison, the US DOE has five nano centers.  The typical budget for each one is about $20M/yr.  Each nano center can handle around 450 users/yr.  Note that these nano centers are very different things than NNIN sites - they do not charge user fees, and they are co-located with some truly unique characterization facilities (synchrotrons, neutron sources).  Still, the DOE is spending seventeen times as much per user per year in their program as the NNIN.
  • Even the DOE, with their much larger investment, doesn't really know how to handle "recapitalization".  That is, there was money available to buy cutting edge tools to set up their centers initially, but there is no clear, sustainable financial path to be able to replace aging instrumentation.  This is exactly the same problem faced by essentially every research university in the US.  Welcome to the party.  
  • Along those lines:  As far as I can tell (and please correct me if I'm wrong about this!), every US federal granting program intended to have a component associated with increasing shared research infrastructure at universities (this includes the NSF MRI program, MRSEC, STC, ERC, CCI; DOE instrumentation grants, DOE centers like EFRCs, DOD equipment programs like DURIPs) is either level-funded or facing declining funding levels.  Programs like these often favor acquisition of new, unusual tools over standard "bread-and-butter" as well.  Universities are going to have to rely increasingly on internal investment to acquire/replace instrumentation.  Given that there is already considerable resentment/concern about perceived stratification of research universities into "haves" and "have-nots", it's hard to see how this is going to get much better any time soon.
  • To potential donors who are really interested in the problem of graduate (and advanced undergrad) science and engineering hands-on education:  PLEASE consider this situation.  A consortium of donors who raised, say, $300M in an endowment could support the equivalent of the NNIN on the investment returns for decades to come.  This can have an impact on thousands of students/postdocs per year, for years at a time.  The idea that this is something of a return to the medieval system of rich patrons supporting the sciences is distressing.  However, given the constraints of government finances and the enormous sums of money out there in the hands of some brilliant, tech-savvy people who appreciate the importance of an educated workforce, I hope someone will take this possibility seriously.  To put this in further perspective:  I heard on the radio yesterday that the college athletics complex being built at Texas A&M University costs $400M.  Think about that.  A university athletic booster organization was able to raise that kind of money for something as narrowly focused (sorry, Aggies, but you know it's true). 

John PreskillThe singularity is not near: the human brain as a Boson sampler?

Ever since the movie Transcendence came out, it seems like the idea of the ‘technological singularity’ has been in the air. Maybe it’s because I run in an unorthodox circle of deep thinkers, but over the past couple months, I’ve been roped into three conversations related to this topic. The conversations usually end with some version of “ah shucks, machine learning is developing at a fast rate, so we are all doomed. And have you seen those deep learning videos? Computers are learning to play 35 year old video games?! Put this on an exponential trend and we are D00M3d!”

Computers are now learning the rules of this game, from visual input only, and then playing it optimally. Are we all doomed?

So what is the technological singularity? My personal translation is: are we on the verge of narcissistic flesh-eating robots stealing our lunch money while we commute to the ‘special school for slow sapiens’?

This is an especially hyperbolic view, and I want to be clear to distinguish ‘machine learning‘ from ‘artificial consciousness.’ The former seems poised for explosive growth but the latter seems to require breakthroughs in our understanding of the fundamental science. The two concepts are often equated when defining the singularity, or even artificial intelligence, but I think it’s important to distinguish these two concepts. Without distinguishing them, people sometimes make the faulty association: machine_learning_progress=>AI_progress=>artificial_consciousness_progress.

I’m generally an optimistic person, but on this topic, I’m especially optimistic about humanity’s status as machine overlords for at least the next ~100 years. Why am I so optimistic? Quantum information (QI) theory has a secret weapon. And that secret weapon is obviously Scott Aaronson (and his brilliant friends+colleagues+sidekicks; especially Alex Arkhipov in this case.) Over the past few years they have done absolutely stunning work related to understanding the computational complexity of linear optics. They colloquially call this work Boson sampling.

What I’m about to say is probably extremely obvious to most people in the QI community, but I’ve had conversations with exquisitely well educated people–including a Nobel Laureate–and very few people outside of QI seem to be aware of Aaronson and Arkhipov’s (AA’s) results. Here’s a thought experiment: does a computer have all the hardware required to simulate the human brain? For a long time, many people thought yes, and they even created a more general hypothesis called the “extended Church-Turing hypothesis.”

An interdisciplinary group of scientists has long speculated that quantum mechanics may stand as an obstruction towards this hypothesis. In particular, it’s believed that quantum computers would be able to efficiently solve some problems that are hard for a classical computer. These results led people, possibly Roger Penrose most notably, to speculate that consciousness may leverage these quantum effects. However, for many years, there was a huge gap between quantum experiments and the biology of the human brain. If I ever broached this topic at a dinner party, my biologist friends would retort: “but the brain is warm and wet, good luck managing decoherence.” And this seems to be a valid argument against the brain as a universal quantum computer. However, one of AA’s many breakthroughs is that they paved the way towards showing that a rather elementary physical system can gain speed-ups on certain classes of problems over classical computers. Maybe the human brain has a Boson sampling module?

More specifically, AA’s physical setup involves being able to: generate identical photons; send them through a network of beamsplitters, phase shifters and mirrors; and then count the number of photons in each mode through ‘nonadaptive’ measurements. This setup computes the permanent of a matrix, which is known to be a hard problem classically. AA showed that if there exists a polynomial-time classical algorithm which samples from the same probability distribution, then the polynomial hierarchy would collapse to the third level (this last statement would be very bad for theoretical computer science and therefore for humans; ergo probably not true.) I should also mention that when I learned the details of these results, during Scott’s lectures this past January at the Israeli Institute of Advanced Studies’ Winter School in Theoretical Physics, there was one step in the proof which was not rigorous. Namely, they rely on a conjecture in random matrix theory–but at least they have simulations indicating the conjecture should be true.
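
For reference, the permanent of an n \times n matrix A = (a_{ij}) is

 \mathrm{Perm}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)},

which is the same sum as the determinant but without the alternating signs, and it is exactly the loss of those signs that is believed to make it so much harder to compute.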

Nitty gritty details aside, I find the possibility that this simple system is gaining a classical speed-up compelling in the conversation about consciousness. Especially considering that finding permanents is actually useful for some combinatorics problems. When you combine this with Nature’s mischievous manner of finding ways to use the tools available to it, it seems plausible to me that the brain is using something like Boson sampling for at least one non-trivial task towards consciousness. If not Boson sampling, then maybe ‘Fermion smashing’ or ‘minimal surface finding’ or some other crackpottery words I’m coming up with on the fly. The point is, this result opens a can of worms.

AA’s results have bred new life into my optimism towards humanity’s ability to rule the lands and interwebs for at least the next few decades. Or until some brilliant computer scientist proves that human consciousness is in P. If nothing else, it’s a fun topic for wild dinner party speculation.


August 19, 2014

Jonathan ShockA foray into a new number system - an introduction to imaginary numbers


This is the introduction which I give to my first year mathematics class when they see imaginary numbers for the first time. I thought I'd type it up here as it's received good reactions the two times that I've introduced it in this format. Note that this probably isn't the canonical way to introduce complex numbers, but then most of my lectures don't necessarily take the normal route...

Complex Numbers, a philosophical detour

Before we get on to talking about imaginary numbers and complex numbers, let's try and break down our preconceptions about numbers in general.

We look at the world around us and see many things which we categorise. We see a computer, a piece of paper, we see other people, we see our hands. These are labels that we use to categorise the world, but these objects seem very physical and very real. We rarely question their existence, though if we want to take the Cartesian view, we should also question the reality we are in. We are not going to go that far, but let's try and ask about the existence of numbers.

I have definitely seen five pieces of paper, but I have never seen a five. I've seen the number written down, but I can write down anything I want and it doesn't necessarily mean that it exists. I can write down a erga[oeiave21 but that doesn't suddenly bring a erga[oeiave21 into existence. How about a -5? I've definitely never seen a -5, though I understand perfectly well what it means. The integers seem to be very good ways of describing, or more specifically counting, objects, and the negative numbers are a good way of keeping track of the transfer of objects from one place to another. I can also ask you to give me 3 coffees, and here I am really asking you to apply 3 as an operation to the object coffee. 3 is acting almost more like a verb than it is a noun. When I describe that there are 30 people in a class, I am really thinking about this as a description, or an adjective. So in the real world, somehow numbers feel like verbs and adjectives. I certainly wouldn't say that 'heavy' exists, but certainly a book which has been described as heavy does.

However, there is a world in which numbers really do seem to be more like nouns than they do in the world around us, and that is the abstract world of mathematics. In the universe of equations, numbers somehow feel much more concrete, and I can manipulate them and transmogrify them from one form to another using a set of mathematical operations which become more and more finely tuned and specialised as we learn more and more mathematics. I can take a 5 and I can apply the sin function to it to give me another number. I think of this rather as taking an object and putting it in a machine which turns it into another object. Here 5 is very real, but so is -5 and so is pi/4 and so are all the numbers that we've ever used. They are simply the objects which are manipulated by our mathematical machinery. Whether or not they exist as objects around us isn't very important for our use of them in the mathematical universe.

Incidentally, I have here separated the real universe from the more abstract, platonic, mathematical one, but it is fair to say that we have found mathematics as the best language with which to accurately describe the real universe. All of our models and precise descriptions of the universe are built using mathematics, and it acts as an incredible way of describing the laws of nature. Which came first, the mathematics or the universe? That is not a question I am going to get onto here, but it's certainly a profound one!

A foray into a new number system

OK, so we have a mathematical world of numbers and we can manipulate them. Thus, we should be perfectly happy to have some more ingredients in that world, ones that don't have such an obvious mapping to the things in the world around us. We will discover that actually they help us enormously in the things that we can do with the mathematical machinery. It's like having a powerful car but not the right fuel to really take it up to top speed. We are about to find out what that fuel is and push the limits of what our car can do!

Previously, if we set up a certain type of quadratic equation and plugged it into our machinery to find a solution, the machinery would jam and we wouldn't get an answer out. This was a real shame because it didn't seem to be that much more of a complicated equation than any other that we had studied. We are perfectly happy with solving an equation like:

x^2-1=0

You can plot the graph of y=x^2-1 and see that it equals 0 at two points, x=1 and x=-1. Our mathematical machinery can deal with that without complaint, but when we ask it to solve something so similar:

x^2+1=0

our traditional machinery comes juddering to a halt and we get an error message on the screen - you're not allowed to take the square root of a negative number, says our program. In fact, when we plot the graph of y=x^2+1 it's clear that it doesn't cross the x axis, so it can't have a solution...can it? Maybe we're not looking hard enough. Maybe our machinery is fine, but we've just fed it the wrong fuel. It turns out that we can find the solutions after all. They are:

x=√-1 and -√-1

You might look at this and go "Absolutely not! You can't take the square root of a negative number!" But if you plug these into the equation, they work just fine and are perfectly good solutions. What is not true is that √-1 behaves like the normal numbers that we are used to using. In fact, let's give this solution a name. We'll call it i:

√-1=i
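Just to spell the check out once, it really is one line:

i^2+1 = (√-1)^2+1 = -1+1 = 0

so i, and likewise -i, really does solve our equation.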

(Note that we are actually being mathematically sloppy here, but for a first pass this will do - we can explain the subtleties later. In particular, the domain of the (real) square root function is only the non-negative real numbers, so we have to say separately what we mean when we take square roots, or other non-integer powers, of negative numbers.)

What is i? It's one of the solutions to the equation above - we just checked it. There's no funny business going on here. So what if it doesn't correspond to an object in the world you see around you? Neither does -76, but we don't have a problem with using that number, do we? The i stands for imaginary. So we call i the imaginary number, but in fact it is no more imaginary than most other numbers; it's just a little harder to get used to, because we are used to numbers which represent a size. Numbers which are not imaginary are called real, but again, this is probably not a very good name, as -32 is in some sense no more "real" than i is.

Once you have defined this new type of number - a number whose square is negative - so many new possibilities open up. Things that previously would have driven our machinery to a halt are now very easily accessible. With this new number we've just upgraded our mathematical machinery so that it can handle many more problems than it could before.

It might seem that i couldn't have anything to do with the real world, and it's true that any measurement of the real world will give us one of the real numbers that we are used to, but using i makes many things much more natural than they would be without it. This is true in many, many fields of science, and so having i at hand is absolutely indispensable when you want to describe the real world. Very often, given a question about the real world, it is much easier to take a detour through the world of these imaginary numbers, which gives us a shortcut to an answer, than to take the long route using only the real numbers.

What we've shown here is that we can deal with √-1, but we can quite happily extend this to the square root of any negative number. From now on √-b, where b is a positive number, can simply be written as i√b. So i can be multiplied by real numbers. We can take 2 lots of i and 3 lots of i and add them together to get 5 lots of i: 2i+3i=5i.

How about adding a real number onto this? Well, it turns out that a sum like 3+4i can't be simplified any further. You can't add apples and oranges and get pomegranates. What you end up with are a few apples and a few oranges, and that's as simple as it gets. In fact any number of this form, which has a real part and an imaginary part, is known as a complex number, and these complex numbers form a whole new domain which will supercharge our mathematical toolkit. We will see the power it gives us in the coming lectures...
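(A small practical aside before we get there: you don't have to take any of this on faith, because most programming languages already ship with these numbers built in. Here is a small Python sketch, where Python spells i as 1j; the particular numbers are just the examples from above.)

    import cmath

    i = 1j                  # Python's built-in imaginary unit
    print(i * i)            # (-1+0j): i squared really is -1
    print(i * i + 1)        # 0j: so i solves x^2 + 1 = 0

    print(2j + 3j)          # 5j: two lots of i plus three lots of i is five lots of i
    print(cmath.sqrt(-9))   # 3j: the square root of -b is i times the square root of b

    z = 3 + 4j              # a complex number: real part 3, imaginary part 4
    print(z.real, z.imag)   # the "apples" and the "oranges" stay separate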

BackreactionDAMA annual modulation explained without invoking dark matter

Annual modulation of DAMA data.
Image credits: DAMA Collaboration.
Physicists have plenty of evidence for the existence of dark matter, matter much like the matter we are made of, except that it does not emit any light. However, so far all this evidence comes from the gravitational pull of dark matter, which affects the motion of stars, the formation of structures, and acts as a gravitational lens to bend light, all of which has been observed. We still do not know, however, what the microscopic nature of dark matter is. What type of particle (or particles) is it constituted of, and what are its interactions?

Few physicists today doubt that dark matter exists and is some type of particle which has just evaded detection so far. First, there is all the evidence for its gravitational interaction. Add to this that we know of no good reason why all matter should couple to photons, and on these grounds we can actually expect the existence of dark matter. Moreover, we have various candidate theories for physics beyond the standard model that contain particles with the properties necessary for dark matter. Finally, alternative explanations, which modify gravity rather than adding a new type of matter, are disfavored by the existing data.

Not surprisingly, then, dark matter has come to dominate the search for physics beyond the standard model. We seem to be so very close!

Infuriatingly though, despite many experimental efforts, we still have no evidence for the interaction of dark matter particles, either among themselves or with the matter that we are made of. Many experiments are searching for evidence of these interactions. It is the very nature of dark matter, interacting so weakly with normal matter and with itself, which makes finding this evidence so difficult.

One kind of evidence being looked for is the decay products of dark matter interactions in astrophysical processes. There are currently several observations, such as the Fermi γ-ray excess or the positron excess, whose astrophysical origin is not understood and which could therefore be due to dark matter. But astrophysics combines a lot of processes at many energy and density scales, and it is hard to exclude that such a signal was caused by particles of the standard model alone.

Another type of evidence being sought comes from experiments designed to be sensitive to the very rare interactions of dark matter with our normal matter as it passes through the planet. These experiments have the advantage that they happen in a known and controlled environment (as opposed to somewhere in the center of our galaxy). The experiments are typically located deep underground, in old mines, to filter out unwanted types of particles, collectively referred to as “background”. Whether or not an experiment can detect dark matter interactions within a certain amount of time depends on the density and coupling strength of dark matter, and so also on the type of detector material.

So far, none of the dark matter searches has resulted in a statistically significant positive signal. They have set constraints on the coupling and density of dark matter. Valuable, yes, but frustrating nevertheless.

One experiment that has instilled both hope and controversy among physicists is the DAMA experiment. The DAMA experiment sees an unexplained annual modulation in its event rate at high statistical significance. If the signal were caused by dark matter, an annual modulation is exactly what we would expect, because of the Earth's motion around the Sun: the event rate depends on our velocity relative to the dark matter halo and should peak around June 2nd, consistent with the DAMA data.
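Schematically, the expected dark-matter contribution to the rate in a given energy bin is usually written as a constant part plus a small annual modulation,

S(t) = S_0 + S_m cos(2π(t - t_0)/T),

with a period T of one year and a phase t_0 around day 152, i.e. June 2nd, when the Earth's orbital velocity adds most constructively to the Sun's motion through the dark matter halo. This is the functional form the DAMA modulation is compared against.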

There are of course other annually modulated signals that can cause reactions in the material in and around the detector. Notably, there is the flux of muons which are produced when cosmic rays hit the upper atmosphere. The muon flux, however, depends on the temperature of the atmosphere and peaks approximately 30 days too late to explain the observations. The DAMA collaboration has taken into account all the other kinds of backgrounds that they, or other physicists, could think of, but dark matter remained the best way to explain the data.

The DAMA experiment has received much attention not primarily because of the presence of the signal, but because of the physicists’ failure to explain the signal with anything but dark matter. It adds to the controversy though that the DAMA signal, if due to dark matter, seems to lie in a parameter range already excluded by other dark matter searches. Then again, this may be due to differences in the detectors. The issue has been discussed back and forth for about a decade now.

All this may change now that Jonathan Davis from the University of Durham, UK, in a recent paper demonstrated that the DAMA signal can be fitted by combining the atmospheric muon flux with the flux of solar neutrinos:
    Fitting the annual modulation in DAMA with neutrons from muons and neutrinos
    Jonathan H. Davis
    arXiv:1407.1052
The neutrinos interact with the rock surrounding the detector, thereby creating secondary particles which contribute to the background. The strength of the neutrino signal depends on the Earth’s distance to the Sun and peaks around January 2nd. In his paper, Davis demonstrates that for certain values of the muon and neutrino fluxes these two modulations combine to fit the DAMA data very well, as well as a dark matter explanation does. And that is after he has corrected the goodness of fit to account for the larger number of parameters.
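As a toy illustration of how this can work, here is a small Python sketch; the amplitudes and peak days below are made-up round numbers, not Davis's fitted values. Two large annual modulations that peak almost half a year apart nearly cancel, and the phase of the small residual is then very sensitive to their relative size, so for the right ratio the sum can peak in early June even though neither background does.

    import numpy as np

    days = np.arange(365.0)
    omega = 2 * np.pi / 365.25

    def modulation(day, amplitude, peak_day):
        # A simple annual modulation with a given amplitude and day of maximum.
        return amplitude * np.cos(omega * (day - peak_day))

    # Toy numbers in arbitrary units, NOT the fitted rates from Davis's paper:
    muons     = modulation(days, 1.05, 183.0)  # muon-induced neutrons, peak in early July
    neutrinos = modulation(days, 1.00, 2.0)    # neutrino-induced neutrons, peak around Jan 2nd

    combined = muons + neutrinos
    print("residual amplitude:", combined.max())              # much smaller than either input
    print("peak day of the sum:", days[np.argmax(combined)])  # around day 154, i.e. early June,
    # close to the ~June 2nd phase DAMA reports, even though neither background peaks there.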

Moreover, Davis discusses how the two possible explanations could be distinguished from each other, for example by analyzing the data for correlations with changes in solar activity, which should not be present if the signal were due to dark matter.

Tim Tait, professor of theoretical particle physics at the University of California, Irvine, commented that “[This] may be the first self-consistent explanation for DAMA.” Though of course one has to be cautious not to jump to conclusions, since Davis’ argument is partly based on estimates for the reaction rate of neutrinos with the rock that have to be confirmed by more quantitative studies. Thomas Dent, a former particle cosmologist now working in gravitational wave data analysis, welcomed Davis’ explanation: “DAMA has been a distraction to theorists for too long.”

This post first appeared July 17, 2014, on Starts With A BANG with the title "How the experiment that claimed to detect dark matter fooled itself".

August 18, 2014

Quantum DiariesDark Energy Survey kicks off second season cataloging the wonders of deep space

This Fermilab press release came out on Aug. 18, 2014.

This image of the NGC 1398 galaxy was taken with the Dark Energy Camera. This galaxy lives in the Fornax cluster, roughly 65 million light-years from Earth. It is 135,000 light-years in diameter, just slightly larger than our own Milky Way galaxy, and contains more than 100 billion stars. Credit: Dark Energy Survey

On Aug. 15, with its successful first season behind it, the Dark Energy Survey (DES) collaboration began its second year of mapping the southern sky in unprecedented detail. Using the Dark Energy Camera, a 570-megapixel imaging device built by the collaboration and mounted on the Victor M. Blanco Telescope in Chile, the survey’s five-year mission is to unravel the fundamental mystery of dark energy and its impact on our universe.

Along the way, the survey will take some of the most breathtaking pictures of the cosmos ever captured. The survey team has announced two ways the public can see the images from the first year.

Today, the Dark Energy Survey relaunched Dark Energy Detectives, its successful photo blog. Once every two weeks during the survey’s second season, a new image or video will be posted to www.darkenergydetectives.org, with an explanation provided by a scientist. During its first year, Dark Energy Detectives drew thousands of readers and followers, including more than 46,000 followers on its Tumblr site.

Starting on Sept. 1, the one-year anniversary of the start of the survey, the data collected by DES in its first season will become freely available to researchers worldwide. The data will be hosted by the National Optical Astronomy Observatory. The Blanco Telescope is hosted at the National Science Foundation’s Cerro Tololo Inter-American Observatory, the southern branch of NOAO.

In addition, the hundreds of thousands of individual images of the sky taken during the first season are being analyzed by thousands of computers at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, Fermi National Accelerator Laboratory (Fermilab), and Lawrence Berkeley National Laboratory. The processed data will also be released in coming months.

Scientists on the survey will use these images to unravel the secrets of dark energy, the mysterious substance that makes up 70 percent of the mass and energy of the universe. Scientists have theorized that dark energy works in opposition to gravity and is responsible for the accelerating expansion of the universe.

“The first season was a resounding success, and we’ve already captured reams of data that will improve our understanding of the cosmos,” said DES Director Josh Frieman of the U.S. Department of Energy’s Fermi National Accelerator Laboratory and the University of Chicago. “We’re very excited to get the second season under way and continue to probe the mystery of dark energy.”

While results on the survey’s probe of dark energy are still more than a year away, a number of scientific results have already been published based on data collected with the Dark Energy Camera.

The first scientific paper based on Dark Energy Survey data was published in May by a team led by Ohio State University’s Peter Melchior. Using data that the survey team acquired while putting the Dark Energy Camera through its paces, they used a technique called gravitational lensing to determine the masses of clusters of galaxies.

In June, Dark Energy Survey researchers from the University of Portsmouth and their colleagues discovered a rare superluminous supernova in a galaxy 7.8 billion light years away. A group of students from the University of Michigan discovered five new objects in the Kuiper Belt, a region in the outer reaches of our solar system, including one that takes over a thousand years to orbit the Sun.

In February, Dark Energy Survey scientists used the camera to track a potentially hazardous asteroid that approached Earth. The data was used to show that the newly discovered Apollo-class asteroid 2014 BE63 would pose no risk.

Several more results are expected in the coming months, said Gary Bernstein of the University of Pennsylvania, project scientist for the Dark Energy Survey.

The Dark Energy Camera was built and tested at Fermilab. The camera can see light from more than 100,000 galaxies up to 8 billion light-years away in each crystal-clear digital snapshot.

“The Dark Energy Camera has proven to be a tremendous tool, not only for the Dark Energy Survey, but also for other important observations conducted year-round,” said Tom Diehl of Fermilab, operations scientist for the Dark Energy Survey. “The data collected during the survey’s first year — and its next four — will greatly improve our understanding of the way our universe works.”

The Dark Energy Survey Collaboration comprises more than 300 researchers from 25 institutions in six countries. For more information, visit http://www.darkenergysurvey.org.

Fermilab is America’s premier national laboratory for particle physics and accelerator research. A U.S. Department of Energy Office of Science laboratory, Fermilab is located near Chicago, Illinois, and operated under contract by the Fermi Research Alliance, LLC. Visit Fermilab’s website at www.fnal.gov and follow us on Twitter at @FermilabToday.

The DOE Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

The National Optical Astronomy Observatory (NOAO) is operated by the Association of Universities for Research in Astronomy (AURA), Inc., under cooperative agreement with the National Science Foundation.

Andrew Jaffe“Public Service Review”?

A few months ago, I received a call from someone at the “Public Service Review”, supposedly a glossy magazine distributed to UK policymakers and influencers of various stripes. The gentleman on the line said that he was looking for someone to write an article for his magazine giving an example of what sort of space-related research was going on at a prominent UK institution, to appear opposite an opinion piece written by Martin Rees, president of the Royal Society.

This seemed harmless enough, although it wasn’t completely clear what I (or the Physics Department, or Imperial College) would get out of it. But I figured I could probably knock something out fairly quickly. However, he told me there was a catch: it would cost me £6000 to publish the article. And he had just ducked out of his editorial meeting in order to find someone to agree to write the article that very afternoon. Needless to say, in this economic climate, I didn’t have an account with an unused £6000 in it, especially for something of dubious benefit. (On the other hand, astrophysicists regularly publish in journals with substantial page charges.) It occurred to me that this could be a scam, although the website itself seems legitimate (though no one I spoke to knew anything about it).

I had completely forgotten about this until this week, when another colleague in our group at Imperial told me he had received the same phone call, from the same organization, with the same details: article to appear opposite Lord Rees’; short deadline; large fee.

So, this is beginning to sound fishy. Has anyone else had any similar dealings with this organization?

Update: It has come to my attention that one of the comments below was made under a false name, in particular the name of someone who actually works for the publication in question, so I have removed the name, and will likely remove the comment unless the original writer comes forward with more, and truthful, information (which I will not publish without permission). I have also been informed of the possibility that some of the other comments below may come from direct competitors of the publication. These, too, may be removed in the absence of further confirming information.

Update II: In the further interest of hearing both sides of the discussion, I would like to point out the two comments from staff at the organization giving further information as well as explicit testimonials in their favor.