Planet Musings

October 17, 2019

Doug Natelson: More items of interest

This continues to be a very very busy time, but here are a few interesting things to read:

Mark Chu-Carroll: Dashi-Braised Brisket with Onions

I’m a nice Jewish boy, so I grew up eating a lot of brisket. Brisket’s an interesting piece of meat. By almost any reasonable standard, it’s an absolutely godawful cut of beef. It’s ridiculously tough. We’re not talking just a little bit chewy here: you can cook a hunk of brisket for four hours and still have something that’s inedible, because your teeth can’t break it down. It’s got a huge layer of fat on top – but the meat itself is completely lean – so if you cook it long enough to be chewable, it can be dry as a bone.

But my ancestors were peasants. They couldn’t afford to eat beef normally, and when special occasions rolled around, the only beef they could afford was the stuff that no one else wanted. So they got briskets.

If you get interested in foods, though, you learn that many of the best foods in the world started off with some poor peasant who wanted to make something delicious, but couldn’t afford expensive ingredients! Brisket is a perfect example. Cook it for a good long time, or in a pressure cooker, with lots of liquid, and lots of seasoning, and it’s one of the most flavorful pieces of the entire animal. Brisket is really delicious, once you manage to break down the structure that makes it so tough. These days, it’s become super trendy, and everyone loves brisket!

Anyway, like I said, I grew up eating Jewish brisket. But then I married a Chinese woman, and in our family, we always try to blend traditions as much as we can. In particular, because we’re both food people, I’m constantly trying to take things from my tradition, and blend some of her tradition into it. So I wanted to find a way of blending some Chinese flavors into my brisket. What I wound up with is more Japanese than Chinese, but it works. The smoky flavor of the dashi is perfect for the sweet meatiness of the brisket, and the onions slowly cook and sweeten, and you end up with something that is recognizably similar to the traditional Jewish onion-braised brisket, but also very distinctly different.

Ingredients:

  1. 1 brisket.
  2. 4 large onions.
  3. 4 packets of shredded bonito flakes from an Asian market.
  4. 4 large squares of konbu (Japanese dried kelp).
  5. 1 cup soy sauce.
  6. 1 cup apple cider.
  7. Random root vegetables that you like. I tend to go with carrots and daikon radish, cut into 1 inch chunks.

Instructions

  1. First, make some dashi:
    1. Put about 2 quarts of water into a pot on your stove, and bring to a boil.
    2. Lower to a simmer, and then add the konbu, and simmer for 30 minutes.
    3. Turn off the heat, add the bonito, and then let it sit for 10 minutes.
    4. Strain out all of the kelp and bonito, and you’ve got dashi!
  2. Slice all of the onions into strips.
  3. Cut the brisket into sections that will fit into an instant pot or other pressure cooker.
  4. Fill the instant pot by laying a layer of onions, followed by a piece of brisket, followed by a layer of onions until all of the meat is covered in onions.
  5. Take your dashi, add the apple cider, and add soy sauce until it tastes too salty. That’s just right. (Remember, your brisket is completely unsalted!) Pour it over the brisket and onions.
  6. Fill in any gaps around the brisket and onions with your root vegetables.
  7. Cook in the instant pot for one hour, and then let it slowly depressurize.
  8. Meanwhile, preheat your oven to 275°F.
  9. Transfer the brisket from the instant pot to a large casserole or Dutch oven. Cover with the onions. Taste the sauce – it should be quite a bit less salty. If it isn’t salty enough, add a bit more soy sauce; if it tastes sour, add a bit more apple cider.
  10. Cook in the oven for about 1 hour, until the top has browned; then turn the brisket over, and let it cook for another hour until the other side is brown.
  11. Slice into thick slices. (It should be falling apart, so that you can’t cut it thin!)
  12. Strain the fat off of the broth, and cook with a bit of cornstarch to thicken into a gravy.
  13. Eat.

October 16, 2019

Jordan Ellenberg: The quarter-circle game

Start at a lattice point inside the quarter-circle x^2 + y^2 < 10^6 in the positive quadrant. You and your opponent take turns: the allowable moves are to go up, right, or both at once (i.e. add (0,1), add (1,0), or add (1,1).) First person to leave the quarter-circle wins. What does it look like if you color a starting point black for “first-player loss” and yellow for “first-player win”? It looks like this:

I like the weird zones of apparent order here. Of course you can do this for any planar domain, any finite set of moves, etc. Are games like this analyzable at all?

I guess you could go a little further and compute the nimber or Grundy value associated to each starting position. You get:

What to make of this?

Here’s some hacky code; it’s simple.

import numpy as np
import matplotlib.pyplot as plt

M = 1000

# A position (a,b) has left the quarter-circle once a^2 + b^2 >= M^2.
def Crossed(a,b):
    return (a**2 + b**2 >= M*M)

# Minimal excludant: the smallest non-negative integer not in L.
# (With three moves the Grundy value is at most 3, so range(5) is enough.)
def Mex(L):
    return min([i for i in range(5) if not (i in L)])


# Work backwards from the boundary: positions outside the quarter-circle get
# value 0; every other position gets the mex of the values of the three
# positions reachable from it.
L = np.zeros((M+2,M+2))
for a in reversed(range(M+2)):
    for b in reversed(range(M+2)):
        if Crossed(a,b):
            L[a,b] = 0
        else:
            L[a,b] = Mex([L[a+1,b],L[a,b+1],L[a+1,b+1]])

plt.imshow(L,interpolation='none',origin='lower')
plt.show()

One natural question: what proportion of positions inside the quarter-circle are first-player wins? Heuristically: if you imagine the value of positions as Bernoulli variables with parameter p, the value at my current position is 0 if and only if all three of the moves available to me have value 1. So you might expect (1-p) = p^3. This has a root at about 0.68. It does look to me like the proportion of winning positions is converging, but it seems to be converging to something closer to 0.71. Why?
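As a quick check (a sketch that assumes the array L and the constant M from the script above are still in scope), one can compare the real root of p^3 + p - 1 = 0 with the empirical fraction of nonzero Grundy values inside the quarter-circle:

import numpy as np

# Real root of (1 - p) = p^3, i.e. p^3 + p - 1 = 0.
roots = np.roots([1, 0, 1, -1])
print(roots[np.abs(roots.imag) < 1e-9].real[0])   # about 0.6823

# Empirical fraction of first-player wins (nonzero Grundy value)
# among lattice points strictly inside the quarter-circle.
a, b = np.indices(L.shape)
inside = (a**2 + b**2 < M*M)
print((L[inside] != 0).mean())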

By the way, the game is still interesting (but I’ll bet more directly analyzable) even if the only moves are “go up one” and “go right one”! Here’s the plot of winning and losing values in that case:
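The same script produces that plot with a one-line change to the recursion; here is a sketch, reusing Crossed, Mex, M, and the imports from above:

# Two-move variant: only (1,0) and (0,1) are allowed; positions outside
# the quarter-circle keep the value 0 they were initialized with.
L2 = np.zeros((M+2, M+2))
for a in reversed(range(M+2)):
    for b in reversed(range(M+2)):
        if not Crossed(a, b):
            L2[a, b] = Mex([L2[a+1, b], L2[a, b+1]])

plt.imshow(L2, interpolation='none', origin='lower')
plt.show()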

n-Category Café: Diversity Workshop at UCR

We’re having a workshop to promote diversity in math here at UCR:

Riverside Mathematics Workshop for Excellence and Diversity, Friday 8 November 2019, U. C. Riverside. Organized by John Baez, Weitao Chen, Edray Goins, Ami Radunskaya, and Fred Wilhelm.

It’s happening right before the applied category theory meeting, so I hope some of you can make both… especially since Eugenia Cheng will be giving a talk on Friday!

Three talks will take place in Skye Hall—home of the math department—starting at 1 pm. After this we’ll have refreshments and an hour for students to talk to the speakers. Starting at 6 pm there will be a reception across the road at the UCR Alumni Center, with food and a panel discussion on the challenges we face in promoting diversity at U.C. Riverside.

Details follow.

All the talks will be in Skye 282:

• 1:00–1:50 p.m. Abba Gumel, Arizona State University.

Some models for enhancing diversity and capacity-building in STEM education in under-represented minority communities.

STEM (science, technology, engineering and mathematics) education is undoubtedly the necessary bedrock for the development and sustenance of the vitally-needed knowledge-based economy that fuels and sustains the development of modern nations. Central to STEM education are, of course, the mathematical sciences … which are the rock-solid foundation of all the natural and engineering sciences. Hence, it is vital that all diverse populations are not left behind in the quest to build and sustain capacity in the mathematical sciences. This talk focuses on discussion around a number of pedagogic and mentorship models that have been (and are being) used to help increase diversity and capacity-building in STEM education in general, and in the mathematical sciences in particular, in under-represented minority populations. Some examples from Africa, Canada and the U.S. will be presented.

• 2:00–2:50. Marissa Loving, Georgia Tech.

Where do I belong? Creating space in the math community.

I will tell the story of my mathematical journey with a focus on my time in grad school. I will be blunt about the ups and downs I have experienced and touch on some of the barriers (both structural and internalized) I have encountered. I will also discuss some of the programs and spaces I have helped create in my quest to make the mathematics community into a place where folks from historically under-represented groups (particularly women of color) can feel safe, seen, and free to devote their energy to their work. If you have ever felt like you don’t belong or worried that you have made others feel that way, this talk is for you.

• 3:00–3:50 p.m. Eugenia Cheng, School of the Art Institute of Chicago.

Inclusion–exclusion in mathematics and beyond: who stays in, who falls out, why it happens, and what we could do about it.

The question of why women and minorities are under-represented in mathematics is complex and there are no simple answers, only many contributing factors. I will focus on character traits, and argue that if we focus on this rather than gender we can have a more productive and less divisive conversation. To try and focus on characters rather than genders I will introduce gender-neutral character adjectives “ingressive” and “congressive” as a new dimension to shift our focus away from masculine and feminine. I will share my experience of teaching congressive abstract mathematics to art students, in a congressive way, and the possible effects this could have for everyone in mathematics, not just women. Moreover I will show that abstract mathematics is applicable to working towards a more inclusive, congressive society in this politically divisive era. This goes against the assumption that abstract math can only be taught to high level undergraduates and graduate students, and the accusation that it is removed from real life.

• 4:00–4:30 p.m. Refreshments in Skye 284.

• 4:30–5:30 p.m. Conversations Between Speakers & Students, Not Faculty, in Skye 284.

• 6:00–6:45 p.m. Reception with Food at the Alumni Center.

• 6:45–7:45 p.m. Panel Discussion at Alumni Center with Alissa Crans, Jose Gonzalez and Paige Helms, moderated by Edray Goins.

October 14, 2019

David Hogg: got it!

Adrian Price-Whelan (Flatiron) and I spent time this past week trying to factorize products of Gaussians into new products of different Gaussians. The context is Bayesian inference, where you can factor the joint probability of the data and your parameters into a likelihood times a prior or else into an evidence (what we here call the FML) times a posterior. The factorization was causing us pain this week, but I finally got it this weekend, in the woods. The trick I used (since I didn't want to expand out enormous quadratics) was to use a determinant theorem to get part of the way, and some particularly informative terms in the quadratic expansion to get the rest of the way. Paper (or note or something) forthcoming...
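In one dimension, the identity underneath all this is the familiar Gaussian product rule: a Gaussian likelihood times a Gaussian prior equals a Gaussian evidence factor (in the data) times a Gaussian posterior (in the parameter). Here is a quick numerical sanity check of that scalar case (my sketch with made-up numbers, not Hogg and Price-Whelan’s matrix-variate calculation):

import numpy as np

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)

# N(x; a, A) * N(x; b, B) = N(a; b, A+B) * N(x; c, C)
# with C = 1/(1/A + 1/B) and c = C*(a/A + b/B):
# likelihood * prior = evidence * posterior.
a, A = 1.3, 0.7    # "data" value and likelihood variance (made-up)
b, B = -0.4, 2.1   # prior mean and variance (made-up)
x = 0.25           # any parameter value

C = 1.0 / (1.0 / A + 1.0 / B)
c = C * (a / A + b / B)
lhs = gauss(x, a, A) * gauss(x, b, B)
rhs = gauss(a, b, A + B) * gauss(x, c, C)
print(np.isclose(lhs, rhs))   # True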

David Hogg: mitigating p-modes in EPRV

Megan Bedell (Flatiron) and I continued our work from earlier this week on making a mechanical model of stellar asteroseismic p-modes as damped harmonic oscillators driven by white noise. Because the model is so close to closed-form (it is closed form between kicks, and the kicks are regular and of random amplitude), the code is extremely fast. In a couple minutes we can simulate a realistic, multi-year, dense, space-based observing campaign with a full forest of asteroseismic modes.
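A minimal version of that kind of simulation, for a single mode and with made-up parameters (this is just a sketch of the mechanism, not Bedell and Hogg’s code):

import numpy as np
from scipy.linalg import expm

# One damped harmonic oscillator, advanced in closed form between regularly
# spaced velocity kicks of random (Gaussian) amplitude. All numbers assumed.
omega = 2 * np.pi * 3.1e-3   # mode angular frequency [rad/s], ~3.1 mHz, Sun-like
gamma = 1.0 / (2 * 86400.0)  # damping rate [1/s], i.e. a mode lifetime of days
dt = 60.0                    # kick spacing [s]
n_steps = 100_000

# Exact propagator over one interval dt for d/dt (x, v) = A (x, v).
A = np.array([[0.0, 1.0], [-omega**2, -gamma]])
M = expm(A * dt)

rng = np.random.default_rng(42)
state = np.zeros(2)
x = np.empty(n_steps)
for i in range(n_steps):
    state = M @ state
    state[1] += rng.normal()     # white-noise kick to the velocity
    x[i] = state[0]
# x now behaves like a stochastically driven, damped p-mode displacement;
# a forest of modes is just a sum of such oscillators.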

The first thing we did with our model is check the results of the recent paper on p-mode mitigation by Chaplin et al, which suggest that you can obtain mitigation of p-mode noise in precision radial-velocity observation campaigns by good choice of exposure time. We expected, at the outset, that the results of this paper are too optimistic: We expected that a fixed exposure time would not do a good job all the time, given the stochastic nature of the driving of the modes, and that there are many modes in a frequency window around the strongest modes. But we were wrong and the Chaplin et al paper is correct! Which is good.

However, we believe that we can do better than exposure-time-tuning for p-mode mitigation. We believe that we can fit the p-modes with the (possibly non-stationary) integral of a stationary Gaussian process, tuned to the spectrum. That's our next job.

John Preskill: Sultana: The Girl Who Refused To Stop Learning

Sultana at Caltech, Pasadena, CA

Caltech attracts some truly unique individuals from all across the globe with a passion for figuring things out. But there was one young woman on campus this past summer whose journey towards scientific research was uniquely inspiring.

Sultana spent the summer at Caltech in the SURF program, working on next generation quantum error correction codes under the supervision of Dr. John Preskill. As she wrapped up her summer project, returning to her “normal” undergraduate education in Arizona, I had the honor of helping her document her remarkable journey. This is her story:

Afghanistan

My name is Sultana. I was born in Afghanistan. For years I was discouraged and outright prevented from going to school by the war. It was not safe for me because of the active war and violence in the region, even including suicide bombings. Society was still recovering from the decades long civil war, the persistent influence of a dethroned, theocratically regressive regime and the current non-functioning government. These forces combined to make for a very insecure environment for a woman. It was tacitly accepted that the only place safe for a woman was to remain at home and stay quiet. Another consequence of these circumstances was that the teachers at local schools were all male and encouraged the girls to not come to school and study. What was the point if at the end of the day a woman’s destiny was to stay at home and cook? 

For years, I would be up every day at 8am and every waking hour was devoted to housework and preparing the house to host guests, typically older women and my grandmother’s friends. I was destined to be a homemaker and mother. My life had no meaning outside of those roles.

My brothers would come home from school, excited about mathematics and other subjects. For them, it seemed like life was full of infinite possibilities. Meanwhile I had been confined to be behind the insurmountable walls of my family’s compound. All the possibilities for my life had been collapsed, limited to a single identity and purpose.

At fourteen I had had enough. I needed to find a way out of the mindless routine and depressing destiny. And more specifically, I wanted to understand how complex, and clearly powerful, human social systems, such as politics, economics and culture, combined to create overtly negative outcomes like imbalance and oppression. I made the decision to wake up two hours early every day to learn English, before taking on the day’s expected duties.

My grandfather had a saying, “If you know English, then you don’t have to worry about where the food is going to come from.”

He taught himself English and eventually became a professor of literature and humanities. He had even encouraged his five daughters to pursue advanced education. My aunts became medical doctors and chemists (one an engineer, another a teacher). My mother became a lecturer at a university, a profession she would be forced to leave when the Mujaheddin came to power.

I started by studying newspapers and any book I could get my hands on. My hunger for knowledge proved insatiable.

When my father got the internet, the floodgates of information opened. I found and took online courses through sites like Khan Academy and, later, Coursera.

I was intrigued by discussions between my brothers on mathematics. Countless pages of equations and calculations could propagate from a single, simple question; just like how a complex and towering tree can emerge from a single seed.

Khan Academy provided a superbly structured approach to learning mathematics from scratch. Most importantly, mathematics did not rely on a mastery of English as a prerequisite.

Over the next few years I consumed lesson after lesson, expanding my coursework into physics. I would supplement this unorthodox yet structured education with a more self-directed investigation into philosophy through books like Kant’s Critique of Pure Reason. While math and physics helped me develop confidence and ability, ultimately, I was still driven by trying to understand the complexities of human behavior and social systems.

Sultana & Emily

Emily from Iowa

To further develop my hold on English I enrolled in a Skype-based student exchange program and made a critical friend in Emily from Iowa. After only a few conversations, Emily suggested that my English was so good that I should consider taking the SAT and start applying for schools. She soon became a kind of college counselor for me.

Even though my education was stonewalled by an increasingly repressive socio-political establishment, I had the full support of my family. There were no SAT testing locations in Afghanistan. So when it was clear to my family I had the potential to get a college education, my uncle took me across the border into Pakistan, to take the SAT. However, a passport from Afghanistan was required to take the test and, when it was finally granted, it had to be smuggled across the border. Considering that I had no formal education and little time to study for the SAT, I earned a surprisingly competitive score on the exam.

My confidence soared and I convinced my family to make the long trek to the American embassy and apply for a student visa. I was denied in less than sixty seconds! They thought I would end up not studying and becoming an economic burden. I was crushed. And my immaturely formed vision of the world was clearly more idealized than the reality that presented itself and slammed its door in my face. I was even more confused by how the world worked and I immediately became invested in understanding politics.

The New York Times

Emily was constantly working in the background on my behalf, and on the other side of the world, trying to get the word out about my struggle. This became her life’s project, to somehow will me into a position to attend a university. New York Times writer Nicholas Kristoff heard about my story and we conducted an interview over Skype. The story was published in the summer of 2016.

The New York Times opinion piece was published in June. Ironically, I didn’t have much say or influence on the opinion-editorial piece. I felt that the piece was overly provocative.

Even now, because family members still live under the threat of violence, I will not allow myself to be photographed. Suffice to say, I never wanted to stir up trouble, or call attention to myself. Even so, the net results of that article are overwhelmingly positive. I was even offered a scholarship to attend Arizona State University; that was, if I could secure a visa.

I was pessimistic. I had been rejected twice already by what should have been the most logical and straightforward path towards formal education in America. How was this special asylum plea going to result in anything different? But Nicholas Kristoff was absolutely certain I would get it. He gave my case to an immigration lawyer with a relationship to the New York Times. In just a month and a half I was awarded humanitarian parole. This came with some surprising constraints, including having to fly to the U.S. within ten days and a limit of four months to stay there while applying for asylum. As quickly as events were unfolding, I didn’t even hesitate.

As I was approaching America, I realized that over 5,000 miles of water would now separate me from the most influential forces in my life. The last of these flights took me deep into the center of America, about a third of the way around the planet.

The clock was ticking on my time in America – at some point, factors and decisions outside of my control would deem that I was safe to go back to Afghanistan – so I exhausted every opportunity to obtain knowledge while I was isolated from the forces that would keep me from formal education. I petitioned for an earlier than expected winter enrollment at Arizona State University. In the meantime, I continued my self-education through edX classes (coursework from MIT made available online), as well as with Khan Academy and Coursera.

Camelback Mountain overlooking Phoenix, AZ

Phoenix

The answer came back from Arizona State University. They had granted me enrollment for the winter quarter. In December of 2016, I flew to the next state in my journey for intellectual independence and began my first full year of formal education at the largest university in America. Mercifully, my tenure in Phoenix began in the cool winter months. In fact, the climate was very similar to what I knew in Afghanistan.

However, as summer approached, I began to have a much different experience. This was the first time I was living on my own. It took me a while to be accustomed to that. I would generally stay in my room and study, even avoiding classes. The intensifying heat of the Arizona sun ensured that I would stay safely and comfortably encased inside. And I was actually doing okay. At first.

Happy as I was to finally be a part of formal education, it was in direct conflict with the way in which I had trained myself to learn. The rebellious spirit which helped me defy the cultural norms and risk harm to myself and my family, the same fire that I had to continuously stoke for years on my own, also made me rebel against the system that actively wanted me to learn. I constantly felt that I had better approaches to absorb the material and actively ignored the homework assignments. Naturally, my grades suffered and I was forced to make a difficult internal adjustment. I also benefited from advice from Emily, as well as a cousin who was pursuing education in Canada.

As I gritted my teeth and made my best attempts to adopt the relatively rigid structures of formal education, I began to feel more and more isolated. I found myself staying in my room day after day, focused simply on studying. But for what purpose? I was aimless. A machine of insatiable learning, but without any specific direction to guide my curiosity. I did not know it at the time, but I was desperate for something to motivate me.

The ripples from the New York Times piece were still reverberating and Sultana was contacted by author Betsy Devine. Betsy was a writer who had written a couple of books with notable scientists. Betsy was particularly interested in introducing Sultana to her husband, Nobel prize winner in physics, Frank Wilczek.

The first time I met Frank Wilczek was at lunch with him and his wife. Wilczek enjoys hiking in the mountains overlooking Phoenix and Betsy suggested that I join Frank on an early morning hike. A hike. With Frank Wilczek. This was someone whose book, A Beautiful Question: Finding Nature’s Deep Design, I had read while in Afghanistan. To say that I was nervous is an understatement, but thankfully we fell into an easy flow of conversation. After going over my background and interests he asked me if I was interested in physics. I told him that I was, but I was principally interested in concepts that could be applied very generally, broadly – so that I could better understand the underpinnings of how society functions.

He told me that I should pursue quantum physics. And more specifically, he got me very excited about the prospects of quantum computers. It felt like I was placed at the start of a whole new journey, but I was walking on clouds. I was filled with a confidence that could only be generated by finding oneself comfortable in casual conversation with a Nobel laureate.

Immediately after the hike I went and collected all of the relevant works Wilczek had suggested, including Dirac’s “The Principles of Quantum Mechanics.”

Reborn

With a new sense of purpose, I immersed myself in the formal coursework, as well as my own, self-directed exploration of quantum physics. My drive was rewarded with all A’s in the fall semester of my sophomore year.

That same winter Nicholas Kristoff had published his annual New York Times opinion review of the previous year titled, “Why 2017 Was the Best Year in Human History.” I was mentioned briefly.

It was the start of the second semester of my sophomore year, and I was starting to feel a desire to explore applied physics. I was enrolled in a graduate-level seminar class in quantum theory that spring. One of the lecturers for the class was a young female professor who was interested in entropy, and more importantly, how we can access seemingly lost information. In other words, she wanted access to the unknown.

To that end, she was interested in gauge/gravity duality models like the one meant to explain the black hole “firewall” paradox, or the Anti-de Sitter space/conformal field theory (AdS/CFT) correspondence that uses a model of the universe where space-time has negative, hyperbolic curvature.

The geometry of 5D space-time in AdS space resembles that of an M.C. Escher drawing, where fish wedge themselves together, end-to-end, tighter and tighter as we move away from the origin. These connections between fish are consistent, radiating in an identical pattern, infinitely approaching the edge.

Unbeknownst to me, a friend of that young professor had read the Times opinion article. The article not only mentioned that I had been teaching myself string theory, but also that I was enrolled at Arizona State University and taking graduate level courses. She asked the young professor if she would be interested in meeting me.

The young professor invited me to her office, where she told me how black holes were basically a massive manifestation of entropy, and the best laboratory by which to learn the true nature of information loss, and how it might be reversed. We discussed the possibility of working on a research paper to help her codify the quantum component for her holographic duality models.

I immediately agreed. If there was anything in physics as difficult as understanding human social, religious and political dynamics, it was probably understanding the fundamental nature of space and time. Because the AdS/CFT model of spacetime was negatively curved, we could employ something called holographic quantum error correction to create a framework by which the information of a bulk entity (like a black hole) can be preserved at its boundary, even with some of its physical components (particles) becoming corrupted, or lost.

I spent the year wrestling with, and developing, quantum error correcting codes for a very specific kind of black hole. I learned that information has a way of protecting itself from decay through correlations. For instance, a single logical quantum bit (or “qubit”) of information can be represented, or preserved, by five stand-in, or physical, qubits. At a black hole’s event horizon, where entangled particles are pulled apart, information loss can be prevented as long as less than three-out-of-five of the representative physical qubits are lost to the black hole interior. The original quantum information can be recalled by using a quantum code to reverse this “error”.
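The code being described is, up to the holographic packaging, the textbook five-qubit code. As a generic illustration (my sketch, not anything specific to the project), here is a quick consistency check of its stabilizer structure:

import numpy as np
from itertools import combinations

# The [[5,1,3]] five-qubit code: 1 logical qubit in 5 physical qubits,
# able to survive the loss of any 2 of them.
stabilizers = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
logical_X, logical_Z = "XXXXX", "ZZZZZ"

def to_symplectic(p):
    # Map a Pauli string to a pair of binary vectors (x-part, z-part).
    x = np.array([c in "XY" for c in p], dtype=int)
    z = np.array([c in "ZY" for c in p], dtype=int)
    return x, z

def commute(p, q):
    # Two Pauli strings commute iff their symplectic product is 0 mod 2.
    (x1, z1), (x2, z2) = to_symplectic(p), to_symplectic(q)
    return (x1 @ z2 + z1 @ x2) % 2 == 0

# The stabilizer generators all commute with each other...
assert all(commute(s, t) for s, t in combinations(stabilizers, 2))
# ...and with both logical operators,
assert all(commute(s, l) for s in stabilizers for l in (logical_X, logical_Z))
# while logical X and logical Z anti-commute, as a bona fide qubit requires.
assert not commute(logical_X, logical_Z)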

By the end of my sophomore year I was nominated to represent Arizona State University at an inaugural event supporting undergraduate women in science. The purpose of the event was to help prepare promising women in physics for graduate school applications, as well as provide information on life as a graduate student. The event, called FUTURE of Physics, was to be held at Caltech.

I mentioned the nomination to Frank Wilczek and he excitedly told me that I must use the opportunity to meet Dr. John Preskill, who was at the forefront of quantum computing and quantum error correction. He reminded me that the best advice he could give anyone was to “find interesting minds and bother them.”

FUTURE 2018 at Caltech, Pasadena, CA

Pasadena

I spent two exciting days at Caltech with 32 other young women from all over the country on November 1st and 2nd of 2018. I was fortunate to meet John Preskill. And of course I introduced myself like any normal human being would, by asking him about the Shor factoring algorithm. I even got to attend a Wednesday group meeting with all of the current faculty and postdocs at IQIM. When I returned to ASU I sent an email to Dr. Preskill inquiring about potentially joining a short research project with his team.

I was extremely relieved when months later I received a response and an invitation to apply for the Summer Undergraduate Research Fellowship (SURF) at Caltech. Because Dr. Preskill’s recent work has been at the forefront of quantum error correction for quantum computing it was relatively straightforward to come up with a research proposal that aligned with the interests of my research adviser at ASU.

One of the major obstacles to efficient and widespread proliferation of quantum computers is the corruption of qubits, expensively held in very delicate low-energy states, by environmental interference and noise. People simply don’t, and should not, have confidence in practical, everyday use of quantum computers without reliable quantum error correction. The proposal was to create a proof that, if you’re starting with five physical qubits (representing a single logical qubit) and lose two of those qubits due to error, you can work backwards to recreate the original five qubits, and recover the lost logical qubit in the context of holographic error correcting codes. My application was accepted, and I made my way to Pasadena at the beginning of this summer.

The temperate climate, mountains and lush neighborhoods were a welcome change, especially with the onslaught of relentless heat that was about to envelop Phoenix.

Even at a campus as small as Caltech I felt like the smallest, most insignificant fish in a tiny, albeit prestigious, pond. But soon I was being connected to many like-minded, heavily motivated mathematicians and physicists, from all walks of life and from every corner of the Earth. Seasoned, young post-docs, like Grant Salton and Victor Albert introduced me to HaPPY tensors. HaPPY tensors are a holographic tensor network model developed by Dr. Preskill and colleagues meant to represent a toy model of AdS/CFT. Under this highly accessible and world-class mentorship, and with essentially unlimited resources, I wrestled with HaPPY tensors all summer and successfully discovered a decoder that could recover five qubits from three.

Example of tensor network causal and entanglement wedge reconstructions. From a blog post by Beni Yoshida on March 27th, 2015 on Quantum Frontiers.

This was the ultimate confidence booster. All the years of doubting myself and my ability, due to educating myself in a vacuum, lacking the critical feedback provided by real mentors, all disappeared.

Tomorrow

Now returning to ASU to finish my undergraduate education, I find myself still thinking about what’s next. I still have plans to expand my proof, extending beyond five qubits, to a continuous variable representation, and writing a general algorithm for an arbitrary N layer tensor-network construction. My mentors at Caltech have graciously extended their support to this ongoing work. And I now dream to become a professor of physics at an elite institution where I can continue to pursue the answers to life’s most confusing problems.

My days left in America are not up to me. I am applying for permanent amnesty so I can continue to pursue my academic dreams, and to take a crack at some of the most difficult problems facing humanity, like accelerating the progress towards quantum computing. I know I can’t pursue those goals back in Afghanistan. At least, not yet. Back there, women like myself are expected to stay at home, prepare food and clean the house for everybody else.

Little do they know how terrible I am at housework – and how much I love math.

John Baez: Climate Technology Primer (Part 2)

Here’s the second of a series of blog articles:

• Adam Marblestone, Climate technology primer (2/3): CO2 removal.

The first covered the basics of climate science as related to global warming. This one moves on to consider technologies for removing carbon dioxide from the air.

I hope you keep the following warning in mind as you read on:

I’m focused here on trying to understand the narrowly technical aspects, not on the political aspects, despite those being crucial. This is meant to be a review of the technical literature, not a political statement. I worried that writing a blog purely on the topic of technological intervention in the climate, without attempting or claiming to do justice to the social issues raised, would implicitly suggest that I advocate a narrowly technocratic or unilateral approach, which is not my intention. By focusing on technology, I don’t mean to detract from the importance of the social and policy aspects.

The technological issues are worth studying on their own, since they constrain what’s possible. For example: drawing down as much CO2 as human civilization is emitting now, with trees at their peak growth phase and their carbon stored permanently, could be done by covering the whole USA with such trees.
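A back-of-the-envelope version of that last claim, using my own round numbers (about 37 Gt of CO2 emitted per year, an optimistic ~10 tonnes of carbon captured per hectare per year for forest at peak growth, and roughly 9.1e8 hectares of US land):

# Rough check: how much land would peak-growth forest need to offset emissions?
emissions_gt_co2 = 37.0                       # Gt CO2 per year (assumed, ~2019)
emissions_gt_c = emissions_gt_co2 * 12 / 44   # about 10 Gt of carbon per year
uptake_t_c_per_ha = 10.0                      # tC/ha/yr at peak growth (optimistic)
area_needed_ha = emissions_gt_c * 1e9 / uptake_t_c_per_ha
usa_land_ha = 9.1e8                           # US land area in hectares
print(area_needed_ha / usa_land_ha)           # about 1.1, i.e. roughly the whole USA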

October 13, 2019

Tommaso Dorigo: The Plot Of The Week - Neutrinoless Double Beta Decay At Reach

The most recent preprint in the ArXiv this evening is an APPEC report on the neutrinoless double beta decay. This is the thick result of a survey of the state of the art in the search for a very (very) rare subnuclear process, which can shed light on the nature of the mass hierarchy of neutrinos. Oh, and, APPEC stands for "AstroParticle Physics European Consortium", in case you wondered.


October 12, 2019

David Hogg: nothing

Today was almost all admin and teaching. But I did get to the Astronomical Data Group meeting at Flatiron, where we had good discussions of representation learning, light curves generated by spotted stars, the population of planets around slightly evolved stars, and accreted stellar systems in the Milky Way halo!

October 11, 2019

Noncommutative Geometry: Sir Michael Atiyah, a Knight Mathematician. A Tribute to Michael Atiyah, an Inspiration and a Friend. By Alain Connes and Joseph Kouneiher

Sir Michael Atiyah was considered one of the world’s foremost mathematicians. He is best known for his work in algebraic topology and the codevelopment of a branch of mathematics called topological 𝐾-theory, together with the Atiyah–Singer index theorem, for which he received the Fields Medal (1966). He also received the Abel Prize (2004) along with Isadore M. Singer for their discovery and proof of the index theorem.

Peter Rohde: New paper: Photonic quantum error correction of linear optics using W-state encoding

With my PhD student Madhav Krishnan Vijayan, and old PhD colleague Austin Lund.

Full paper available at https://arxiv.org/abs/1910.03093

Abstract

Error-detection and correction are necessary prerequisites for any scalable quantum computing architecture. Given the inevitability of unwanted physical noise in quantum systems and the propensity for errors to spread as computations proceed, computational outcomes can become substantially corrupted. This observation applies regardless of the choice of physical implementation. In the context of photonic quantum information processing, there has recently been much interest in passive linear optics quantum computing, which includes boson-sampling, as this model eliminates the highly-challenging requirements for feed-forward via fast, active control. That is, these systems are passive by definition. In usual scenarios, error detection and correction techniques are inherently active, making them incompatible with this model, arousing suspicion that physical error processes may be an insurmountable obstacle. Here we explore a photonic error-detection technique, based on W-state encoding of photonic qubits, which is entirely passive, based on post-selection, and compatible with these near-term photonic architectures of interest. We show that this W-state redundant encoding technique enables the suppression of dephasing noise on photonic qubits via simple fan-out style operations, implemented by optical Fourier transform networks, which can be readily realised today. The protocol effectively maps dephasing noise into heralding failures, with zero failure probability in the ideal no-noise limit.
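To unpack the fan-out idea in the abstract: a single photon sent into one port of an N-mode optical Fourier-transform network exits in an equal superposition over all N modes, i.e. an N-mode W state. A small numerical illustration (mine, not code from the paper):

import numpy as np

N = 4
# Discrete Fourier transform unitary acting on the N optical modes.
U = np.array([[np.exp(2j * np.pi * j * k / N) for k in range(N)]
              for j in range(N)]) / np.sqrt(N)

c_in = np.zeros(N, dtype=complex)
c_in[0] = 1.0              # one photon in mode 0
c_out = U @ c_in           # single-photon amplitudes transform linearly
print(np.abs(c_out)**2)    # [0.25 0.25 0.25 0.25]: equal weight on every mode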

Matt von Hippel: Congratulations to James Peebles, Michel Mayor, and Didier Queloz!

The 2019 Physics Nobel Prize was announced this week, awarded to James Peebles for work in cosmology and to Michel Mayor and Didier Queloz for the first observation of an exoplanet.

Peebles introduced quantitative methods to cosmology. He figured out how to use the Cosmic Microwave Background (light left over from the Big Bang) to understand how matter is distributed in our universe, including the presence of still-mysterious dark matter and dark energy. Mayor and Queloz were the first team to observe a planet orbiting a Sun-like star outside of our solar system (an “exoplanet”), in 1995. By careful measurement of the spectrum of light coming from a star they were able to find a slight wobble, caused by a Jupiter-esque planet in orbit around it. Their discovery opened the floodgates of observation. Astronomers found many more planets than expected, showing that, far from a rare occurrence, exoplanets are quite common.
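For a sense of scale, here is a rough estimate of the radial-velocity semi-amplitude for a 51 Peg b-like planet, using my own round numbers (about half a Jupiter mass on a 4.23-day circular orbit around a roughly solar-mass star), not figures from the post:

import numpy as np

G = 6.674e-11                  # m^3 kg^-1 s^-2
M_sun, M_jup = 1.989e30, 1.898e27
P = 4.23 * 86400.0             # orbital period [s]
m_p = 0.47 * M_jup             # planet mass (times sin i), assumed
m_star = 1.06 * M_sun          # host star mass, assumed

# Semi-amplitude of the star's reflex motion for a circular orbit.
K = (2 * np.pi * G / P)**(1/3) * m_p / (m_star + m_p)**(2/3)
print(K)                       # roughly 57 m/s: a small but measurable Doppler wobble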

It’s a bit strange that this Nobel was awarded to two very different types of research. This isn’t the first time the prize was divided between two different discoveries, but all of the cases I can remember involve discoveries in closely related topics. This one didn’t, and I’m curious about the Nobel committee’s logic. It might have been that neither discovery “merited a Nobel” on its own, but I don’t think we’re supposed to think of shared Nobels as “lesser” than non-shared ones. It would make sense if the Nobel committee thought they had a lot of important results to “get through” and grouped them together to get through them faster, but if anything I have the impression it’s the opposite: that at least in physics, it’s getting harder and harder to find genuinely important discoveries that haven’t been acknowledged. Overall, this seems like a very weird pairing, and the Nobel committee’s citation “for contributions to our understanding of the evolution of the universe and Earth’s place in the cosmos” is a pretty loose justification.

Scott Aaronson: From quantum supremacy to classical fallacy

Maybe I should hope that people never learn to distinguish for themselves which claimed breakthroughs in building new forms of computation are obviously serious, and which ones are obviously silly. For as long as they don’t, this blog will always serve at least one purpose. People will cite it, tweet it, invoke its “authority,” even while from my point of view, I’m offering nothing more intellectually special than my toddler does when he calls out “moo-moo cow! baa-baa sheep!” as we pass them on the road.

But that’s too pessimistic. Sure, most readers must more-or-less already know what I’ll say about each thing: that Google’s quantum supremacy claim is serious, that memcomputing to solve NP-complete problems is not, etc. Even so, I’ve heard from many readers that this blog was at least helpful for double-checking their initial impressions, and for making common knowledge what before had merely been known to many. I’m fine for it to continue serving those roles.

Last week, even as I dealt with fallout from Google’s quantum supremacy leak, I also got several people asking me to comment on a Nature paper entitled Integer factorization using stochastic magnetic tunnel junctions (warning: paywalled). See also here for a university press release.

The authors report building a new kind of computer based on asynchronously updated “p-bits” (probabilistic bits). A p-bit is “a robust, classical entity fluctuating in time between 0 and 1, which interacts with other p-bits … using principles inspired by neural networks.” They build a device with 8 p-bits, and use it to factor integers up to 945. They present this as another “unconventional computation scheme” alongside quantum computing, and as a “potentially scalable hardware approach to the difficult problems of optimization and sampling.”

A commentary accompanying the Nature paper goes much further still—claiming that the new factoring approach, “if improved, could threaten data encryption,” and that resources should now be diverted from quantum computing to this promising new idea, one with the advantages of requiring no refrigeration or maintenance of delicate entangled states. (It should’ve added: and how big a number has Shor’s algorithm factored anyway, 21? Compared to 945, that’s peanuts!)

Since I couldn’t figure out a gentler way to say this, here goes: it’s astounding that this paper and commentary made it into Nature in the form that they did. Juxtaposing Google’s sampling achievement with p-bits, as several of my Facebook friends did last week, is juxtaposing the Wright brothers with some guy bouncing around on a pogo stick.

If you were looking forward to watching me dismantle the p-bit claims, I’m afraid you might be disappointed: the task is over almost the moment it begins. “p-bit” devices can’t scalably outperform classical computers, for the simple reason that they are classical computers. A little unusual in their architecture, but still well-covered by the classical Extended Church-Turing Thesis. Just like with the quantum adiabatic algorithm, an energy penalty is applied to coax the p-bits into running a local optimization algorithm: that is, making random local moves that preferentially decrease the number of violated constraints. Except here, because the whole evolution is classical, there doesn’t seem to be even the pretense that anything is happening that a laptop with a random-number generator couldn’t straightforwardly simulate. In terms of this editorial, if adiabatic quantum computing is Richard Nixon—hiding its lack of observed speedups behind subtle arguments about tunneling and spectral gaps—then p-bit computing is Trump.
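To make that concrete, here is the kind of thing a laptop with a random-number generator does perfectly well: a toy stochastic local search (my sketch, not the paper’s p-bit dynamics or its exact energy function) that “factors” 945 by flipping bits of two candidate factors, preferentially accepting flips that shrink |N - x*y|:

import math, random

def factor_by_local_search(N, n_bits=6, beta=0.3, steps=500_000, seed=0):
    rng = random.Random(seed)
    energy = lambda x, y: abs(N - x * y)
    x, y = rng.randrange(2, 2**n_bits), rng.randrange(2, 2**n_bits)
    for _ in range(steps):
        if x > 1 and y > 1 and energy(x, y) == 0:
            return x, y
        if rng.random() < 0.02:                       # occasional random restart
            x, y = rng.randrange(2, 2**n_bits), rng.randrange(2, 2**n_bits)
            continue
        nx, ny = x, y                                 # propose flipping one random bit
        if rng.random() < 0.5:
            nx ^= 1 << rng.randrange(n_bits)
        else:
            ny ^= 1 << rng.randrange(n_bits)
        dE = energy(nx, ny) - energy(x, y)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            x, y = nx, ny                             # accept moves that (mostly) lower the energy
    return None

print(factor_by_local_search(945))   # typically prints a pair like (27, 35) or (21, 45)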

Even so, I wouldn’t be writing this post if you opened the paper and it immediately said, in effect, “look, we know. You’re thinking that this is just yet another stochastic local optimization method, which could clearly be simulated efficiently on a conventional computer, thereby putting it into a different conceptual universe from quantum computing. You’re thinking that factoring an n-bit integer will self-evidently take exp(n) time by this method, as compared to exp(n^(1/3)) for the Number Field Sieve, and that no crypto is in even remote danger from this. But here’s why you should still be interested in our p-bit model: because of other advantages X, Y, and Z.” Alas, in vain one searches the whole paper, and the lengthy supplementary material, and the commentary, for any acknowledgment of the pachyderm in the pagoda. Not an asymptotic runtime scaling in sight. Quantum computing is there, but stripped of the theoretical framework that gives it its purpose.

That silence, in the pages of Nature – that’s the part that convinced me that, while on the negative side this blog seems to have accomplished nothing for the world in 14 years of existence, on the positive side it will likely have a role for decades to come.

Update: See a response in the comments, which I appreciated, from Kerem Cansari (one of the authors of the paper), and my response to the response.

(Partly) Unrelated Announcement #1: My new postdoc, Andrea Rocchetto, had the neat idea of compiling a Quantum Computing Fact Sheet: a quick “Cliffs Notes” for journalists, policymakers, and others looking to get the basics right. The fact sheet might grow in the future, but in the meantime, check it out! Or at a more popular level, try the Quantum Atlas made by folks at the University of Maryland.

Unrelated Announcement #2: Daniel Wichs asked me to give a shout-out to a new Conference on Information-Theoretic Cryptography, to be held June 17-19 in Boston.

Third Announcement: Several friends asked me to share that Prof. Peter Wittek, quantum computing researcher at the University of Toronto, has gone missing in the Himalayas. Needless to say we hope for his safe return.

Scott Aaronson: Book Review: ‘The AI Does Not Hate You’ by Tom Chivers

A couple weeks ago I read The AI Does Not Hate You: Superintelligence, Rationality, and the Race to Save the World, the first-ever book-length examination of the modern rationalist community, by British journalist Tom Chivers. I was planning to review it here, before it got preempted by the news of quantum supremacy (and subsequent news of classical non-supremacy). Now I can get back to rationalists.

Briefly, I think the book is a triumph. It’s based around in-person conversations with many of the notable figures in and around the rationalist community, in its Bay Area epicenter and beyond (although apparently Eliezer Yudkowsky only agreed to answer technical questions by Skype), together of course with the voluminous material available online. There’s a good deal about the 1990s origins of the community that I hadn’t previously known.

The title is taken from Eliezer’s aphorism, “The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else.” In other words: as soon as anyone succeeds in building a superhuman AI, if we don’t take extreme care that the AI’s values are “aligned” with human ones, the AI might be expected to obliterate humans almost instantly as a byproduct of pursuing whatever it does value, more-or-less as we humans did with woolly mammoths, moas, and now gorillas, rhinos, and thousands of other species.

Much of the book relates Chivers’s personal quest to figure out how seriously he should take this scenario. Are the rationalists just an unusually nerdy doomsday cult? Is there some non-negligible chance that they’re actually right about the AI thing? If so, how much more time do we have—and is there even anything meaningful that can be done today? Do the dramatic advances in machine learning over the past decade change the outlook? Should Chivers be worried about his own two children? How does this risk compare to the more “prosaic” civilizational risks, like climate change or nuclear war? I suspect that Chivers’s exploration will be most interesting to readers who, like me, regard the answers to none of these questions as obvious.

While it sounds extremely basic, what makes The AI Does Not Hate You so valuable to my mind is that, as far as I know, it’s nearly the only examination of the rationalists ever written by an outsider that tries to assess the ideas on a scale from true to false, rather than from quirky to offensive. Chivers’s own training in academic philosophy seems to have been crucial here. He’s not put off by people who act weirdly around him, even needlessly cold or aloof, nor by utilitarian thought experiments involving death or torture or weighing the value of human lives. He just cares, relentlessly, about the ideas—and about remaining a basically grounded and decent person while engaging them. Most strikingly, Chivers clearly feels a need—anachronistic though it seems in 2019—actually to understand complicated arguments, be able to repeat them back correctly, before he attacks them.

Indeed, far from failing to understand the rationalists, it occurs to me that the central criticism of Chivers’s book is likely to be just the opposite: he understands the rationalists so well, extends them so much sympathy, and ends up endorsing so many aspects of their worldview, that he must simply be a closet rationalist himself, and therefore can’t write about them with any pretense of journalistic or anthropological detachment. For my part, I’d say: it’s true that The AI Does Not Hate You is what you get if you treat rationalists as extremely smart (if unusual) people from whom you might learn something of consequence, rather than as monkeys in a zoo. On the other hand, Chivers does perform the journalist’s task of constantly challenging the rationalists he meets, often with points that (if upheld) would be fatal to their worldview. One of the rationalists’ best features—and this precisely matches my own experience—is that, far from clamming up or storming off when faced with such challenges (“lo! the visitor is not one of us!”), the rationalists positively relish them.

It occurred to me the other day that we’ll never know how the rationalists’ ideas would’ve developed, had they continued to do so in a cultural background like that of the late 20th century. As Chivers points out, the rationalists today are effectively caught in the crossfire of a much larger cultural war—between, to their right, the recrudescent know-nothing authoritarians, and to their left, what one could variously describe as woke culture, call-out culture, or sneer culture. On its face, it might seem laughable to conflate the rationalists with today’s resurgent fascists: many rationalists are driven by their utilitarianism to advocate open borders and massive aid to the Third World; the rationalist community is about as welcoming of alternative genders and sexualities as it’s humanly possible to be; and leading rationalists like Scott Alexander and Eliezer Yudkowsky strongly condemned Trump for the obvious reasons.

Chivers, however, explains how the problem started. On rationalist Internet forums, many misogynists and white nationalists and so forth encountered nerds willing to debate their ideas politely, rather than immediately banning them as more mainstream venues would. As a result, many of those forces of darkness (and they probably don’t mind being called that) predictably congregated on the rationalist forums, and their stench predictably wore off on the rationalists themselves. Furthermore, this isn’t an easy-to-fix problem, because debating ideas on their merits, extending charity to ideological opponents, etc. is sort of the rationalists’ entire shtick, whereas denouncing and no-platforming anyone who can be connected to an ideological enemy (in the modern parlance, “punching Nazis”) is the entire shtick of those condemning the rationalists.

Compounding the problem is that, as anyone who’s ever hung out with STEM nerds might’ve guessed, the rationalist community tends to skew WASP, Asian, or Jewish, non-impoverished, and male. Worse yet, while many rationalists live their lives in progressive enclaves and strongly support progressive values, they’ll also undergo extreme anguish if they feel forced to subordinate truth to those values.

Chivers writes that all of these issues “blew up in spectacular style at the end of 2014,” right here on this blog. Oh, what the hell, I’ll just quote him:

Scott Aaronson is, I think it’s fair to say, a member of the Rationalist community. He’s a prominent theoretical computer scientist at the University of Texas at Austin, and writes a very interesting, maths-heavy blog called Shtetl-Optimised.

People in the comments under his blog were discussing feminism and sexual harassment. And Aaronson, in a comment in which he described himself as a fan of Andrea Dworkin, described having been terrified of speaking to women as a teenager and young man. This fear was, he said, partly that of being thought of as a sexual abuser or creep if any woman ever became aware that he sexually desired them, a fear that he picked up from sexual-harassment-prevention workshops at his university and from reading feminist literature. This fear became so overwhelming, he said in the comment that came to be known as Comment #171, that he had ‘constant suicidal thoughts’ and at one point ‘actually begged a psychiatrist to prescribe drugs that would chemically castrate me (I had researched which ones), because a life of mathematical asceticism was the only future that I could imagine for myself.’ So when he read feminist articles talking about the ‘male privilege’ of nerds like him, he didn’t recognise the description, and so felt himself able to declare himself ‘only’ 97 per cent on board with the programme of feminism.

It struck me as a thoughtful and rather sweet remark, in the midst of a long and courteous discussion with a female commenter. But it got picked up, weirdly, by some feminist bloggers, including one who described it as ‘a yalp of entitlement combined with an aggressive unwillingness to accept that women are human beings just like men’ and that Aaronson was complaining that ‘having to explain my suffering to women when they should already be there, mopping my brow and offering me beers and blow jobs, is so tiresome.’

Scott Alexander (not Scott Aaronson) then wrote a furious 10,000-word defence of his friend… (p. 214-215)

And then Chivers goes on to explain Scott Alexander’s central thesis, in Untitled, that privilege is not a one-dimensional axis, so that (to take one example) society can make many women in STEM miserable while also making shy male nerds miserable in different ways.

For nerds, perhaps an alternative title for Chivers’s book could be “The Normal People Do Not Hate You (Not All of Them, Anyway).” It’s as though Chivers is demonstrating, through understated example, that taking delight in nerds’ suffering, wanting them to be miserable and alone, mocking their weird ideas, is not simply the default, well-adjusted human reaction, with any other reaction being ‘creepy’ and ‘problematic.’ Some might even go so far as to apply the latter adjectives to the sneerers’ attitude, the one that dresses up schoolyard bullying in a social-justice wig.

Reading Chivers’s book prompted me to reflect on my own relationship to the rationalist community. For years, I interacted often with the community—I’ve known Robin Hanson since ~2004 and Eliezer Yudkowsky since ~2006, and our blogs bounced off each other—but I never considered myself a member.  I never ranked paperclip-maximizing AIs among humanity’s more urgent threats—indeed, I saw them as a distraction from an all-too-likely climate catastrophe that will leave its survivors lucky to have stone tools, let alone AIs. I was also repelled by what I saw as the rationalists’ cultier aspects.  I even once toyed with the idea of changing the name of this blog to “More Wrong” or “Wallowing in Bias,” as a play on the rationalists’ LessWrong and OvercomingBias.

But I’ve drawn much closer to the community over the last few years, because of a combination of factors:

  1. The comment-171 affair. This was not the sort of thing that could provide any new information about the likelihood of a dangerous AI being built, but was (to put it mildly) the sort of thing that can tell you who your friends are. I learned that empathy works a lot like intelligence, in that those who boast of it most loudly are often the ones who lack it.
  2. The astounding progress in deep learning and reinforcement learning and GANs, which caused me (like everyone else, perhaps) to update in the direction of human-level AI in our lifetimes being an actual live possibility.
  3. The rise of Scott Alexander. To the charge that the rationalists are a cult, there’s now the reply that Scott, with his constant equivocations and doubts, his deep dives into data, his clarity and self-deprecating humor, is perhaps the least culty cult leader in human history. Likewise, to the charge that the rationalists are basement-dwelling kibitzers who accomplish nothing of note in the real world, there’s now the reply that Scott has attracted a huge mainstream following (Steven Pinker, Paul Graham, presidential candidate Andrew Yang…), purely by offering up what’s self-evidently some of the best writing of our time.
  4. Research. The AI-risk folks started publishing some research papers that I found interesting—some with relatively approachable problems that I could see myself trying to think about if quantum computing ever got boring. This shift seems to have happened at roughly around the same time my former student, Paul Christiano, “defected” from quantum computing to AI-risk research.

Anyway, if you’ve spent years steeped in the rationalist blogosphere, read Eliezer’s “Sequences,” and so on, The AI Does Not Hate You will probably have little that’s new, although it might still be interesting to revisit ideas and episodes that you know through a newcomer’s eyes. To anyone else … well, reading the book would be a lot faster than spending all those years reading blogs! I’ve heard of some rationalists now giving out copies of the book to their relatives, by way of explaining how they’ve chosen to spend their lives.

I still don’t know whether there’s a risk worth worrying about that a misaligned AI will threaten human civilization in my lifetime, or my children’s lifetimes, or even 500 years—or whether everyone will look back and laugh at how silly some people once were to think that (except, silly in which way?). But I do feel fairly confident that The AI Does Not Hate You will make a positive difference—possibly for the world, but at any rate for a little well-meaning community of sneered-at nerds obsessed with the future and with following ideas wherever they lead.

David Hoggrotating stars, a mechanical model for an asteroseismic mode

Our weekly Stars and Exoplanets Meeting at Flatiron was all about stellar rotation somehow this week (no, we don't plan this!). Adrian Price-Whelan (Flatiron) showed that stellar rotation rates can get so high in young clusters that stars move off the main sequence, and the main sequence can even look double. We learned (or I learned) that a significant fraction of young stars are born spinning very close to break-up. I immediately thought this was obviously wrong, and then very quickly decided it was obvious: it is just what you'd expect if the last stages of stellar growth come from accretion. Funny how an astronomer can turn on a dime.

And in that same meeting, Jason Curtis (Columbia) brought us up to date on his work on stellar rotation and its use as a stellar clock. He showed (by comparing clusters of different ages) that it is very useful; it looks incredible for at least the first Gyr or so of a star's lifetime. But the usefulness decreases at low masses (cool temperatures). Or maybe not, but the physics looks very different.

In the morning, before the meeting, Megan Bedell (Flatiron) and I built a mechanical model of an asteroseismic mode by literally writing a code that simulates a damped harmonic oscillator driven by random delta-function kicks. That was fun! And it seems to work.
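
Here is a minimal sketch of what that looks like (not our real code; the mode frequency, damping rate, and kick statistics below are invented purely for illustration):

```python
# Minimal sketch (not our actual code) of a single asteroseismic mode as a
# damped harmonic oscillator driven by random delta-function kicks. The mode
# frequency, damping rate, and kick statistics are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)

omega = 2.0 * np.pi * 3.0e-3   # mode angular frequency [rad/s] (~3 mHz, solar-ish)
gamma = 1.0e-6                 # damping rate [1/s]
kick_rate = 1.0e-4             # mean rate of delta-function kicks [1/s]
kick_sigma = 1.0               # r.m.s. velocity kick per event [arbitrary units]
dt = 10.0                      # time step [s]
n_steps = 200_000

x, v = 0.0, 0.0
xs = np.empty(n_steps)
for i in range(n_steps):
    # random delta-function driving: Poisson-distributed velocity kicks
    n_kicks = rng.poisson(kick_rate * dt)
    if n_kicks:
        v += rng.normal(0.0, kick_sigma, size=n_kicks).sum()
    # damped harmonic oscillator, advanced with a symplectic Euler step
    v += (-omega**2 * x - 2.0 * gamma * v) * dt
    x += v * dt
    xs[i] = x

# The power spectrum should show the familiar Lorentzian-ish peak at the mode
# frequency, which is what a stochastically excited, damped mode looks like.
power = np.abs(np.fft.rfft(xs))**2
freqs = np.fft.rfftfreq(n_steps, dt)
print("peak frequency [Hz]:", freqs[1:][np.argmax(power[1:])])
```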

David Hoggimage denoising; the space of natural images

I got in a bit of research on a mostly-teaching day. I saw the CDS Math-and-Data seminar, which was given by Peyman Milanfar (Google), about de-noising models. In particular, he was talking about some of the theory and ideas behind the de-noising that Google uses in its Pixel cameras and related technology. They use methods that are adaptive to the image itself but which don't explicitly learn a library of image priors or patch priors or anything like that from data. (But they do train the models on human reactions to the denoising.)

Milanfar's theoretical results were nice. For example: De-noising is like a gradient step in response to a loss function! That's either obvious or deep. I'll go with deep. And good images (non-noisy natural images) should be fixed points of the de-noising projection (which is in general non-linear). Their methods identify similar parts of the images and use commonality of those parts to inform the nonlinear projections. But he explained all this with very simple notation, which was nice.
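
To make the "de-noising is a gradient step" point concrete, here is a tiny sketch (my own gloss on the idea, not Milanfar's actual construction, with an arbitrary quadratic smoothness loss): one gradient step on the loss is a local averaging filter, and an already-smooth signal is a fixed point.

```python
# Toy illustration (my gloss, not Milanfar's construction): de-noising a 1-D
# signal by taking one gradient step on the quadratic smoothness loss
# L(x) = 0.5 * sum_i (x[i+1] - x[i])**2.
import numpy as np

def smoothness_loss(x):
    return 0.5 * np.sum(np.diff(x) ** 2)

def denoise_step(x, eta=0.25):
    d = np.diff(x)
    grad = np.zeros_like(x)
    grad[:-1] -= d          # derivative of the 0.5*(x[i+1]-x[i])**2 terms
    grad[1:] += d
    return x - eta * grad   # "de-noising = a gradient step on a loss"

rng = np.random.default_rng(0)
clean = np.ones(200)                              # a perfectly smooth signal
noisy = clean + 0.1 * rng.normal(size=clean.size)

print("loss before:", smoothness_loss(noisy))
print("loss after: ", smoothness_loss(denoise_step(noisy)))
print("change on the clean signal:", np.max(np.abs(denoise_step(clean) - clean)))
```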

After the talk I had a quick conversation with Jonathan Niles-Weed (NYU) about the geometry of the space of natural images. Here's a great argument he gave: Imagine you have two arbitrarily different images, like one of the Death Star (tm) and one of the inside of the seminar room. Are these images connected to one another in the natural-image subspace of image space? That is, is there a continuous transformation from one to the other, every point along which is itself a good natural image?

Well, if I can imagine a continuous tracking shot (movie) of me walking out of the seminar room and into a spaceship and then out of the airlock on a space walk to repair the Death Star (tm), and if every frame in that movie is a good natural image, and everything is continuous, then yes! What a crazy argument. The space of all natural images might be one continuously connected blob. Crazy! I love the way mathematicians think.

October 10, 2019

John BaezFoundations of Math and Physics One Century After Hilbert

I wrote a review of this book with chapters by Penrose, Witten, Connes, Atiyah, Smolin and others:

• John Baez, review of Foundations of Mathematics and Physics One Century After Hilbert: New Perspectives, edited by Joseph Kouneiher, Notices of the American Mathematical Society 66 no. 11 (November 2019), 1690–1692.

It gave me a chance to say a bit—just a tiny bit—about the current state of fundamental physics and the foundations of mathematics.

October 08, 2019

Doug Natelson"Phase of matter" is a pretty amazing emergent concept

As we await the announcement of this year's physics Nobel tomorrow morning (last chance for predictions in the comments), a brief note:

I think it's worth taking a moment to appreciate just how amazing it is that matter has distinct thermodynamic phases or states.

We teach elementary school kids that there are solids, liquids, and gases, and those are easy to identify because they have manifestly different properties.  Once we know more about microscopic details that are hard to see with unaided senses, we realize that there are many more macroscopic states - different structural arrangements of solids; liquid crystals; magnetic states; charge ordered states; etc.

When we take statistical physics, we learn descriptively what happens.  When you get a large number of particles (say atoms for now) together, the macroscopic state that they take on in thermal equilibrium is the one that corresponds to the largest number of microscopic arrangements of the constituents under the given conditions.  So, the air in my office is a gas because, at 298 K and 101 kPa, there are many many more microscopic arrangements of the molecules with that temperature and pressure that look like a gas than there are microscopic arrangements of the molecules that correspond to a puddle of N2/O2 mixture on the floor. 
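
A toy version of that counting, with invented numbers, makes the point (it's only a cartoon: real molecules aren't stuck on lattice cells, and the full story also involves the energy, via the free energy):

```python
# Toy counting argument, with invented cell counts: the number of ways to put
# N indistinguishable molecules into V cells (at most one per cell) is C(V, N).
# Compare "spread through the room" with "confined to a puddle" via the log of
# the multiplicity, i.e. the entropy up to a factor of k_B.
from math import lgamma

def log_multiplicity(V, N):
    # ln C(V, N), computed with log-gamma to avoid overflow
    return lgamma(V + 1) - lgamma(N + 1) - lgamma(V - N + 1)

N = 10**6           # molecules
V_room = 10**9      # cells available if they fill the room (gas-like)
V_puddle = 10**7    # cells available if they sit in a puddle on the floor

print("ln(# arrangements), gas:   ", log_multiplicity(V_room, N))
print("ln(# arrangements), puddle:", log_multiplicity(V_puddle, N))
# The gas wins by a factor of e**(millions), which is the counting statement
# behind "the equilibrium state is the one with the most microscopic
# arrangements under the given conditions."
```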

Still, there is something special going on.  It's not obvious that there should have to be distinct phases at all, and such a small number of them.  There is real universality about solids - their rigidity, resistance to shear, high packing density of atoms - independent of details.  Likewise for liquids, with their flow under shear, comparative incompressibility, and general lack of spatial structure.  Yes, there are detailed differences, but any kid can recognize that water, oil, and lava all have some shared "liquidity".  Why does matter end up in those configurations, and not end up being a homogeneous mush over huge ranges of pressure and temperature?  This is called emergence, because while it's technically true that the standard model of particle physics undergirds all of this, it is not obvious in the slightest how to deduce the properties of snowflakes, raindrops, or water vapor from there.  Like much of condensed matter physics, this stuff is remarkable (when you think about it), but so ubiquitous that it slides past everyone's notice pretty much all of the time.

October 07, 2019

John BaezQuantales from Petri Nets

A referee pointed out this paper to me:

• Uffe Engberg and Glynn Winskel, Petri nets as models of linear logic, in Colloquium on Trees in Algebra and Programming, Springer, Berlin, 1990, pp. 147–161.

It contains a nice observation: we can get a commutative quantale from any Petri net.

I’ll explain how in a minute. But first, what does this have to do with linear logic?

In linear logic, propositions form a category where the morphisms are proofs and we have two kinds of ‘and’: \& , which is a cartesian product on this category, and \otimes, which is a symmetric monoidal structure. There’s much more to linear logic than this (since there are other connectives), and maybe also less (since we may want our category to be a mere poset), but never mind. I want to focus on the weird business of having two kinds of ‘and’.

Since \& is cartesian we have P \Rightarrow P \& P as usual in logic.

But since \otimes is not cartesian we usually don’t have P \Rightarrow P \otimes P. This other kind of ‘and’ is about resources: from one copy of a thing P you can’t get two copies.

Here’s one way to think about it: if P is “I have a sandwich”, P \& P is like “I have a sandwich and I have a sandwich”, while P \otimes P is like “I have two sandwiches”.

A commutative quantale captures these two forms of ‘and’, and more. A commutative quantale is a commutative monoid object in the category of cocomplete posets: that is, posets where every subset has a least upper bound. But it’s a fact that any cocomplete poset is also complete: every subset has a greatest lower bound!

If we think of the elements of our commutative quantale as propositions, we interpret x \le y as “x implies y”. The least upper bound of any subset of propositions is their ‘or’. Their greatest lower bound is their ‘and’. But we also have the commutative monoid operation, which we call \otimes. This operation distributes over least upper bounds.

So, a commutative quantale has both the logical \& (not just for pairs of propositions, but arbitrary sets of them) and the \otimes operation that describes combining resources.

To get from a Petri net to a commutative quantale, we can compose three functors.

First, any Petri net gives a commutative monoidal category—that is, a commutative monoid object in \mathsf{Cat}. Indeed, my student Jade has analyzed this in detail and shown the resulting functor from the category of Petri nets to the category of commutative monoidal categories is a left adjoint:

• Jade Master, Generalized Petri nets, Section 4.

Second, any category gives a poset where we say x \le y if there is a morphism from x to y. Moreover, the resulting functor \mathsf{Cat} \to \mathsf{Poset} preserves products. As a result, every commutative monoidal category gives a commutative monoidal poset: that is, a commutative monoid object in the category of Posets.

Composing these two functors, every Petri net gives a commutative monoidal poset. Elements of this poset are markings of the Petri net, the partial order is “reachability”, and the commutative monoid structure is addition of markings.

Third, any poset P gives another poset \widehat{P} whose elements are downsets of P: that is, subsets S \subseteq P such that

x \in S, y \le x \; \implies \; y \in S

The partial order on downsets is inclusion. This new poset \widehat{P} is ‘better’ than P because it’s cocomplete. That is, any union of downsets is again a downset. Moreover, \widehat{P} contains P as a sub-poset. The reason is that each x \in P gives a downset

\downarrow x = \{y \in P : \; y \le x \}

and clearly

x \le y \; \iff \;  \downarrow x \subseteq \downarrow y

Composing this third functor with the previous two, every Petri net gives a commutative monoid object in the category of cocomplete posets. But this is just a commutative quantale!

What is this commutative quantale like? Its elements are downsets of markings of our Petri net: sets of markings such that if x is in the set and x is reachable from y then y is also in the set.

It’s good to contemplate this a bit more. A marking can be seen as a ‘resource’. For example, if our Petri net has a place in it called sandwich there is a marking 2sandwich, which means you have two sandwiches. Downsets of markings are sets of markings such that if x is in the set and x is reachable from y then y is also in the set! An example of a downset would be “a sandwich, or anything that can give you a sandwich”. Another is “two sandwiches, or anything that can give you two sandwiches”.

The tensor product \otimes comes from addition of markings, extended in the obvious way to downsets of markings. For example, “a sandwich, or anything that can give you a sandwich” tensored with “a sandwich, or anything that can give you a sandwich” equals “two sandwiches, or anything that can give you two sandwiches”.

On the other hand, the cartesian product \& is the logical ‘and’:
if you have “a sandwich, or anything that can give you a sandwich” and you have “a sandwich, or anything that can give you a sandwich”, then you just have “a sandwich, or anything that can give you a sandwich”.

So that’s the basic idea.
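
Here is a tiny computational sketch of all this (a toy of my own, not something from the Engberg-Winskel paper), with one transition that turns an ingredient into a sandwich, downsets taken with respect to reachability, \otimes given by downward closures of sums of markings, and \& given by intersection:

```python
# A toy sketch (mine, not from the Engberg-Winskel paper): the quantale of
# downsets of markings for a Petri net with places {ingredient, sandwich} and
# a single transition 1 ingredient -> 1 sandwich. Markings are pairs
# (ingredients, sandwiches); we only enumerate markings with a few tokens.
from itertools import product

BOUND = 5
MARKINGS = [(i, s) for i, s in product(range(BOUND), repeat=2)]

def reachable_from(y):
    """Markings reachable from y by firing the transition zero or more times."""
    i, s = y
    return {(i - k, s + k) for k in range(i + 1)}

def down(x):
    """The downset of x: all markings from which x is reachable."""
    return frozenset(y for y in MARKINGS if x in reachable_from(y))

def down_closure(xs):
    xs = set(xs)
    return frozenset(y for y in MARKINGS
                     if any(x in reachable_from(y) for x in xs))

def tensor(S, T):
    """Monoidal product: downward closure of sums of markings (truncated to BOUND)."""
    sums = {(a[0] + b[0], a[1] + b[1]) for a in S for b in T}
    return down_closure(s for s in sums if s in MARKINGS)

one_sandwich = down((0, 1))    # "a sandwich, or anything that can give you one"
two_sandwiches = down((0, 2))

assert tensor(one_sandwich, one_sandwich) == two_sandwiches
assert one_sandwich & one_sandwich == one_sandwich    # the cartesian 'and'
print("sandwich (x) sandwich:", sorted(tensor(one_sandwich, one_sandwich)))
```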

Terence TaoLarge prime gaps and probabilistic models

William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes {{\mathcal P} := \{2,3,5,\dots\}}, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If {x} is a large number, then the largest prime gap {G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n} in {[1,x]} is of size {\asymp \log^2 x}. (Granville refines this conjecture to {\gtrsim \xi \log^2 x}, where {\xi := 2e^{-\gamma} = 1.1229\dots}. Here we use the asymptotic notation {X \gtrsim Y} for {X \geq (1-o(1)) Y}, {X \sim Y} for {X \gtrsim Y \gtrsim X}, {X \gg Y} for {X \geq C^{-1} Y}, and {X \asymp Y} for {X \gg Y \gg X}.)

Conjecture 2 (Hardy-Littlewood conjecture) If {\mathcal{H} := \{h_1,\dots,h_k\}} are fixed distinct integers, then the number of numbers {n \in [1,x]} with {n+h_1,\dots,n+h_k} all prime is {({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}} as {x \rightarrow \infty}, where the singular series {{\mathfrak S}(\mathcal{H})} is defined by the formula

\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps {p_{n+1}-p_n} with {n \leq x} were asymptotically distributed according to an exponential distribution of mean {\log x}, in the sense that

\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)

 

as {x \rightarrow \infty} for any fixed {\lambda \geq 0}. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of {\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}} for fixed {k,\lambda}, where {p_n} ranges over the primes in {[1,x]}. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is even and

\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is odd, for any natural number {N}; setting {N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|} and taking means, one then gets upper and lower bounds for the probability that the interval {(p_n, p_n + \lambda \log x)} is free of primes. The most difficult step is to control the mean values of the singular series {{\mathfrak S}(\mathcal{H})} as {{\mathcal H}} ranges over {k}-tuples in a fixed interval such as {[0, \lambda \log x]}.
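
(As an aside, the Bonferroni inequalities themselves are easy to sanity-check numerically; here is a small illustrative sketch, with the ranges of {N} and {K} chosen arbitrarily.)

```python
# Quick numerical check of the Bonferroni inequalities quoted above: the
# partial sum sum_{k=0}^K (-1)^k C(N,k) upper-bounds 1_{N=0} when K is even
# and lower-bounds it when K is odd.
from math import comb

def partial_sum(N, K):
    return sum((-1) ** k * comb(N, k) for k in range(K + 1))

for N in range(8):
    indicator = 1 if N == 0 else 0
    for K in range(8):
        s = partial_sum(N, K)
        assert (indicator <= s) if K % 2 == 0 else (indicator >= s), (N, K, s)

print("Bonferroni inequalities hold for all N, K < 8")
```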

Heuristically, if one extrapolates the asymptotic (1) to the regime {\lambda \asymp \log x}, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below {1} when {\lambda} is significantly larger than {\log x}. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of {\lambda}, which is only enough to establish the far weaker bound {G_{\mathcal P}(x) / \log x \rightarrow \infty}, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on {G_{\mathcal P}(x)}). An inspection of the argument shows that if one wished to extend (1) to parameter choices {\lambda} that were allowed to grow with {x}, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length {k} of the tuple {{\mathcal H} = (h_1,\dots,h_k)}, as well as the magnitudes of the shifts {h_1,\dots,h_k}, were also allowed to grow with {x}. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramer’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

  • If the Hardy-Littlewood conjecture is uniformly true for {k}-tuples of length {k \ll \frac{\log x}{\log\log x}}, and with shifts {h_1,\dots,h_k} of size {O( \log^2 x )}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}.
  • If the Hardy-Littlewood conjecture is “true on average” for {k}-tuples of length {k \ll \frac{y}{\log x}} and shifts {h_1,\dots,h_k} of size {y} for all {\log x \leq y \leq \log^2 x \log\log x}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \log^2 x}.

In particular, we can recover Cramér’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with {k} allowed to grow slowly with {x}. Again, the main difficulty is to accurately estimate average values of the singular series {{\mathfrak S}({\mathcal H})}. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)

of the singular series, for a suitable cutoff {z}; it turns out that when studying prime tuples of size {t}, the most convenient cutoff {z(t)} is the “Pólya magic cutoff“, defined as the largest prime for which

\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)

 

(this is well defined for {t \geq e^2}); by Mertens’ theorem, we have {z(t) \sim t^{1/e^\gamma}}. One can interpret {V_{\mathcal H}(z)} probabilistically as

\displaystyle V_{\mathcal H}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )

where {\mathcal{S}_z \subset {\bf Z}} is the randomly sifted set of integers formed by removing one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p \leq z}. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes {{\mathcal P}} behave in some approximate statistical sense like the random sifted set {\mathcal{S}_z}, and one can prove the above theorem by using the Bonferroni inequalities both for the primes {{\mathcal P}} and for the random sifted set, and comparing the two (using an even {K} for the sifted set and an odd {K} for the primes in order to be able to combine the two together to get a useful bound).
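
As another aside, the Pólya magic cutoff is easy to compute numerically straight from its definition; here is a small illustrative sketch (the values of {t} and the prime limit below are chosen only for speed) comparing it with the Mertens-theorem prediction {z(t) \sim t^{1/e^\gamma}}.

```python
# Illustrative computation of the "Polya magic cutoff" z(t): the largest prime
# z with prod_{p <= z} (1 - 1/p) >= 1/log(t). By Mertens' theorem this should
# behave like t**(1/e**gamma); the sizes of t below are chosen purely for speed.
from math import exp, log

EULER_GAMMA = 0.5772156649015329

def primes_up_to(n):
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def polya_cutoff(t, prime_limit=200_000):
    threshold = 1.0 / log(t)
    prod, z = 1.0, None
    for p in primes_up_to(prime_limit):
        if prod * (1.0 - 1.0 / p) < threshold:
            break
        prod *= 1.0 - 1.0 / p
        z = p
    return z

for t in (1e4, 1e6, 1e9):
    print(f"t = {t:.0e}: z(t) = {polya_cutoff(t)}, "
          f"t^(1/e^gamma) ~ {t ** (1.0 / exp(EULER_GAMMA)):.0f}")
```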

The proof of Theorem 3 ended up not using any properties of the set of primes {{\mathcal P}} other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of {{\mathcal P}} which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model {{\mathcal C}} of the primes, in which each natural number {n \geq 3} is selected for membership in {{\mathcal C}} with an independent probability of {1/\log n}. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})

and Cramér also showed that the largest gap {G_{\mathcal C}(x)} was almost surely {\sim \log^2 x}. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series {{\mathfrak S}({\mathcal H})} is absent.
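
The Cramér model is also simple enough to simulate directly; here is a quick illustrative sketch (with {x} of course far too small to probe the true asymptotics).

```python
# Quick simulation sketch of the Cramer random model: each n >= 3 joins the
# set independently with probability 1/log n. Compare the largest gap up to x
# with log(x)**2 (x here is far too small to see the true asymptotics; this
# is purely illustrative).
import random
from math import log

random.seed(1)
x = 10**7

largest_gap, last = 0, 2     # treat 2 as the first element, for convenience
for n in range(3, x + 1):
    if random.random() < 1.0 / log(n):
        largest_gap = max(largest_gap, n - last)
        last = n

print("largest gap G_C(x):", largest_gap)
print("log(x)**2         :", round(log(x) ** 2, 1))
print("ratio             :", round(largest_gap / log(x) ** 2, 2))
```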

Granville proposed a refinement {{\mathcal G}} to Cramér’s random model {{\mathcal C}} in which one first sieves out (in each dyadic interval {[x,2x]}) all residue classes {0 \hbox{ mod } p} for {p \leq A} for a certain threshold {A = \log^{1-o(1)} x = o(\log x)}, and then places each surviving natural number {n} in {{\mathcal G}} with an independent probability {\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap {G_{\mathcal G}(x)} in this model was almost surely {\gtrsim \xi \log^2 x}, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to {10^{18}}, for instance, have shown that {G_{\mathcal P}(x)} never exceeds {0.9206 \log^2 x} in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff {A}. After some experimentation, we were able to produce a tractable random model {{\mathcal R}} for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of {\gtrsim \xi \log^2 x} (we also get an upper bound of {\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}} for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p}, and as before we let {S_z} be the sifted set of integers formed by deleting the residue classes {a_p \hbox{ mod } p} with {p \leq z}. We then set

\displaystyle {\mathcal R} := \{ n \geq e^2: n \in S_{z(t)}\}

with {z(t)} Pólya’s magic cutoff (this is the cutoff that gives {{\mathcal R}} a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

\displaystyle \xi \log^2 x \lesssim {\mathcal G}_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)

 

and that the Hardy-Littlewood conjectures hold with power savings for {k} up to {\log^c x} for any fixed {c < 1} and for shifts {h_1,\dots,h_k} of size {O(\log^c x)}. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint {c=1}), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model {{\mathcal R}} (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for {{\mathcal G}_{\mathcal R}(x) } that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo {p} from an interval {[0,y]} for all primes {p} up to a given threshold. The lower bound in (3) relates to the case of deleting the {0 \hbox{ mod } p} residue classes from {[0,y]}; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of {{\mathcal R}} is somewhat more complicated than of {{\mathcal C}} or {{\mathcal G}} as there is now non-trivial coupling between the events {n \in {\mathcal R}} as {n} varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval {[0,y]} and randomly deletes a residue class {a_p \hbox{ mod } p} for each prime {p} up to a given threshold. For very small {p} this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized {p}, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes {p}, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements {n \in [0,y]} to primes {p} if {n} falls in the random residue class {a_p \hbox{ mod } p}) turns out to give the best results.

October 05, 2019

John BaezClimate Technology Primer (Part 1)

Here’s the first of a series of blog articles on how technology can help address climate change:

• Adam Marblestone, Climate technology primer (1/3): basics.

Adam Marblestone is a research scientist at Google DeepMind studying connections between neuroscience and artificial intelligence. Previously, he was Chief Strategy Officer of the brain-computer interface company Kernel, and a research scientist in Ed Boyden’s Synthetic Neurobiology Group at MIT working to develop new technologies for brain circuit mapping. He also helped to start companies like BioBright, and advised foundations such as the Open Philanthropy Project.

Now, like many of us, he’s thinking about climate change, and what to do about it. He writes:

In this first of three posts, I attempt an outsider’s summary of the basic physics/chemistry/biology of the climate system, focused on back of the envelope calculations where possible. At the end, I comment a bit about technological approaches for emissions reductions. Future posts will include a review of the science behind negative emissions technologies, as well as the science (with plenty of caveats, don’t worry) behind more controversial potential solar radiation management approaches. This first post should be very basic for anyone “in the know” about energy, but I wanted to cover the basics before jumping into carbon sequestration technologies.

Check it out! I like the focus on “back of the envelope” calculations because they serve as useful sanity checks for more complicated models… and also provide a useful vaccination against the common denialist argument “all the predictions rely on complicated computer models that could be completely wrong, so why should I believe them?” It’s a sad fact that one of the things we need to do is make sure most technically literate people have a basic understanding of climate science, to help provide ‘herd immunity’ to everyone else.

The ultimate goal here, though, is to think about “what can technology do about climate change?”

October 04, 2019

Matt von HippelCalabi-Yaus in Feynman Diagrams: Harder and Easier Than Expected

I’ve got a new paper up, about the weird geometrical spaces we keep finding in Feynman diagrams.

With Jacob Bourjaily, Andrew McLeod, and Matthias Wilhelm, and most recently Cristian Vergu and Matthias Volk, I’ve been digging up odd mathematics in particle physics calculations. In several calculations, we’ve found that we need a type of space called a Calabi-Yau manifold. These spaces are often studied by string theorists, who hope they represent how “extra” dimensions of space are curled up. String theorists have found an absurdly large number of Calabi-Yau manifolds, so many that some are trying to sift through them with machine learning. We wanted to know if our situation was quite that ridiculous: how many Calabi-Yaus do we really need?

So we started asking around, trying to figure out how to classify our catch of Calabi-Yaus. And mostly, we just got confused.

It turns out there are a lot of different tools out there for understanding Calabi-Yaus, and most of them aren’t all that useful for what we’re doing. We went in circles for a while trying to understand how to desingularize toric varieties, and other things that will sound like gibberish to most of you. In the end, though, we noticed one small thing that made our lives a whole lot simpler.

It turns out that all of the Calabi-Yaus we’ve found are, in some sense, the same. While the details of the physics varies, the overall “space” is the same in each case. It’s a space we kept finding for our “Calabi-Yau bestiary”, but it turns out one of the “traintrack” diagrams we found earlier can be written in the same way. We found another example too, a “wheel” that seems to be the same type of Calabi-Yau.

And that actually has a sensible name.

We also found many examples that we don’t understand. Add another rung to our “traintrack” and we suddenly can’t write it in the same space. (Personally, I’m quite confused about this one.) Add another spoke to our wheel and we confuse ourselves in a different way.

So while our calculation turned out simpler than expected, we don’t think this is the full story. Our Calabi-Yaus might live in “the same space”, but there are also physics-related differences between them, and these we still don’t understand.

At some point, our abstract included the phrase “this paper raises more questions than it answers”. It doesn’t say that now, but it’s still true. We wrote this paper because, after getting very confused, we ended up able to say a few new things that hadn’t been said before. But the questions we raise are if anything more important. We want to inspire new interest in this field, toss out new examples, and get people thinking harder about the geometry of Feynman integrals.

October 03, 2019

n-Category Café Edinburgh is Hiring

We’re hiring!

The advertised positions are in “algebra, geometry and topology and related fields”. Category theory is specifically mentioned, as is the importance of glue:

The Hodge Institute is a large world-class group of mathematicians whose research interests lie in Algebra, Geometry and Topology and related fields such as Category Theory and Mathematical Physics. Applicants should demonstrate an outstanding research record and contribute to the productive and strong collaborative research environment in the Hodge Institute. Preference may be given to candidates who strengthen connections between different areas of research within the Hodge Institute or more broadly between the Hodge Institute and other parts of the School.

Here “School” refers to the School of Mathematics, and the Hodge Institute is just our name for the set of people within the School who work on the subjects mentioned.

We’re advertising for Lecturers or Readers. These are both permanent positions. Lecturer is where almost everyone starts, and Reader is more senior (the rough equivalent of a junior full professor in US terminology).

Scott AaronsonOn two blog posts of Jerry Coyne

A few months ago, I got to know Jerry Coyne, the recently-retired biologist at the University of Chicago who writes the blog “Why Evolution Is True.” The interaction started when Jerry put up a bemused post about my thoughts on predictability and free will, and I pointed out that if he wanted to engage me on those topics, there was more to go on than an 8-minute YouTube video. I told Coyne that it would be a shame to get off on the wrong foot with him, since perusal of his blog made it obvious that whatever he and I disputed, it was dwarfed by our areas of agreement. He and I exchanged more emails and had lunch in Chicago.

By way of explaining how he hadn’t read “The Ghost in the Quantum Turing Machine,” Coyne emphasized the difference in my and his turnaround times: while these days I update my blog only a couple times per month, Coyne often updates multiple times per day. Indeed the sheer volume of material he posts, on subjects from biology to culture wars to Chicago hot dogs, would take months to absorb.

Today, though, I want to comment on just two posts of Jerry’s.

The first post, from back in May, concerns David Gelernter, the computer science professor at Yale who was infamously injured in a 1993 attack by the Unabomber, and who’s now mainly known as a right-wing commentator. I don’t know Gelernter, though I did once attend a small interdisciplinary workshop in the south of France that Gelernter also attended, wherein I gave a talk about quantum computing and computational complexity in which Gelernter showed no interest. Anyway, Gelernter, in an essay in May for the Claremont Review of Books, argued that recent work has definitively disproved Darwinism as a mechanism for generating new species, and until something better comes along, Intelligent Design is the best available alternative.

Curiously, I think that Gelernter’s argument falls flat not for detailed reasons of biology, but mostly just because it indulges in bad math and computer science—in fact, in precisely the sorts of arguments that I was trying to answer in my segment on Morgan Freeman’s Through the Wormhole (see also Section 3.2 of Why Philosophers Should Care About Computational Complexity). Gelernter says that

  1. a random change to an amino acid sequence will pretty much always make it worse,
  2. the probability of finding a useful new such sequence by picking one at random is at most ~1 in 10^77, and
  3. there have only been maybe ~10^40 organisms in earth’s history.

Since 10^77 >> 10^40, Darwinism is thereby refuted—not in principle, but as an explanation for life on earth. QED.

The most glaring hole in the above argument, it seems to me, is that it simply ignores intermediate possible numbers of mutations. How hard would it be to change, not 1 or 100, but 5 amino acids in a given protein to get a usefully different one—as might happen, for example, with local optimization methods like simulated annealing run at nonzero temperature? And how many chances were there for that kind of mutation in the earth’s history?
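
To be clear, the following is a deliberately silly toy, not a model of real protein evolution, but it illustrates the arithmetic point: once there is any fitness gradient to follow, local search with single mutations reaches a specific target in an exponentially large sequence space after a tiny number of steps.

```python
# A deliberately silly toy (nothing like real protein fitness landscapes!),
# just to illustrate the arithmetic: hitting a specific 100-letter sequence
# over a 20-letter alphabet by blind guessing takes ~20**100 tries, but
# simulated annealing with single-letter mutations and a Metropolis rule at
# nonzero temperature gets there in tens of thousands of steps.
import math
import random

random.seed(0)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"      # the 20 amino-acid letters
TARGET = "".join(random.choice(ALPHABET) for _ in range(100))

seq = [random.choice(ALPHABET) for _ in range(100)]
fit = sum(a == b for a, b in zip(seq, TARGET))   # toy fitness: # of matches
temperature = 2.0

for step in range(1, 200_001):
    i = random.randrange(len(seq))
    old, new = seq[i], random.choice(ALPHABET)
    delta = (new == TARGET[i]) - (old == TARGET[i])   # change in fitness
    # Metropolis rule: accept improvements always, and neutral/deleterious
    # changes with Boltzmann probability at the current temperature
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        seq[i] = new
        fit += delta
    temperature = max(0.05, temperature * 0.99995)    # slowly cool
    if fit == len(TARGET):
        print(f"hit the target after {step} steps (vs ~20**100 blind guesses)")
        break
else:
    print("didn't fully converge; best fitness:", fit, "/", len(TARGET))
```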

Gelernter can’t personally see how a path could cut through the exponentially large solution space in a polynomial amount of time, so he asserts that it’s impossible. Many of the would-be P≠NP provers who email me every week do the same. But this particular kind of “argument from incredulity” has an abysmal track record: it would’ve applied equally well, for example, to problems like maximum matching that turned out to have efficient algorithms. This is why, in CS, we demand better evidence of hardness—like completeness results or black-box lower bounds—neither of which seem however to apply to the case at hand. Surely Gelernter understands all this, but had he not, he could’ve learned it from my lecture at the workshop in France!

Alas, online debate, as it’s wont to do, focused less on Gelernter’s actual arguments and the problems with them, than on the tiresome questions of “standing” and “status.” In particular: does Gelernter’s authority, as a noted computer science professor, somehow lend new weight to Intelligent Design? Or conversely: does the very fact that a computer scientist endorsed ID prove that computer science itself isn’t a real science at all, and that its practitioners should never be taken seriously in any statements about the real world?

It’s hard to say which of these two questions makes me want to bury my face deeper into my hands. Serge Lang, the famous mathematician and textbook author, spent much of his later life fervently denying the connection between HIV and AIDS. Lynn Margulis, the discoverer of the origin of mitochondria (and Carl Sagan’s first wife), died a 9/11 truther. What broader lesson should we draw from any of this? And anyway, what percentage of computer scientists actually do doubt evolution, and how does it compare to the percentage in other academic fields and other professions? Isn’t the question of how divorced we computer scientists are from the real world an … ahem … empirical matter, one hard to answer on the basis of armchair certainties and anecdotes?

Speaking of empiricism, if you check Gelernter’s publication list on DBLP and his Google Scholar page, you’ll find that he did influential work in programming languages, parallel computing, and other areas from 1981 through 1997, and then in the past 22 years published a grand total of … two papers in computer science. One with four coauthors, the other a review/perspective piece about his earlier work. So it seems fair to say that, some time after receiving tenure in a CS department, Gelernter pivoted (to put it mildly) away from CS and toward conservative punditry. His recent offerings, in case you’re curious, include the book America-Lite: How Imperial Academia Dismantled Our Culture (and Ushered In the Obamacrats).

Some will claim that this case underscores what’s wrong with the tenure system itself, while others will reply that it’s precisely what tenure was designed for, even if in this instance you happen to disagree with what Gelernter uses his tenured freedom to say. The point I wanted to make is different, though. It’s that the question “what kind of a field is computer science, anyway, that a guy can do high-level CS research on Monday, and then on Tuesday reject Darwinism and unironically use the word ‘Obamacrat’?”—well, even if I accepted the immense weight this question places on one atypical example (which I don’t), and even if I dismissed the power of compartmentalization (which I again don’t), the question still wouldn’t arise in Gelernter’s case, since getting from “Monday” to “Tuesday” seems to have taken him 15+ years.

Anyway, the second post of Coyne’s that I wanted to talk about is from just yesterday, and is about Jeffrey Epstein—the financier, science philanthropist, and confessed sex offender, whose appalling crimes you’ll have read all about this week if you weren’t on a long sea voyage without Internet or something.

For the benefit of my many fair-minded friends on Twitter, I should clarify that I’ve never met Jeffrey Epstein, let alone accepted any private flights to his sex island or whatever. I doubt he has any clue who I am either—even if he did once claim to be “intrigued” by quantum information.

I do know a few of the scientists who Epstein once hung out with, including Seth Lloyd and Steven Pinker. Pinker, in particular, is now facing vociferous attacks on Twitter, similar in magnitude perhaps to what I faced in the comment-171 affair, for having been photographed next to Epstein at a 2014 luncheon that was hosted by Lawrence Krauss (a physicist who later faced sexual harassment allegations of his own). By the evidentiary standards of social media, this photo suffices to convict Pinker as basically a child molester himself, and is also a devastating refutation of any data that Pinker might have adduced in his books about the Enlightenment’s contributions to human flourishing.

From my standpoint, what’s surprising is not that Pinker is up against this, but that it took this long to happen, given that Pinker’s pro-Enlightenment, anti-blank-slate views have had the effect of painting a giant red target on his back. Despite the near-inevitability, though, you can’t blame Pinker for wanting to defend himself, as I did when it was my turn for the struggle session.

Thus, in response to an emailed inquiry by Jerry Coyne, Pinker shared some detailed reflections about Epstein; Pinker then gave Coyne permission to post those reflections on his blog (though they were originally meant for Coyne only). Like everything Pinker writes, they’re worth reading in full. Here’s the opening paragraph:

The annoying irony is that I could never stand the guy [Epstein], never took research funding from him, and always tried to keep my distance. Friends and colleagues described him to me as a quantitative genius and a scientific sophisticate, and they invited me to salons and coffee klatches at which he held court. But I found him to be a kibitzer and a dilettante — he would abruptly change the subject ADD style, dismiss an observation with an adolescent wisecrack, and privilege his own intuitions over systematic data.

Pinker goes on to discuss his record of celebrating, and extensively documenting, the forces of modernity that led to dramatic reductions in violence against women and that have the power to continue doing so. On Twitter, Pinker had already written: “Needless to say I condemn Epstein’s crimes in the strongest terms.”

I probably should’ve predicted that Pinker would then be attacked again—this time, for having prefaced his condemnation with the phrase “needless to say.” The argument, as best I can follow, runs like this: given all the isms of which woke Twitter has already convicted Pinker—scientism, neoliberalism, biological determinism, etc.—how could Pinker’s being against Epstein’s crimes (which we recently learned probably include the rape, and not only statutorily, of a 15-year-old) possibly be assumed as a given?

For the record, just as Epstein’s friends and enablers weren’t confined to one party or ideology, so the public condemnation of Epstein strikes me as a matter that is (or should be) beyond ideology, with all reasonable dispute now confined to the space between “very bad” and “extremely bad,” between “lock away for years” and “lock away for life.”

While I didn’t need Pinker to tell me that, one reason I personally appreciated his comments is that they helped to answer a question that had bugged me, and that none of the mountains of other condemnations of Epstein had given me a clear sense about. Namely: supposing, hypothetically, that I’d met Epstein around 2002 or so—without, of course, knowing about his crimes—would I have been as taken with him as many other academics seem to have been? (Would you have been? How sure are you?)

Over the last decade, I’ve had the opportunity to meet some titans and semi-titans of finance and business, to discuss quantum computing and other nerdy topics. For a few (by no means all) of these titans, my overriding impression was precisely their unwillingness to concentrate on any one point for more than about 20 seconds—as though they wanted the crust of a deep intellectual exchange without the meat filling. My experience with them fit Pinker’s description of Epstein to a T (though I hasten to add that, as far as I know, none of these others ran teenage sex rings).

Anyway, given all the anger at Pinker for having intersected with Epstein, it’s ironic that I could easily imagine Pinker’s comments rattling Epstein the most of anyone’s, if Epstein hears of them from his prison cell. It’s like: Epstein must have developed a skin like a rhinoceros’s by this point about being called a child abuser, a creep, and a thousand similar (and similarly deserved) epithets. But “a kibitzer and a dilettante” who merely lured famous intellectuals into his living room, with wads of cash not entirely unlike the ones used to lure teenage girls to his massage table? Ouch!

OK, but what about Alan Dershowitz—the man who apparently used to be Epstein’s close friend, who still is Pinker’s friend, and who played a crucial role in securing Epstein’s 2008 plea bargain, the one now condemned as a travesty of justice? I’m not sure how I feel about Dershowitz.  It’s like: I understand that our system requires attorneys willing to mount a vociferous defense even for clients who they privately know or believe to be guilty—and even to get those clients off on technicalities or bargaining whenever they can.  I’m also incredibly grateful that I chose CS rather than law school, because I don’t think I could last an hour advocating causes that I knew to be unjust. Just like my fellow CS professor, the intelligent design advocate David Gelernter, I have the privilege and the burden of speaking only for myself.

Scott AaronsonScott’s Supreme Quantum Supremacy FAQ!

You’ve seen the stories—in the Financial Times, Technology Review, CNET, Facebook, Reddit, Twitter, or elsewhere—saying that a group at Google has now achieved quantum computational supremacy with a 53-qubit superconducting device. While these stories are easy to find, I’m not going to link to them here, for the simple reason that none of them were supposed to exist yet.

As the world now knows, Google is indeed preparing a big announcement about quantum supremacy, to coincide with the publication of its research paper in a high-profile journal (which journal? you can probably narrow it down to two). This will hopefully happen within a month.

Meanwhile, though, NASA, which has some contributors to the work, inadvertently posted an outdated version of the Google paper on a public website. It was there only briefly, but long enough to make it to the Financial Times, my inbox, and millions of other places. Fact-free pontificating about what it means has predictably proliferated.

The world, it seems, is going to be denied its clean “moon landing” moment, wherein the Extended Church-Turing Thesis gets experimentally obliterated within the space of a press conference. This is going to be more like the Wright Brothers’ flight—about which rumors and half-truths leaked out in dribs and drabs between 1903 and 1908, the year Will and Orville finally agreed to do public demonstration flights. (This time around, though, it thankfully won’t take that long to clear everything up!)

I’ve known about what was in the works for a couple months now; it was excruciating not being able to blog about it. Though sworn to secrecy, I couldn’t resist dropping some hints here and there (did you catch any?)—for example, in my recent Bernays Lectures in Zürich, a lecture series whose entire structure built up to the brink of this moment.

This post is not an official announcement or confirmation of anything. Though the lightning may already be visible, the thunder belongs to the group at Google, at a time and place of its choosing.

Rather, because so much misinformation is swirling around, what I thought I’d do here, in my role as blogger and “public intellectual,” is offer Scott’s Supreme Quantum Supremacy FAQ. You know, just in case you were randomly curious about the topic of quantum supremacy, or wanted to know what the implications would be if some search engine company based in Mountain View or wherever were hypothetically to claim to have achieved quantum supremacy.

Without further ado, then:

Q1. What is quantum computational supremacy?

Often abbreviated to just “quantum supremacy,” the term refers to the use of a quantum computer to solve some well-defined set of problems that would take orders of magnitude longer to solve with any currently known algorithms running on existing classical computers—and not for incidental reasons, but for reasons of asymptotic quantum complexity. The emphasis here is on being as sure as possible that the problem really was solved quantumly and really is classically intractable, and ideally achieving the speedup soon (with the noisy, non-universal QCs of the present or very near future). If the problem is also useful for something, then so much the better, but that’s not at all necessary. The Wright Flyer and the Fermi pile weren’t useful in themselves.

Q2. If Google has indeed achieved quantum supremacy, does that mean that now “no code is uncrackable”, as Democratic presidential candidate Andrew Yang recently tweeted?

No, it doesn’t. (But I still like Yang’s candidacy.)

There are two issues here. First, the devices currently being built by Google, IBM, and others have 50-100 qubits and no error-correction. Running Shor’s algorithm to break the RSA cryptosystem would require several thousand logical qubits. With known error-correction methods, that could easily translate into millions of physical qubits, and those probably of a higher quality than any that exist today. I don’t think anyone is close to that, and we have no idea how long it will take.

But the second issue is that, even in a hypothetical future with scalable, error-corrected QCs, on our current understanding they’ll only be able to crack some codes, not all of them. By an unfortunate coincidence, the public-key codes that they can crack include most of what we currently use to secure the Internet: RSA, Diffie-Hellman, elliptic curve crypto, etc. But symmetric-key crypto should only be minimally affected. And there are even candidates for public-key cryptosystems (for example, based on lattices) that no one knows how to break quantumly after 20+ years of trying, and some efforts underway now to start migrating to those systems. For more, see for example my letter to Rebecca Goldstein.

Q3. What calculation is Google planning to do, or has it already done, that’s believed to be classically hard?

So, I can tell you, but I’ll feel slightly sheepish doing so. The calculation is: a “challenger” generates a random quantum circuit C (i.e., a random sequence of 1-qubit and nearest-neighbor 2-qubit gates, of depth perhaps 20, acting on a 2D grid of n = 50 to 60 qubits). The challenger then sends C to the quantum computer, and asks it to apply C to the all-0 initial state, measure the result in the {0,1} basis, send back whatever n-bit string was observed, and repeat some thousands or millions of times. Finally, using its knowledge of C, the classical challenger applies a statistical test to check whether the outputs are consistent with the QC having done this.

So, this is not a problem like factoring with a single right answer. The circuit C gives rise to some probability distribution, call it D_C, over n-bit strings, and the problem is to output samples from that distribution. In fact, there will typically be 2^n strings in the support of D_C—so many that, if the QC is working as expected, the same output will never be observed twice. A crucial point, though, is that the distribution D_C is not uniform. Some strings enjoy constructive interference of amplitudes and therefore have larger probabilities, while others suffer destructive interference and have smaller probabilities. And even though we’ll only see a number of samples that’s tiny compared to 2^n, we can check whether the samples preferentially cluster among the strings that are predicted to be likelier, and thereby build up our confidence that something classically intractable is being done.

So, tl;dr, the quantum computer is simply asked to apply a random (but known) sequence of quantum operations—not because we intrinsically care about the result, but because we’re trying to prove that it can beat a classical computer at some well-defined task.
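
If you want to see the shape of the task in code, here's a toy sketch at a size a laptop can brute-force: 4 qubits on a line with a simplified gate set, nothing like Google's actual 53-qubit chip.

```python
# Toy statevector simulation of the sampling task (4 qubits on a line rather
# than ~53 on a 2-D grid, and a simplified gate set), purely to make the
# "sample from D_C" task concrete. At this size D_C can of course also be
# computed exactly by brute force.
import numpy as np

rng = np.random.default_rng(7)
n = 4
dim = 2 ** n

def apply_1q(state, gate, q):
    # view the state as an n-index tensor and act with a 2x2 gate on qubit q
    psi = np.moveaxis(state.reshape([2] * n), n - 1 - q, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    return np.moveaxis(psi, 0, n - 1 - q).reshape(dim)

def apply_cz(state, q1, q2):
    psi = state.copy()
    for idx in range(dim):
        if (idx >> q1) & 1 and (idx >> q2) & 1:
            psi[idx] *= -1.0
    return psi

def random_1q_gate():
    # Haar-ish random single-qubit unitary via QR decomposition
    m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, r = np.linalg.qr(m)
    return q * (np.diag(r) / np.abs(np.diag(r)))

state = np.zeros(dim, dtype=complex)
state[0] = 1.0                            # the all-0 initial state
for layer in range(6):                    # a depth-6 random circuit
    for q in range(n):
        state = apply_1q(state, random_1q_gate(), q)
    for q in range(layer % 2, n - 1, 2):  # brickwork of nearest-neighbor CZs
        state = apply_cz(state, q, q + 1)

probs = np.abs(state) ** 2                # this is D_C (only feasible for tiny n)
samples = rng.choice(dim, size=8, p=probs)
print("samples:", [format(int(s), f"0{n}b") for s in samples])
print("max/min output probability:", probs.max() / probs.min())
```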

Q4. But if the quantum computer is just executing some random garbage circuit, whose only purpose is to be hard to simulate classically, then who cares? Isn’t this a big overhyped nothingburger?

No. As I put it the other day, it’s not an everythingburger, but it’s certainly at least a somethingburger!

It’s like, have a little respect for the immensity of what we’re talking about here, and for the terrifying engineering that’s needed to make it reality. Before quantum supremacy, by definition, the QC skeptics can all laugh to each other that, for all the billions of dollars spent over 20+ years, still no quantum computer has even once been used to solve any problem faster than your laptop could solve it, or at least not in any way that depended on its being a quantum computer. In a post-quantum-supremacy world, that’s no longer the case. A superposition involving 2^50 or 2^60 complex numbers has been computationally harnessed, using time and space resources that are minuscule compared to 2^50 or 2^60.

I keep bringing up the Wright Flyer only because the chasm between what we’re talking about, and the dismissiveness I’m seeing in some corners of the Internet, is kind of breathtaking to me. It’s like, if you believed that useful air travel was fundamentally impossible, then seeing a dinky wooden propeller plane keep itself aloft wouldn’t refute your belief … but it sure as hell shouldn’t reassure you either.

Was I right to worry, years ago, that the constant drumbeat of hype about much less significant QC milestones would wear out people’s patience, so that they’d no longer care when something newsworthy finally did happen?

Q5. Years ago, you scolded the masses for being super-excited about D-Wave, and its claims to get huge quantum speedups for optimization problems via quantum annealing. Today you scold the masses for not being super-excited about quantum supremacy. Why can’t you stay consistent?

Because my goal is not to move the “excitement level” in some uniformly preferred direction, it’s to be right! With hindsight, would you say that I was mostly right about D-Wave, even when raining on that particular parade made me unpopular in some circles? Well, I’m trying to be right about quantum supremacy too.

Q6. If quantum supremacy calculations just involve sampling from probability distributions, how do you check that they were done correctly?

Glad you asked! This is the subject of a fair amount of theory that I and others developed over the last decade. I already gave you the short version in my answer to Q3: you check by doing statistics on the samples that the QC returned, to verify that they’re preferentially clustered in the “peaks” of the chaotic probability distribution D_C. One convenient way of doing this, which Google calls the “linear cross-entropy test,” is simply to sum up Pr[C outputs s_i] over all the samples s_1,…,s_k that the QC returned, and then to declare the test a “success” if and only if the sum exceeds some threshold—say, bk/2^n, for some constant b strictly between 1 and 2.
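As a self-contained toy illustration of that test (mine, not Google's), the sketch below skips the circuit simulation and instead uses normalized exponential random variables as a stand-in for the 2^n output probabilities, the "Porter-Thomas" shape that deep random circuits are expected to produce. The score is rescaled so that the threshold bk/2^n becomes simply b: an ideal sampler scores about 2, a uniform classical guesser about 1.

```python
# Toy linear cross-entropy ("XEB") check. The probabilities of a real random
# circuit are replaced by a Porter-Thomas-like stand-in: i.i.d. exponentials,
# normalized. Success iff sum_i Pr[C outputs s_i] > b*k/2^n, i.e. iff the
# rescaled score (2^n/k) * sum_i Pr[...] exceeds b, with 1 < b < 2.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 100_000
dim = 2 ** n

probs = rng.exponential(size=dim)        # stand-in for |<s|C|0...0>|^2
probs /= probs.sum()

def xeb_score(samples):
    # (2^n/k) * sum_i Pr[C outputs s_i]: ~2 for ideal samples, ~1 for uniform.
    return dim * probs[samples].mean()

qc_samples = rng.choice(dim, size=k, p=probs)   # an ideal, noiseless "QC"
spoof_samples = rng.integers(dim, size=k)       # a classical uniform guesser

b = 1.5                                         # threshold strictly between 1 and 2
for label, s in [("ideal sampler", qc_samples), ("uniform spoof", spoof_samples)]:
    score = xeb_score(s)
    print(f"{label}: score = {score:.3f}, passes = {score > b}")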

Admittedly, in order to apply this test, you need to calculate the probabilities Pr[C outputs s_i] on your classical computer—and the only known ways to calculate them require brute force and take ~2^n time. Is that a showstopper? No, not if n is 50, and you’re Google and are able to handle numbers like 2^50 (although not 2^1000, which exceeds a googol, har har). By running a huge cluster of classical cores for (say) a month, you can eventually verify the outputs that your QC produced in a few seconds—while also seeing that the QC was many orders of magnitude faster. However, this does mean that sampling-based quantum supremacy experiments are almost specifically designed for ~50-qubit devices like the ones being built right now. Even with 100 qubits, we wouldn’t know how to verify the results using all the classical computing power available on earth.

(Let me stress that this issue is specific to sampling experiments like the ones that are currently being done. If Shor’s algorithm factored a 2000-digit number, it would be easy to check the result by simply multiplying the claimed factors and running a primality test on them. Likewise, if a QC were used to simulate some complicated biomolecule, you could check its results by comparing them to experiment.)

Q7. Wait. If classical computers can only check the results of a quantum supremacy experiment, in a regime where the classical computers can still simulate the experiment (albeit extremely slowly), then how do you get to claim “quantum supremacy”?

Come on. With a 53-qubit chip, it’s perfectly feasible to see a speedup by a factor of many millions, in a regime where you can still directly verify the outputs, and also to see that the speedup is growing exponentially with the number of qubits, exactly as asymptotic analysis would predict. This isn’t marginal.

Q8. Is there a mathematical proof that no fast classical algorithm could possibly spoof the results of a sampling-based quantum supremacy experiment?

Not at present. But that’s not quantum supremacy researchers’ fault! As long as theoretical computer scientists can’t even prove basic conjectures like P≠NP or P≠PSPACE, there’s no hope of ruling out a fast classical simulation unconditionally. The best we can hope for are conditional hardness results. And we have indeed managed to prove some such results—see for example the BosonSampling paper, or the Bouland et al. paper on average-case #P-hardness of calculating amplitudes in random circuits, or my paper with Lijie Chen (“Complexity-Theoretic Foundations of Quantum Supremacy Experiments”). The biggest theoretical open problem in this area, in my opinion, is to prove better conditional hardness results.

Q9. Does sampling-based quantum supremacy have any applications in itself?

When people were first thinking about this subject, it seemed pretty obvious that the answer was “no”! (I know because I was one of the people.) Recently, however, the situation has changed—for example, because of my certified randomness protocol, which shows how a sampling-based quantum supremacy experiment could almost immediately be repurposed to generate bits that can be proven to be random to a skeptical third party (under computational assumptions). This, in turn, has possible applications to proof-of-stake cryptocurrencies and other cryptographic protocols. I’m hopeful that more such applications will be discovered in the near future.

Q10. If the quantum supremacy experiments are just generating random bits, isn’t that uninteresting? Isn’t it trivial to convert qubits into random bits, just by measuring them?

The key is that a quantum supremacy experiment doesn’t generate uniform random bits. Instead, it samples from some complicated, correlated probability distribution over 50- or 60-bit strings. In my certified randomness protocol, the deviations from uniformity play a central role in how the QC convinces a classical skeptic that it really was sampling the bits randomly, rather than in some secretly deterministic way (e.g., using a pseudorandom generator).

Q11. Haven’t decades of quantum-mechanical experiments–for example, the ones that violated the Bell inequality–already demonstrated quantum supremacy?

This is purely a confusion over words. Those other experiments demonstrated other forms of “quantum supremacy”: for example, in the case of Bell inequality violations, what you could call “quantum correlational supremacy.” They did not demonstrate quantum computational supremacy, meaning doing something that’s infeasible to simulate using a classical computer (where the classical simulation has no restrictions of spatial locality or anything else of that kind). Today, when people use the phrase “quantum supremacy,” it’s generally short for quantum computational supremacy.

Q12. Even so, there are countless examples of materials and chemical reactions that are hard to classically simulate, as well as special-purpose quantum simulators (like those of Lukin’s group at Harvard). Why don’t these already count as quantum computational supremacy?

Under some people’s definitions of “quantum computational supremacy,” they do! The key difference with Google’s effort is that they have a fully programmable device—one that you can program with an arbitrary sequence of nearest-neighbor 2-qubit gates, just by sending the appropriate signals from your classical computer.

In other words, it’s no longer open to the QC skeptics to sneer that, sure, there are quantum systems that are hard to simulate classically, but that’s just because nature is hard to simulate, and you don’t get to arbitrarily redefine whatever random chemical you find in the wild to be a “computer for simulating itself.” Under any sane definition, the superconducting devices that Google, IBM, and others are now building are indeed “computers.”

Q13. Did you (Scott Aaronson) invent the concept of quantum supremacy?

No. I did play some role in developing it, which led to Sabine Hossenfelder among others generously overcrediting me for the whole idea. The term “quantum supremacy” was coined by John Preskill in 2012, though in some sense the core concept goes back to the beginnings of quantum computing itself in the early 1980s. In 1993, Bernstein and Vazirani explicitly pointed out the severe apparent tension between quantum mechanics and the Extended Church-Turing Thesis of classical computer science. Then, in 1994, the use of Shor’s algorithm to factor a huge number became the quantum supremacy experiment par excellence—albeit, one that’s still (in 2019) much too hard to perform.

The key idea of instead demonstrating quantum supremacy using a sampling problem was, as far as I know, first suggested by Barbara Terhal and David DiVincenzo, in a farsighted paper from 2002. The “modern” push for sampling-based supremacy experiments started around 2011, when Alex Arkhipov and I published our paper on BosonSampling, and (independently of us) Bremner, Jozsa, and Shepherd published their paper on the commuting Hamiltonians model. These papers showed, not only that “simple,” non-universal quantum systems can solve apparently-hard sampling problems, but also that an efficient classical algorithm for the same sampling problems would imply a collapse of the polynomial hierarchy. Arkhipov and I also made a start toward arguing that even the approximate versions of quantum sampling problems can be classically hard.

As far as I know, the idea of “Random Circuit Sampling”—that is, generating your hard sampling problem by just picking a random sequence of 2-qubit gates in (say) a superconducting architecture—originated in an email thread that I started in December 2015, which also included John Martinis, Hartmut Neven, Sergio Boixo, Ashley Montanaro, Michael Bremner, Richard Jozsa, Aram Harrow, Greg Kuperberg, and others. The thread was entitled “Hard sampling problems with 40 qubits,” and my email began “Sorry for the spam.” I then discussed some advantages and disadvantages of three options for demonstrating sampling-based quantum supremacy: (1) random circuits, (2) commuting Hamiltonians, and (3) BosonSampling. After Greg Kuperberg chimed in to support option (1), a consensus quickly formed among the participants that (1) was indeed the best option from an engineering standpoint—and that, if the theoretical analysis wasn’t yet satisfactory for (1), then that was something we could remedy.

[Update: Sergio Boixo tells me that, internally, the Google group had been considering the idea of random circuit sampling since February 2015, even before my email thread. This doesn’t surprise me: while there are lots of details that had to be worked out, the idea itself is an extremely natural one.]

After that, the Google group did a huge amount of analysis of random circuit sampling, both theoretical and numerical, while Lijie Chen and I and Bouland et al. supplied different forms of complexity-theoretic evidence for the problem’s classical hardness.

Q14. If quantum supremacy was achieved, what would it mean for the QC skeptics?

I wouldn’t want to be them right now! They could retreat to the position that of course quantum supremacy is possible (who ever claimed that it wasn’t? surely not them!), and that the real issue has always been quantum error-correction. And indeed, some of them have consistently maintained that position all along. But others, including my good friend Gil Kalai, are on record, right here on this blog, predicting that even quantum supremacy can never be achieved for fundamental reasons. I won’t let them wiggle out of it now.

[Update: As many of you will have seen, Gil Kalai has taken the position that the Google result won’t stand and will need to be retracted. He asked for more data: specifically, a complete histogram of the output probabilities for a smaller number of qubits. This turns out to be already available, in a Science paper from 2018.]

Q15. What’s next?

If it’s achieved quantum supremacy, then I think the Google group already has the requisite hardware to demonstrate my protocol for generating certified random bits. And that’s indeed one of the very next things they’re planning to do.

[Addendum: Also, of course, the evidence for quantum supremacy itself can be made stronger and various loopholes closed—for example, by improving the fidelity so that fewer samples need to be taken (something that Umesh Vazirani tells me he’d like to see), by having the circuit C be generated and the outputs verified by a skeptic external to Google, and simply by letting more time pass, so outsiders can have a crack at simulating the results classically. My personal guess is that the basic picture is going to stand, but just like with the first experiments that claimed to violate the Bell inequality, there’s still plenty of room to force the skeptics into a tinier corner.]

Beyond that, one obvious next milestone would be to use a programmable QC, with (say) 50-100 qubits, to do some useful quantum simulation (say, of a condensed-matter system) much faster than any known classical method could do it. A second obvious milestone would be to demonstrate the use of quantum error-correction, to keep an encoded qubit alive for longer than the underlying physical qubits remain alive. There’s no doubt that Google, IBM, and the other players will now be racing toward both of these milestones.

[Update: Steve Girvin reminds me that the Yale group has already achieved quantum error-correction “beyond the break-even point,” albeit in a bosonic system rather than superconducting qubits. So perhaps a better way to phrase the next milestone would be: achieve quantum computational supremacy and useful quantum error-correction in the same system.]

Another update: I thought this IEEE Spectrum piece gave a really nice overview of the issues.

Last update: John Preskill’s Quanta column about quantum supremacy is predictably excellent (and possibly a bit more accessible than this FAQ).

September 30, 2019

n-Category Café Applied Category Theory Meeting at UCR (Part 2)

Joe Moeller and I have finalized the schedule of our meeting on applied category theory:

Applied Category Theory, special session of the Fall Western Sectional Meeting of the AMS, U. C. Riverside, Riverside, California, 9–10 November 2019.

It’s going to be really cool, with talks on everything from brakes to bicategories, from quantum physics to social networks, and more — with the power of category theory as a unifying theme! Among other things, fellow n-Café host Mike Shulman is going to say how to get maps between symmetric monoidal bicategories from maps between symmetric monoidal double categories.

You can get information on registration, hotels and such here. If you’re coming, you might also want to attend Eugenia Cheng’s talk on the afternoon of Friday November 8th.   I’ll announce the precise title and time of her talk, and also the location of all the following talks, as soon as I know!

In what follows, the person actually giving the talk has an asterisk by their name. You can click on titles of talks to see the abstracts.

Saturday November 9, 2019, 8:00 a.m.-10:50 a.m.

Saturday November 9, 2019, 3:00 p.m.-5:50 p.m.

Sunday November 10, 2019, 8:00 a.m.-10:50 a.m.

Sunday November 10, 2019, 2:00 p.m.-4:50 p.m.

Terence TaoAlmost all Collatz orbits attain almost bounded values

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.
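In code, the objects just defined look as follows (a minimal sketch of my own, not from the paper), together with a brute-force check of Conjecture 1 over a small range; needless to say, no finite computation bears on the conjecture itself.

```python
def col(n: int) -> int:
    # The Collatz map: 3N+1 for odd N, N/2 for even N.
    return 3 * n + 1 if n % 2 else n // 2

def col_min(n: int) -> int:
    # Minimal element of the orbit N, Col(N), Col^2(N), ...
    # We may stop once the orbit hits 1, since from 1 it cycles 1 -> 4 -> 2 -> 1.
    best = n
    while n != 1:
        n = col(n)
        best = min(best, n)
    return best

# Conjecture 1 checked (only!) for N up to 10^5.
assert all(col_min(n) == 1 for n in range(1, 10**5 + 1))
```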

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle  \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} and one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move into {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is that the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication by {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)
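Correspondingly, a minimal sketch of the Syracuse map (again my own illustration, not from the paper):

```python
def syr(n: int) -> int:
    # Syracuse map on odd N: divide 3N+1 by the largest power of 2 it contains.
    assert n % 2 == 1
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

# e.g. 3*7 + 1 = 22 = 2 * 11, so syr(7) == 11, which is again odd.
assert syr(7) == 11
```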

When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and can be described explicitly as

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
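As a sanity check on formula (1), here is a small Monte Carlo sketch of my own (not from the paper) for the case n = 2: it draws the geometric variables a_1, a_2 and compares the empirical law of the Syracuse random variable mod 9 against the exact probabilities 0, 8/63, 16/63, 0, 11/63, 4/63, 0, 2/63, 22/63 quoted above. Here 2^{-k} is interpreted as the inverse of 2^k modulo 3^n, which Python's three-argument pow (3.8+) computes directly.

```python
import random
from collections import Counter

random.seed(0)
n, mod, trials = 2, 3 ** 2, 200_000

def geom_mean_two():
    # P(a = j) = 2^{-j} for j = 1, 2, 3, ...  (a geometric variable of mean 2)
    a = 1
    while random.random() < 0.5:
        a += 1
    return a

counts = Counter()
for _ in range(trials):
    partial, value = 0, 0
    for i in range(n):
        partial += geom_mean_two()               # a_1 + ... + a_{i+1}
        value += 3 ** i * pow(2, -partial, mod)  # 3^i * 2^{-(a_1+...+a_{i+1})} mod 3^n
    counts[value % mod] += 1

exact = [0, 8, 16, 0, 11, 4, 0, 2, 22]           # numerators over 63, as above
for r in range(mod):
    print(r, round(counts[r] / trials, 4), round(exact[r] / 63, 4))
```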

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle  \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

\displaystyle  \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that

\displaystyle  \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.

If the random variable (1) were the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle  (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle  + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a conditional sum of independent random variables. This lets one express the characteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle  \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle  \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},

we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.

A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

September 29, 2019

Jordan EllenbergWatermelon, chevre, piment d’espelette

I spent a little time this summer visiting Institut Henri Poincare for their program on rational points, but this post is not about the math I did there, but about a salad I ate there. Not there at IHP, but at the terrific neighborhood bistro around the corner from where I was staying. I liked it so much I went there three times and I got this salad three times. I have been trying to recreate it at home. It’s good! Not Paris bistro good. But really good. Here is how I make it so I don’t forget.

  • Seedless watermelon cut in cubical or oblong chunks, as sweet as possible
  • Good chevre (not feta, chevre) ripped up into modest pieces
  • Some kind of not-too-bitter greens (I’ve been using arugula, they used some kind of micro watercressy kind of deal). Not a ton; this is a watermelon salad with some greens in it for color and accent, not a green salad.
  • Roasted pine nuts (I am thinking this could also be good with roasted pepitas but have not tried it)
  • Juice of a lime
  • Olive oil, the best you have
  • Piment d’espelette

I had never heard of piment d’espelette! It’s from the Basque part of France and is roughly in the paprika family but it’s different. I went to a spice store before I left Paris and bought a jar to bring home. So now I have something I thought my kitchen would never be able to boast: a spice Penzey’s doesn’t sell.

Anyway, the recipe is: put all that stuff in a bowl and mix it up. Or ideally put everything except the chevre in and mix it up and then strew the chevre on the top. Festive!

Of course the concept of watermelon and goat cheese as a summer salad is standard; but this is a lot better than any version of this I’ve had before.

September 28, 2019

Doug NatelsonItems of interest

As I struggle with being swamped this semester, some news items:
  • Scott Aaronson has a great summary/discussion about the forthcoming Google/John Martinis result about quantum supremacy.  The super short version:  There is a problem called "random circuit sampling", where a sequence of quantum gate operations is applied to some number of quantum bits, and one would like to know the probability distribution of the outcomes.  Simulating this classically becomes very very hard as the number of qubits grows.  The Google team apparently just implemented the actual problem directly using their 53-qubit machine, and could infer the probability distribution by directly sampling a large number of outcomes.   They could get the answer this way in 3 min 20 sec for a number of qubits where it would take the best classical supercomputer 10000 years to simulate.  Very impressive and certainly a milestone (though the paper is not yet published or officially released).  This has led to some fascinating semantic discussions with colleagues of mine about what we mean by computation.  For example, this particular situation feels a bit to me like comparing the numerical solution to a complicated differential equation (e.g., some Runge-Kutta method) on a classical computer with an analog computer using op-amps and R/L/C components.  Is the quantum computer here really solving a computational problem, or is it being used as an experimental platform to simulate a quantum system?  And what is the difference, and does it matter?  Either way, a remarkable achievement.  (I'm also a bit jealous that Scott routinely has 100+ comment conversations on his blog.)
  • Speaking of computational solutions to complex problems.... Many people have heard about chaotic systems and why numerical solutions to differential equations can be fraught with peril due to, e.g., rounding errors (see the short illustration after this list).  However, I've seen two papers this week that show just how bad this can be.  This very good news release pointed me to this paper, which shows that even 64-bit precision doesn't save you from issues in some systems.  Also this blog post points to this paper, which shows that n-body gravitational simulations have all sorts of problems along these lines.  Yeow.
  • SpaceX has assembled their mammoth sub-orbital prototype down in Boca Chica.  This is going to be used for test flights up to 22 km altitude, and landings.  I swear, it looks like something out of Tintin or The Conquest of Space.  Awesome.
  • Time to start thinking about Nobel speculation.  Anyone?
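Not from either paper, just a minimal illustration of the kind of sensitivity mentioned in the second item above: iterating the logistic map x -> 4x(1-x), a standard chaotic toy model, in 32-bit versus 64-bit floating point. The two trajectories start from the same nominal value and disagree completely within a few dozen steps.

```python
import numpy as np

x32 = np.float32(0.1)
x64 = np.float64(0.1)
for step in range(1, 61):
    # identical recurrence, different precision
    x32 = np.float32(4) * x32 * (np.float32(1) - x32)
    x64 = 4.0 * x64 * (1.0 - x64)
    if step % 10 == 0:
        print(f"step {step:2d}:  float32 = {float(x32):.6f}   float64 = {x64:.6f}")
```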

September 27, 2019

Matt von HippelFacts About Our Capabilities Are Facts About the World

A paper leaked from Google last week claimed that their researchers had achieved “quantum supremacy”, the milestone at which a quantum computer performs a calculation faster than any existing classical computer. Scott Aaronson has a great explainer about this. The upshot is that Google’s computer is much too small to crack all our encryptions (only 53 qubits, the quantum equivalent of classical bits), but it still appears to be a genuine quantum computer doing a genuine quantum computation that is genuinely not feasible otherwise.

How impressed should we be about this?

On one hand, the practical benefits of a 53-qubit computer are pretty minimal. Scott discusses some applications: you can generate random numbers, distributed in a way that will let others verify that they are truly random, the kind of thing it’s occasionally handy to do in cryptography. Still, by itself this won’t change the world, and compared to the quantum computing hype I can understand if people find this underwhelming.

On the other hand, as Scott says, this falsifies the Extended Church-Turing Thesis! And that sounds pretty impressive, right?

Ok, I’m actually just re-phrasing what I said before. The Extended Church-Turing Thesis proposes that a classical computer (more specifically, a probabilistic Turing machine) can efficiently simulate any reasonable computation. Falsifying it means finding something that a classical computer cannot compute efficiently but another sort of computer (say, a quantum computer) can. If the calculation Google did truly can’t be done efficiently on a classical computer (this is not proven, though experts seem to expect it to be true) then yes, that’s what Google claims to have done.

So we get back to the real question: should we be impressed by quantum supremacy?

Well, should we have been impressed by the Higgs?

The detection of the Higgs boson in 2012 hasn’t led to any new Higgs-based technology. No-one expected it to. It did teach us something about the world: that the Higgs boson exists, and that it has a particular mass. I think most people accept that that’s important: that it’s worth knowing how the world works on a fundamental level.

Google may have detected the first-known violation of the Extended Church-Turing Thesis. This could eventually lead to some revolutionary technology. For now, though, it hasn’t. Instead, it teaches us something about the world.

It may not seem like it, at first. Unlike the Higgs boson, “Extended Church-Turing is false” isn’t a law of physics. Instead, it’s a fact about our capabilities. It’s a statement about the kinds of computers we can and cannot build, about the kinds of algorithms we can and cannot implement, the calculations we can and cannot do.

Facts about our capabilities are still facts about the world. They’re still worth knowing, for the same reasons that facts about the world are still worth knowing. They still give us a clearer picture of how the world works, which tells us in turn what we can and cannot do. According to the leaked paper, Google has taught us a new fact about the world, a deep fact about our capabilities. If that’s true we should be impressed, even without new technology.

September 26, 2019

Tommaso DorigoThe Plot Of The Week: New Limits On Higgs Decays To Electrons

This week's Plot relates to the search of rare decays of the Higgs boson, through the analysis of the large amounts of proton-proton collision data produced by the Large Hadron Collider (LHC), CERN's marvelous 27km particle accelerator. The ATLAS collaboration, which is one of the four main scientific teams looking at LHC collisions, produced an improved bound on the rate at which Higgs bosons may decay to electron-positron pairs (which they are expected to do, although very rarely, in the Standard Model, SM) and to electron-muon pairs (which are forbidden in the SM).

September 25, 2019

Scott AaronsonBlurry but clear enough

My vision is blurry right now, because yesterday I had a procedure called corneal cross-linking, intended to prevent further deterioration of my eyes as I get older. But I can see clearly enough to tap out a post with random thoughts about the world.

I’m happy that the Netanyahu era might finally be ending in Israel, after which Netanyahu will hopefully face some long-delayed justice for his eye-popping corruption. If only there were a realistic prospect of Trump facing similar justice. I wish Benny Gantz success in putting together a coalition.

I’m happy that my two least favorite candidates, Bill de Blasio and Kirsten Gillibrand, have now both dropped out of the Democratic primary. Biden, Booker, Warren, Yang—I could enthusiastically support pretty much any of them, if they looked like they had a good chance to defeat Twitler. Let’s hope.

Most importantly, I wish to register my full-throated support for the climate strikes taking place today all over the world, including here in Austin. My daughter Lily, age 6, is old enough to understand the basics of what’s happening and to worry about her future. I urge the climate strikers to keep their eyes on things that will actually make a difference (building new nuclear plants, carbon taxes, geoengineering) and ignore what won’t (banning plastic straws).

As for Greta Thunberg: she is, or is trying to be, the real-life version of the Comet King from Unsong. You can make fun of her, ask what standing or expertise she has as some random 16-year-old to lead a worldwide movement. But I suspect that this is always what it looks like when someone takes something that’s known to (almost) all, and then makes it common knowledge. If civilization makes it to the 22nd century at all, then in whatever form it still exists, I can easily imagine that it will have more statues of Greta than of MLK or Gandhi.

On a completely unrelated and much less important note, John Horgan has a post about “pluralism in math” that includes some comments by me.

Oh, and on the quantum supremacy front—I foresee some big news very soon. You know which blog to watch for more.

September 23, 2019

John PreskillYes, seasoned scientists do extraordinary science.

Imagine that you earned tenure and your field’s acclaim decades ago. Perhaps you received a Nobel Prize. Perhaps you’re directing an institute for science that you helped invent. Do you still do science? Does mentoring youngsters, advising the government, raising funds, disentangling logistics, presenting keynote addresses at conferences, chairing committees, and hosting visitors dominate the time you dedicate to science? Or do you dabble, attend seminars, and read, following progress without spearheading it?

People have asked whether my colleagues do science when weighed down with laurels. The end of August illustrates my answer.

At the end of August, I participated in the eighth Conference on Quantum Information and Quantum Control (CQIQC) at Toronto’s Fields Institute. CQIQC bestows laurels called “the John Stewart Bell Prize” on quantum-information scientists. John Stewart Bell revolutionized our understanding of entanglement, strong correlations that quantum particles can share and that power quantum computing. Aephraim Steinberg, vice-chair of the selection committee, bestowed this year’s award. The award, he emphasized, recognizes achievements accrued during the past six years. This year’s co-winners have been leading quantum information theory for decades. But the past six years earned the winners their prize.

Peter Zoller co-helms IQOQI in Innsbruck. (You can probably guess what the acronym stands for. Hint: The name contains “Quantum” and “Institute.”) Ignacio Cirac is a director of the Max Planck Institute of Quantum Optics near Munich. Both winners presented recent work about quantum many-body physics at the conference. You can watch videos of their talks here.

Peter discussed how a lab in Austria and a lab across the world can check whether they’ve prepared the same quantum state. One lab might have trapped ions, while the other has ultracold atoms. The experimentalists might not know which states they’ve prepared, and the experimentalists might have prepared the states at different times. Create multiple copies of the states, Peter recommended, measure the copies randomly, and play mathematical tricks to calculate correlations.

Ignacio expounded upon how to simulate particle physics on a quantum computer formed from ultracold atoms trapped by lasers. For expert readers: Simulate matter fields with fermionic atoms and gauge fields with bosonic atoms. Give the optical lattice the field theory’s symmetries. Translate the field theory’s Lagrangian into Hamiltonian language using Kogut and Susskind’s prescription. 

Even before August, I’d collected an arsenal of seasoned scientists who continue to revolutionize their fields. Frank Wilczek shared a physics Nobel Prize for theory undertaken during the 1970s. He and colleagues helped explain matter’s stability: They clarified how close-together quarks (subatomic particles) fail to attract each other, though quarks draw together when far apart. Why stop after cofounding one subfield of physics? Frank spawned another in 2012. He proposed the concept of a time crystal, which is like table salt, except extended across time instead of across space. Experimentalists realized a variation on Frank’s prediction in 2018, and time crystals have exploded across the scientific literature.1

Rudy Marcus is 96 years old. He received a chemistry Nobel Prize, for elucidating how electrons hop between molecules during reactions, in 1992. I took a nonequilibrium-statistical-mechanics course from Rudy four years ago. Ever since, whenever I’ve seen him, he’s asked for the news in quantum information theory. Rudy’s research group operates at Caltech, and you won’t find “Emeritus” in the title on his webpage.

My PhD supervisor, John Preskill, received tenure at Caltech for particle-physics research performed before 1990. You might expect the rest of his career to form an afterthought. But he helped establish quantum computing, starting in the mid-1990s. During the past few years, he co-midwifed the subfield of holographic quantum information theory, which concerns black holes, chaos, and the unification of quantum theory with general relativity. Watching a subfield emerge during my PhD left a mark like a tree on a bicyclist (or would have, if such a mark could uplift instead of injure). John hasn’t helped create subfields only by garnering resources and encouraging youngsters. Several papers by John and collaborators—about topological quantum matter, black holes, quantum error correction, and more—have transformed swaths of physics during the past 15 years. Nor does John stamp his name on many papers: Most publications by members of his group don’t list him as a coauthor.

Do my colleagues do science after laurels pile up on them? The answer sounds to me, in many cases, more like a roar than like a “yes.” Much science done by senior scientists inspires no less than the science that established them. Beyond their results, their enthusiasm inspires. Never mind receiving a Bell Prize. Here’s to working toward deserving a Bell Prize every six years.

 

With thanks to the Fields Institute, the University of Toronto, Daniel F. V. James, Aephraim Steinberg, and the rest of the conference committee for their invitation and hospitality.

You can find videos of all the conference’s talks here. My talk is shown here

1To scientists, I recommend this Physics Today perspective on time crystals. Few articles have awed and inspired me during the past year as much as this review did. 

September 21, 2019

Sean CarrollThe Notorious Delayed-Choice Quantum Eraser

Note: It is in the nature of book-writing that sometimes you write things that don’t end up appearing in the final book. I had a few such examples for Something Deeply Hidden, my book on quantum mechanics, Many-Worlds, and emergent spacetime. Most were small and nobody will really miss them, but I did feel bad about eliminating my discussion of the “delayed-choice quantum eraser,” an experiment that has caused no end of confusion. So here it is, presented in full. It’s a bit too technical for the book, I don’t know what I was thinking!

Let’s imagine you’re an undergraduate physics student, taking an experimental lab course, and your professor is in a particularly ornery mood. So she forces you to do a weird version of the double-slit experiment, explaining that this is something called the “delayed-choice quantum eraser.” You think you remember seeing a YouTube video about this once.

In the conventional double-slit, we send a beam of electrons through two slits and on toward a detecting screen. Each individual electron hits the screen and leaves a dot, but if we build up many such detections, we see an interference pattern of light and dark bands, because the wave function passing through the two slits interferes with itself. But if we also measure which slit each electron goes through, the interference pattern disappears, and we see a smoothed-out distribution at the screen. According to textbook quantum mechanics that’s because the wave function collapsed when we measured it at the slits; according to Many-Worlds it’s because the electron became entangled with the measurement apparatus, decoherence occurred as the apparatus became entangled with the environment, and the wave function branched into separate worlds, in each of which the electron only passes through one of the slits.

An interference pattern is seen when electrons travel through two slits (left), unless a detector measures which slit each electron goes through (right).

The new wrinkle is that we are still going to “measure” which slit the electron goes through, but instead of reading it out on a big macroscopic dial, we simply store that information in a single qubit. Say that for every “traveling” electron passing through the slits, we have a separate “recording” electron. The pair becomes entangled in the following way: if the traveling electron goes through the left slit, the recording electron is in a spin-up state (with respect to the vertical axis), and if the traveling electron goes through the right, the recording electron is spin-down. We end up with:

Ψ = (L)[↑] + (R)[↓].

Our professor, who is clearly in a bad mood, insists that we don’t actually measure the spin of our recording electrons, and we don’t even let them wander off and bump into other things in the room. We carefully trap them and preserve them, perhaps in a magnetic field.

What do we see at the screen when we do this with many electrons? A smoothed-out distribution with no interference pattern, of course. Interference can only happen when two things contribute to exactly the same wave function, and since the two paths for the traveling electrons are now entangled with the recording electrons, the left and right paths are distinguishable, so we don’t see any interference pattern. In this case it doesn’t matter that we didn’t have honest decoherence; it just matters that the traveling electrons were entangled with the recording electrons. Entanglement of any sort kills interference.

Of course, we could measure the recording spin if we wanted to. If we measure it along the vertical axis, we will see either [↑] or [↓]. Referring back to the quantum state Ψ above, we see that this will put us in either a universe where the traveling electron went through the left slit, or one where it went through the right slit. At the end of the day, recording the positions of many such electrons when they hit the detection screen, we won’t see any interference.

Okay, says our somewhat sadistic professor, rubbing her hands together with villainous glee. Now let’s measure all of our recording spins, but this time measure them along the horizontal axis instead of the vertical one. As we saw in Chapter Four, there’s a relationship between the horizontal and vertical spin states; we can write

[↑] = [→] + [←] ,

[↓] = [→] – [←].

(To keep our notation simple we’re ignoring various factors of the square root of two.) So the state before we do such a measurement is

Ψ = (L)[→] + (L)[←] + (R)[→] – (R)[←]

= (L + R)[→] + (L – R)[←].

When we measured the recording spin in the vertical direction, the result we obtained was entangled with a definite path for the traveling electron: [↑] was entangled with (L), and [↓] was entangled with (R). So by performing that measurement, we knew that the electron had traveled through one slit or the other. But now when we measure the recording spin along the horizontal axis, that’s no longer true. After we do each measurement, we are again in a branch of the wave function where the traveling electron passes through both slits. If we measured spin-left, the traveling electron passing through the right slit picks up a minus sign in its contribution to the wave function, but that’s just math.

By choosing to do our measurement in this way, we have erased the information about which slit the electron went through. This is therefore known as a “quantum eraser experiment.” This erasure doesn’t affect the overall distribution of flashes on the detector screen. It remains smooth and interference-free.

But we not only have the overall distribution of electrons hitting the detector screen; for each impact we know whether the recording electron was measured as spin-left or spin-right. So, instructs our professor with a flourish, let’s go to our computers and separate the flashes on the detector screen into these two groups — those that are associated with spin-left recording electrons, and those that are associated with spin-right. What do we see now?

Interestingly, the interference pattern reappears. The traveling electrons associated with spin-left recording electrons form an interference pattern, as do the ones associated with spin-right. (Remember that we don’t see the pattern all at once, it appears gradually as we detect many individual flashes.) But the two interference patterns are slightly shifted from each other, so that the peaks in one match up with the valleys in the other. There was secretly interference hidden in what initially looked like a featureless smudge.

Adapted from Wikipedia

In retrospect this isn’t that surprising. From looking at how our quantum state Ψ was written with respect to the spin-left and -right recording electrons, each measurement was entangled with a traveling electron going through both slits, so of course it could interfere. And that innocent-seeming minus sign shifted one of the patterns just a bit, so that when combined together the two patterns could add up to a smooth distribution.
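For readers who want to see the bookkeeping explicitly, here is a minimal numerical sketch (mine, not from the book excerpt), with made-up Gaussian amplitudes for the two paths and the factors of the square root of two ignored, as in the text. Conditioning on the horizontal spin outcomes yields two shifted fringe patterns whose sum is the smooth, interference-free distribution.

```python
import numpy as np

x = np.linspace(-10, 10, 2001)          # position along the detection screen

def path_amplitude(tilt):
    # Toy amplitude for one slit: an envelope times a slit-dependent phase.
    return np.exp(-x**2 / 32) * np.exp(1j * tilt * x)

psi_L, psi_R = path_amplitude(+1.5), path_amplitude(-1.5)

# Recording spin measured vertically (or never measured): each branch has a
# definite path, so the screen sees an incoherent sum -- no interference.
smooth = np.abs(psi_L)**2 + np.abs(psi_R)**2

# Recording spin measured horizontally: conditioning on [->] or [<-] picks out
# (L + R) or (L - R), two interference patterns shifted against each other.
pattern_right = np.abs(psi_L + psi_R)**2
pattern_left  = np.abs(psi_L - psi_R)**2

# Added together, the two shifted patterns reproduce the featureless smudge
# (the factor of 2 is the normalization we agreed to ignore).
assert np.allclose(pattern_right + pattern_left, 2 * smooth)
```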

Your professor seems more amazed by this than you are. “Don’t you see,” she exclaims excitedly. “If we didn’t measure the recording electrons at all, or if we measured them along the vertical axis, there was no interference anywhere. But if we measured them along the horizontal axis, there secretly was interference, which we could discover by separating out what happens at the screen when the recording spin was left or right.”

You and your classmates nod your heads, cautiously but with some degree of confusion.

“Think about what that means! The choice about whether to measure our recording spins vertically or horizontally could have been made long after the traveling electrons splashed on the detection screen. As long as we stored our recording spins carefully and protected them from becoming entangled with the environment, we could have delayed that choice until years later.”

Sure, the class mumbles to themselves. That sounds right.

“But interference only happens when the traveling electron goes through both slits, and the smooth distribution happens when it goes through only one slit. That decision — go through both slits, or just through one — happens long before we measure the recording electrons! So obviously, our choice to measure them horizontally rather than vertically had to send a signal backward in time to tell the traveling electrons to go through both slits rather than just one!”

After a short, befuddled pause, the class erupts with objections. Decisions? Backwards in time? What are we talking about? The electron doesn’t make a choice to travel through one slit or the other. Its wave function (and that of whatever it’s entangled with) evolves according to the Schrödinger equation, just like always. The electron doesn’t make choices, it unambiguously goes through both slits, but it becomes entangled along the way. By measuring the recording photons along different directions, we can pick out different parts of that entangled wave function, some of which exhibit interference and others do not. Nothing really went backwards in time. It’s kind of a cool result, but it’s not like we’re building a frickin’ time machine here.

You and your classmates are right. Your instructor has gotten a little carried away. There’s a temptation, reinforced by the Copenhagen interpretation, to think of an electron as something “with both wave-like and particle-like properties.” If we give in to that temptation, it’s a short journey to thinking that the electron must behave in either a wave-like way or a particle-like way when it passes through the slits, and in any given experiment it will be one or the other. And from there, the delayed-choice experiment does indeed tend to suggest that information had to go backwards in time to help the electron make its decision. And, to be honest, there is a tradition in popular treatments of quantum mechanics to make things seem as mysterious as possible. Suggesting that time travel might be involved somehow is just throwing gasoline on the fire.

All of these temptations should be resisted. The electron is simply part of the wave function of the universe. It doesn’t make choices about whether to be wave-like or particle-like. But a number of serious researchers in quantum foundations really do take the delayed-choice quantum eraser and analogous experiments (which have been successfully performed, by the way) as evidence of retrocausality in nature — signals traveling backwards in time to influence the past. A form of this experiment was originally proposed by none other than John Wheeler, who envisioned a set of telescopes placed on the opposite side of the screen from the slits, which could detect which slit the electrons went through long after they had passed through. Unlike some later commentators, Wheeler didn’t go so far as to suggest retrocausality, and knew better than to insist that an electron is either a particle or a wave at all times.

There’s no need to invoke retrocausality to explain the delayed-choice experiment. To an Everettian, the result makes perfect sense without anything traveling backwards in time. The trickiness relies on the fact that by becoming entangled with a single recording spin rather than with the environment and its zillions of particles, the traveling electrons only became kind-of decohered. With just a single particle to worry about observing, we are allowed to contemplate measuring it in different ways. If, as in the conventional double-slit setup, we measured the slit through which the traveling electron went via a macroscopic pointing device, we would have had no choice about what was being observed. True decoherence takes a tiny quantum entanglement and amplifies it, effectively irreversibly, into the environment. In that sense the delayed-choice quantum eraser is a useful thought experiment to contemplate the role of decoherence and the environment in measurement.

But alas, not everyone is an Everettian. In some other versions of quantum mechanics, wave functions really do collapse, not just the apparent collapse that decoherence provides us with in Many-Worlds. In a true collapse theory like GRW, the process of wave-function collapse is asymmetric in time; wave functions collapse, but they don’t un-collapse. If you have collapsing wave functions, but for some reason also want to maintain an overall time-symmetry to the fundamental laws of physics, you can convince yourself that retrocausality needs to be part of the story.

Or you can accept the smooth evolution of the wave function, with branching rather than collapses, and maintain time-symmetry of the underlying equations without requiring backwards-propagating signals or electrons that can’t make up their mind.

September 20, 2019

Matt von HippelThe Changing Meaning of “Explain”

This is another “explanations are weird” post.

I’ve been reading a biography of James Clerk Maxwell, who formulated the theory of electromagnetism. Nowadays, we think about the theory in terms of fields: we think there is an “electromagnetic field”, filling space and time. At the time, though, this was a very unusual way to think, and not even Maxwell was comfortable with it. He felt that he had to present a “physical model” to justify the theory: a picture of tiny gears and ball bearings, somehow occupying the same space as ordinary matter.

Bang! Bang! Maxwell’s silver bearings…

Maxwell didn’t think space was literally filled with ball bearings. He did, however, believe he needed a picture that was sufficiently “physical”, that wasn’t just “mathematics”. Later, when he wrote down a theory that looked more like modern field theory, he still thought of it as provisional: a way to use Lagrange’s mathematics to ignore the unknown “real physical mechanism” and just describe what was observed. To Maxwell, field theory was a description, but not an explanation.

This attitude surprised me. I would have thought physicists in Maxwell’s day could have accepted fields. After all, they had accepted Newton.

In his time, there was quite a bit of controversy about whether Newton’s theory of gravity was “physical”. When rival models described planets driven around by whirlpools, Newton simply described the mathematics of the force, an “action at a distance”. Newton famously insisted hypotheses non fingo, “I feign no hypotheses”, and insisted that he wasn’t saying anything about why gravity worked, merely how it worked. Over time, as the whirlpool models continued to fail, people gradually accepted that gravity could be explained as action at a distance.

You’d think that this would make them able to accept fields as well. Instead, by Maxwell’s day the options for a “physical explanation” had simply been enlarged by one. Now instead of just explaining something with mechanical parts, you could explain it with action at a distance as well. Indeed, many physicists tried to explain electricity and magnetism with some sort of gravity-like action at a distance. They failed, though. You really do need fields.

The author of the biography is an engineer, not a physicist, so I find his perspective unusual at times. After discussing Maxwell’s discomfort with fields, the author says that today physicists are different: instead of insisting on a physical explanation, they accept that there are some things they just cannot know.

At first, I wanted to object: we do have physical explanations, we explain things with fields! We have electromagnetic fields and electron fields, gluon fields and Higgs fields, even a gravitational field for the shape of space-time. These fields aren’t papering over some hidden mechanism, they are the mechanism!

Are they, though?

Fields aren’t quite like the whirlpools and ball bearings of historical physicists. Sometimes fields that look different are secretly the same: the two “different explanations” will give the same result for any measurement you could ever perform. In my area of physics, we try to avoid this by focusing on the measurements instead, building as much as we can out of observable quantities instead of fields. In effect we’re going back yet another layer, another dose of hypotheses non fingo.

Physicists still ask for “physical explanations”, and still worry that some picture might be “just mathematics”. But what that means has changed, and continues to change. I don’t think we have a common standard right now, at least nothing as specific as “mechanical parts or action at a distance, and nothing else”. Somehow, we still care about whether we’ve given an explanation, or just a description, even though we can’t define what an explanation is.

Tommaso DorigoForensic Evidence In Paul Frampton's Drug Smuggling Case

A few weeks ago, in an article where I discussed some new ideas for fundamental physics research, I briefly touched on an incident in which Paul Frampton, a well-known theoretical physicist, got involved in 2011. The paragraph in question read:

read more

September 19, 2019

Doug NatelsonDOE Experimental Condensed Matter PI meeting, 2019

The US Department of Energy's Basic Energy Sciences component of the Office of Science funds a lot of basic scientific research, and for the last decade or so has had a tradition of regular gatherings of its funded principal investigators for a number of programs.  Every two years there has been a PI meeting for the Experimental Condensed Matter Physics program, and this year's meeting starts tomorrow.

These meetings are very educational (at least for me) and, because of their modest size, a much better networking setting than large national conferences.  In past years I've tried to write up brief highlights of the meetings (for 2017, see a, b, c; for 2015 see a, b, c; for 2013 see a, b).   I will try to do this again; the format of the meeting has changed to include more poster sessions, which makes summarizing trickier, but we'll see.

update:  Here are my write-ups for day 1, day 2, and day 3.

Doug NatelsonDOE Experimental Condensed Matter PI Meeting, day 3 and wrapup

On the closing day of the PI meeting, some further points and wrap-up:

  • I had previously missed work that shows that electric field can modulate magnetic exchange in ultrathin iron (overview).
  • Ferroelectric layers can modulate transport in spin valves by altering the electronic energetic alignment at interfaces.  This can result in some unusual response (e.g., the sign of the magnetoresistance can flip with the sign of the current, implying spin-diode-like properties).
  • Artificial spin ices are still cool model systems.  With photoelectron emission microscopy (PEEM), it's possible to image ultrathin, single-domain structures to reveal their magnetization noninvasively.  This means movies can be made showing thermal fluctuations of the spin ice constituents, revealing the topological character of the magnetic excitations in these systems.  
  • Ultrathin oxide membranes mm in extent can be grown, detached from their growth substrates, and transferred or stacked.  When these membranes are really thin, it becomes difficult to nucleate cracks, allowing the membranes to withstand large strains (several percent!), opening up the study of strain effects on a variety of oxide systems.
  • Controlled growth of stacked phthalocyanines containing transition metals can generate nice model systems for studying 1d magnetism, even using conventional (large-area) methods like vibrating sample magnetometry.
  • In situ oxide MBE and ARPES, plus either vacuum annealing or ozone annealing, have allowed the investigation of the BSCCO superconducting phase diagram over the whole range of dopings, from severely underdoped to so overdoped that superconductivity is completely suppressed.  In the overdoped limit, analyzing the kink found in the band dispersion near the antinode, it seems superconductivity is suppressed at high doping because the coupling (to the mode that causes the kink) goes to zero at large doping.  
  • It's possible to grow nice films of C60 molecules on Bi2Se3 substrates, and use ARPES to see the complicated multiple valence bands at work in this system.  Moreover, by doing measurements as a function of the polarization of the incoming light, the particular molecular orbitals contributing to those bands can be identified.
  • Through careful control of conditions during vacuum filtration, it's possible to produce dense, locally crystalline films of aligned carbon nanotubes.  These have remarkable optical properties, and with the anisotropy of their electronic structure plus ultraconfined character, it's possible to get exciton polaritons in these into the ultrastrong coupling regime.
Overall this was a very strong meeting - the variety of topics in the program is impressive, and the work shown in the talks and posters was uniformly interesting and of high quality. 

September 17, 2019

Peter Rohde TEDxNewtown

Meet Peter Rohde, an Australian Research Council Future Fellow in the Centre for Quantum Software & Information at the University of Technology, Sydney. His theoretical proposals have inspired several world-leading experimental efforts in optical quantum information processing.

As a collaborator in China’s world-first quantum satellite program, he aided the design of quantum protocols for space-based demonstration. Rohde has worked at highly acclaimed institutes such as the University of Oxford and Institute for Molecular Biosciences, with over 60 publications and 1,500+ citations in quantum optics, quantum information theory, ecology, and politics.

Learn more about the world of quantum computing through the eyes of Peter Rohde. Grab your tickets.

September 16, 2019

Noncommutative GeometryJami Workshop: Riemann-Roch in characteristic one and related topics

The Johns Hopkins Mathematics Department jointly with the Japan-U.S. Mathematics Institute (JAMI) plan to organize a workshop on the weekend of October 18-20 2019. The interactions between the fields of noncommutative arithmetic geometry, tropical geometry, the theory of toposes and mathematical logic, optimization and game theory have matured quite rapidly in the last few years and have

Peter Rohde Don’t stop Fake News

Given the rate of information flow in the social media generation, and the ability for information to go internationally viral in a matter of minutes — which only requires thoughtless button-clicking within a few degrees of separation — it’s undeniable that the propagation of Fake News poses a major threat. Whether it be malicious electoral interference, or the perpetration of nonsensical views on medicine leading to the reemergence of deadly but entirely preventable diseases, the implications are catastrophic (indeed, they already have been) and pose a major threat to humanity.

For this reason it’s understandable that people want to put an end to it. Of course we don’t want measles (or Trump). But how do we achieve this? Many politicians around the world are pressuring social media giants to filter content to eliminate fake news, while others are advocating legislation to force them to.

I oppose such approaches outright, and believe they pave the way for even greater thought manipulation. (Interpret the terminology of fake news prevention as synonymous with that of terrorists, drugs, and pedophiles, as per my last article.)

Most news is fake (or misrepresented)

What constitutes fake news, anyway? Even when two ideologically distinct yet well-respected mainstream newspapers report on the same event, the tilt can be so astronomical, with each side criticising the other for bias and corruption, that the notion of fakeness is hardly an objective one. When it comes to statements made by politicians it’s even more perverse.

There is no such thing as an unbiased media source, nor will any story we read have full access to all information, or the full background context, or be 100% verifiably correct. Essentially what propagates over the internet is a close approximation to white-noise. Applying the appropriate filter, you can extract any signal you want.

Any kind of enforcement of filtering or information suppression implies certain types of information being removed at the behest of those with the ability to do so. Those people are necessarily in positions of power and influence, and will pursue their own interests over the collective one. The ability to impose filtering, enables post-selection bias by mandate. In conjunction with the false sense of security that a filtering system creates, the outcome is even greater vulnerability to the self-reinforcement and confirmation biases we seek to avoid.

The pretext for power

The implications of the ability of those in power to manipulate this to their advantage are obvious, and they are the basis upon which totalitarian societies are built. Already in Singapore there have been deep concerns surrounding this, where anti-fake-news legislation requires organisations to,

“carry corrections or remove content the government considers to be false, with penalties for perpetrators including prison terms of up to 10 years or fines up to S$1m.”

The term “the government considers to be false” is an illuminating one.

Once a mandate for filtering is established, its application cannot be confined to what is ‘fake’, nor can we trust those making that determination to wield this extraordinary power. With such a mandate in place, the parameters defining its implementation will evolve with the political agenda, likely via regulation rather than via legislation — isolating it entirely from any democratic oversight or debate. Regardless of who is at the helm, be sure that it will be used to undermine those who are not. History substantiates this — it is why we hold them to account, rather than blindly trust them to do what is right.

How to fight fake news

Instead of relying on those with vested interests to take on fake news, we must arm ourselves to do it in their absence. We must do so in a manner that is collective, transparent, decentralised, and robust against malign political interference (i.e all political interference).

Education

By far the most powerful avenue towards combating fake news is for people to be equipped with the skills to do so themselves. For this reason, the following should be taught to everyone, from the earliest possible age, and made essential components of our education system:

  • Critical thinking and rationalism.
  • Recognising logical fallacies.
  • Elementary statistics and probability theory (even if only qualitatively at an early level).
  • Online research skills, and the difference between what constitutes research versus Googling to find the answer you want to believe (i.e confirmation bias — “I was trying to find out whether the Moon landing was a conspiracy, and came across this amazing post on 8chan by this guy who runs an anti-vax blog (he’s pretty high up) that provided a really comprehensive and thoughtful analysis of this! BTW, did you know that the CIA invented Hitler in the 60’s as a distraction from the Vietnam war? I fact-checked it with more Googling, and it works out.”).
  • Encouraging kids to take up debating in school, where these become essential skills.

Already Finland has reportedly had great success in pursuing precisely this approach at the school level, with similar discussions emerging in the UK and within the OECD. Finland’s approach (NB: I don’t know the details of the curriculum) is foresighted and correct.

Algorithmically

Spotting fakeness at a glance is often challenging, and even the most mindful social media users will routinely fall for things, making software tools indispensable to a robust process. Certainly, modern analytical techniques could be employed for this purpose to reveal the reliability of information sources, usually with a high degree of accuracy. When it comes to social media giants applying fake news filters, this is inevitably the route that will be taken. It can’t possibly be done by hand.

If the purpose of such software tools is to make us aware of misleading information, then its manipulation provides an even more powerful avenue for misleading us than the underlying information itself, based on the false sense of security, and our own subsequent subconscious loosening of internal filtering standards.

To illustrate this, the existing social media giants, Facebook and Twitter, are already routinely accused of implementing their anti-hate-speech policies in a highly inconsistent and asymmetric manner. Everyone will have their own views on this, but from my own observations I agree with this assessment. Note that selectively preventing hate speech from one side, whilst not doing so for the other, is an implicit endorsement of the latter, tantamount to direct political support. This type of political support — the ability to freely communicate, and the simultaneous denial of it to one’s opponents — is the single greatest political asset one can have. The ability to platform and de-platform entire organisations or ideologies is the single most politically powerful position one can hold — it’s no coincidence that the first step taken under the formation of any totalitarian state is centralised control of the media.

This implies that any tools we rely on for this purpose must be extremely open, transparent, understandable, and robust against intentional manipulation. In the same way that you would not employ a proprietary cryptographic algorithm for encrypting sensitive data, with no knowledge of its internal functioning, the same standard of trust must be applied when interpreting the reliability of information, let alone outright filtering.

Simultaneously, these tools must be allowed to evolve and compete. If they are written behind closed doors by governments or by corporations, none of these criteria will be met. The tools cannot fall under any kind of political control, and must be decentralised and independent.

Tools based on community evaluation and consensus should be treated with caution, given their vulnerability to self-reinforcement via positive feedback loops of their own — a new echo-chamber. Indeed, this vulnerability is precisely the one that fake news exploits to go viral in the first place.

Will machine learning save us?

Identifying unreliable information sources is something that modern machine learning techniques are extremely well-suited to, and if implemented properly, would likely be our most useful tool in fact-checking and fake news identification. However, these techniques are inherently at odds with my advocacy for algorithmic transparency.

In machine learning, by definition, we don’t hard-code software to spot certain features. Rather we train it using sample data, allowing it to uncover underlying relationships and correlations for itself. A well-trained system is then in principle able to operate upon new data it hadn’t previously been exposed to, and identify similar features and patterns. The problem is that what the system has learned to see is not represented in human-readable form, nor even comprehensible to us, given its mathematical sophistication. If the original training data were to be manipulated, the system could easily be coaxed into intentionally exhibiting the biases of its trainers, which would be extremely difficult to identify by outsiders.
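As a deliberately tiny, hedged sketch of the train-on-examples idea (the headlines, labels, and model choice below are invented for illustration; a real system would need vastly more, carefully curated data, which is exactly where manipulated training data could smuggle in the trainers’ biases):

    # Minimal sketch: learn word-usage patterns correlated with human-assigned
    # "reliable"/"unreliable" labels, instead of hard-coding rules.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    headlines = [
        "Study in peer-reviewed journal finds modest effect of new drug",
        "Central bank announces quarter-point interest rate change",
        "Doctors HATE this one weird trick that cures everything",
        "Insiders confirm secret cabal staged the moon landing",
    ]
    labels = ["reliable", "reliable", "unreliable", "unreliable"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(headlines, labels)

    # The learned weights are not human-readable rules; they are just correlations.
    print(model.predict(["Shocking miracle cure they don't want you to know about"]))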

I don’t advocate against the use of machine learning techniques at all. However I very much advocate for recognising their incompatibility with the desire for complete transparency and openness, and the recognition that this establishes a direct avenue for manipulation.

Design for complacency

The biggest obstacle of all to telling fact from fiction is our own complacency, and our limited desire to even try. Given that in just minutes a Facebook or Twitter user can scroll through hundreds of posts, if establishing the reliability of a source requires opening multiple new browser windows to cross-check and research each one individually, it will undermine the user experience — the average user (especially those most vulnerable to influence by fake news) will not bother.

The tools we develop for verifying reliability must accommodate this as the most important design consideration, providing a fully integrated and user-friendly mechanism which does not detract from the inherently addictive, slot-machine-like appeal of the social media experience. If the tools detract from the user experience, they will be rejected and become ineffective at a mass scale.

Modern-day book burning

What interest does the State have in preventing Fake News? None; this is how they subsist. What they actually want is to selectively eliminate information that works against their interests.

In the presence of overwhelming white-noise, selective elimination is just as powerful as the creation of new misinformation.

Providing them with a mandate to restrict the information we are able to see (in the ‘public interest’ no less) is to grant them the right to conduct the 21st-century equivalent of 1940s book-burning ceremonies. Needless to say, having established a mandate to hold the ceremonies, they will decide for themselves which books get burnt.

Rather than burn our books on our behalf, let us decide which ones we would like to read, but let us also develop trustworthy, reliable, and accessible tools for making that determination for ourselves. Admittedly, much of society is highly fallible and unreliable when it comes to making such self-determination. To those in positions of power this applies even more so, given that they necessarily have interests to pursue, and seek a centralised approach for that reason.

There is an important relationship between free people and those in power that must be maintained, whereby our freedoms will only be upheld if accountability is enforced. The latter is our responsibility, not theirs. To delegate the accountability process — of which the free flow of information is the single most pivotal element — to those being held to account, is to capitulate entirely, and voluntarily acquiesce to subservience via population control.

September 13, 2019

Matt von HippelCongratulations to Simon Caron-Huot and Pedro Vieira for the New Horizons Prize!

The 2020 Breakthrough Prizes were announced last week, awards in physics, mathematics, and life sciences. The physics prize was awarded to the Event Horizon Telescope, with the $3 million award to be split among the 347 members of the collaboration. The Breakthrough Prize Foundation also announced this year’s New Horizons prizes, six smaller awards of $100,000 each to younger researchers in physics and math. One of those awards went to two people I know, Simon Caron-Huot and Pedro Vieira. Extremely specialized as I am, I hope no-one minds if I ignore all the other awards and talk about them.

The award for Caron-Huot and Vieira is “For profound contributions to the understanding of quantum field theory.” Indeed, both Simon and Pedro have built their reputations as explorers of quantum field theories, the kind of theories we use in particle physics. Both have found surprising behavior in these theories, where a theory people thought they understood did something quite unexpected. Both also developed new calculation methods, using these theories to compute things that were thought to be out of reach. But this is all rather vague, so let me be a bit more specific about each of them:

Simon Caron-Huot is known for his penetrating and mysterious insight. He has the ability to take a problem and think about it in a totally original way, coming up with a solution that no-one else could have thought of. When I first worked with him, he took a calculation that the rest of us would have taken a month to do and did it by himself in a week. His insight seems to come in part from familiarity with the physics literature, forgotten papers from the 60’s and 70’s that turn out surprisingly useful today. Largely, though, his insight is his own, an inimitable style that few can anticipate. His interests are broad, from exotic toy models to well-tested theories that describe the real world, covering a wide range of methods and approaches. Physicists tend to describe each other in terms of standard “virtues”: depth and breadth, knowledge and originality. Simon somehow seems to embody all of them.

Pedro Vieira is mostly known for his work with integrable theories. These are theories where if one knows the right trick one can “solve” the theory exactly, rather than using the approximations that physicists often rely on. Pedro was a mentor to me when I was a postdoc at the Perimeter Institute, and one thing he taught me was to always expect more. When calculating with computer code I would wait hours for a result, while Pedro would ask “why should it take hours?”, and if we couldn’t propose a reason would insist we find a quicker way. This attitude paid off in his research, where he has used integrable theories to calculate things others would have thought out of reach. His Pentagon Operator Product Expansion, or “POPE”, uses these tricks to calculate probabilities that particles collide, and more recently he pushed further to other calculations with a hexagon-based approach (which one might call the “HOPE”). Now he’s working on “bootstrapping” up complicated theories from simple physical principles, once again asking “why should this be hard?”

Tommaso DorigoQuark Nuggets Of Dark Matter As The Origin Of Dama-Libra Signal ?

Sometimes browsing the Cornell ArXiv results in very interesting reading. That is the case with the preprint I got to read today, "DAMA/LIBRA annual modulation and Axion Quark Nugget Dark Matter Model", by Ariel Zhitnitsky. This article puts forth a bold speculative claim, which I found exciting for a variety of reasons. As is the case with bold speculative claims, the odds that they turn out to describe reality are maybe small, but their entertainment value is large. So what is this about?

read more

September 11, 2019

September 09, 2019

Tommaso DorigoA Concert In Crete

On August 20, on the occasion of the "5th International Workshop on Nucleon Structure at Large Bjorken x", organized at the Orthodox Academy of Crete, I had the pleasure of accompanying my wife, the soprano Kalliopi Petrou, at the piano for a concert offered by the organizers to the workshop participants.

read more

September 04, 2019

Terence Tao254A announcement: Analytic prime number theory

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

  1. Elementary multiplicative number theory
  2. Complex-analytic multiplicative number theory
  3. The entropy decrement argument
  4. Bounds for exponential sums
  5. Zero density theorems
  6. Halasz’s theorem and the Matomaki-Radziwill theorem
  7. The circle method
  8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem

Lecture notes for topics 3, 6, and 8 will be forthcoming.

 

Clifford JohnsonTwo Days at San Diego Comic-Con 2019

Avengers cosplayers in the audience of my Friday panel.

It might surprise you to know just how much science gets into the mix at Comic-Con. This never makes it to the news of course - instead it’s all stories about people dressing up in costumes, and of course features about big movie and TV announcements. Somewhere inside this legendary pop culture maelstrom there’s something for nearly everyone, and that includes science. Which is as it should be. Here’s a look at two days I spent there. [I took some photos! (All except two here.) You can click on any photo to enlarge it.]

Day 1 – Friday

I finalized my schedule rather late, and so wasn’t sure of my hotel needs until it was far too late to find two nights in a decent hotel within walking distance of the San Diego Convention Center — well, not for prices that would fit with a typical scientist’s budget. So, I’m staying in a motel that’s about 20 minutes away from the venue if I jump into a Lyft.

My first meeting is over brunch at the Broken Yolk at 10:30am, with my fellow panellists for the panel at noon, “Entertaining Science: The Real, Fake, and Sometimes Ridiculous Ways Science Is Used in Film and TV”. They are Donna J. Nelson, chemist and science advisor for the TV show Breaking Bad (she has a book about it), Rebecca Thompson, Physicist and author of a new book about the science of Game of Thrones, and our moderator Rick Loverd, the director of the Science and Entertainment Exchange, an organization set up by the National Academy of Sciences. I’m on the panel also as an author (I wrote and drew a non-fiction graphic novel about science called The Dialogues). My book isn’t connected to a TV show, but I’ve worked on many TV shows and movies as a science advisor, and so this rounds out the panel. All our books are from [...] Click to continue reading this post

The post Two Days at San Diego Comic-Con 2019 appeared first on Asymptotia.

September 01, 2019

Peter Rohde Encryption & anonymity is a responsibility not a right – In defence of cryptoanarchy

Most of the world’s internet users feel little need to rely on encryption, beyond when it is completely obvious and implemented by default (e.g when performing online banking etc.). But when it comes to personal communications, where traffic interception by the State is highly likely in some jurisdictions, an outright certainty in others, the average user takes the “I have nothing to hide” attitude.

Assume by default, whether you live in a ‘democracy’ or under overt fascism, that the State intercepts everything. Constitutional rights are a facade. They are not enforced by the State; they exist to hold the State to account. The onus for enforcing them lies upon us.

In today’s world, where advanced machine learning technology is freely available to all, and implementable by those with even the most elementary technical expertise, this attitude is naive at best and wilfully negligent at worst. It rests on the outdated notion that all the State has to gain from unencrypted communications, and the identities of those involved, is some obscure and seemingly unimportant piece of information (who cares what I had for breakfast, you say?). That is a completely forgivable, but fundamental, misunderstanding of not just what information can be directly extracted, but what can be inferred from it.

What can be inferred about you, indirectly allows things to be inferred about others, and others...

We live in an era of incredibly advanced analytical techniques, backed by astronomical computing resources (in which the State has a monopoly), where, via nth degrees of separation and the extraction of cross-correlations between who people are and what they say, anything and everything we say directly compromises the integrity of the rest of society.

The information that can be exploited includes, but is not in the slightest limited to:

  • Message content itself.
  • When it was sent and received.
  • Who the sender and receiver were.
  • Their respective geographic locations, including realtime GPS coordinates.
  • More generally, everything included under the generic term metadata.
  • And far more things than I dare imagine...

All of these things can subsequently be correlated with all other information held about you, and those you engage with, and similarly for them, extended onwards to arbitrary degree.

The simplest example of how this type of technology works is one we are all familiar with — online recommendation systems (including purchasing suggestions, and social media friend recommendations). By correlating the things you have previously expressed interest in (e.g via online purchases, or existing friendship connections, as well as theirs), advertisers can, with sometimes astonishing degrees of accuracy, anticipate seemingly unrelated products to put forth as suggestions of potential interest.
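A toy sketch of that interest-correlation idea, with invented purchase histories (plain Python; real recommenders are far more elaborate, but the principle of exploiting co-occurrence is the same):

    # Recommend items that co-occur most often with things the user already likes.
    from collections import Counter

    purchases = {
        "user1": {"running shoes", "fitness tracker", "protein powder"},
        "user2": {"running shoes", "fitness tracker", "water bottle"},
        "user3": {"fitness tracker", "protein powder", "yoga mat"},
    }

    def recommend(liked_items, histories):
        counts = Counter()
        for items in histories.values():
            if items & liked_items:                 # this user shares an interest
                counts.update(items - liked_items)  # count their other interests
        return [item for item, _ in counts.most_common()]

    print(recommend({"running shoes"}, purchases))  # "fitness tracker" comes out on top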

Alternatively, I’m sure I’m not the only one who can attest to having received Facebook friend recommendations based on someone I had a conversation with at a bar, but with whom no digital information of any form was exchanged. But in fact we did exchange it — via the realtime correlation of our GPS coordinates, it can be inferred that we spent some time engaging with one another that couldn’t have been by coincidence.

But the depth of analysis that can be performed, can extend far beyond this, to infer almost unimaginably specific information about us.

Behind the scenes, the analysis is incredibly sophisticated, and invisible to us, performing all manner of cross-correlations across multiple degrees of separation, or indeed across society as a whole, based on seemingly obscure information we’d never have given much thought to. This includes not just correlations with other things you have taken interest in (people you have communicated with, or products you have purchased), but also knowledge about which groups (defined arbitrarily — social, demographic, ideological, interest-based, whatever) you belong to, whether overtly specified or not, and the collective behaviour of the respective group. Mathematically, this can be revealed via membership of densely connected sub-graphs (such as graph cliques) within a social network graph.

Entire fields of mathematics and computer science, notably graph theory and topological data analysis, dedicate themselves to this pursuit. Machine learning techniques are perhaps the most versatile and useful of them all.

These identified sub-graphs can be associated as ‘groups’ (in any real-world or abstract sense), identified in some manner as having something in common, from which potentially entirely distinct characteristics of their constituents can be inferred via correlations across groups. Membership to multiple groups can quickly narrow down the specifics of individuals sitting at the intersection of groups.

This type of analysis is very much the crux of what modern machine learning does by design — advanced higher-order analysis of multivariate correlations, particularly well suited to social network graph analysis, where nodes represent individuals, and edges are weighted by advanced data-structures characterising all aspects of their relationships, far beyond just ‘who knows who’.
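To make the graph language concrete, here is a minimal sketch using the open-source networkx library (the toy network and its ‘groups’ are entirely invented; the same calls scale to much larger graphs):

    # Find "groups" in a social graph from its structure alone.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.Graph()
    G.add_edges_from([
        ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),   # one tight cluster
        ("dave", "erin"), ("dave", "frank"), ("erin", "frank"),   # another cluster
        ("carol", "dave"),                                        # a weak bridge
    ])

    # Maximal cliques: sets of people who all interact with each other directly.
    print(list(nx.find_cliques(G)))

    # Modularity-based communities: looser "groups" inferred purely from structure.
    for group in greedy_modularity_communities(G):
        print(sorted(group))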

For clarity, many familiar with the term social network graph, interpret this in the context of the graph data-structures held by social media networks like Facebook. I am not. Nation states themselves hold advanced social network graph data-structures of their own, into which they have the capacity to feed all manner of information they obtain about you. There is very strong evidence that even in the so-called ‘Free World’, these major commercial players actively collaborate with the State, from which the State is able to construct meta-graphs comprising unforeseen amounts of personal information — even if it is ‘just’ metadata.

Much of the truly insightful information to be obtained from advanced graph analysis is not based upon local neighbourhood analysis (i.e you and the dude you just messaged), but by global analysis (i.e based collectively upon the information contained across the entire social graph, and its multitude of highly nontrivial interrelationships, most of which are not at all obvious upon inspection, that no human would ever conceive of taking into consideration, but advanced algorithms systemically will, without discrimination).

Machine learning is oblivious to laws or expectations against demographic profiling — racial, gender, political, medical history, or otherwise. Even if that data isn’t fed directly into the system (e.g via legal barriers), it’ll surely figure it out via correlation, with almost certainty.

For example, with access to someone’s Facebook connections, and with current freely-available software libraries, an unsophisticated programmer could infer much about their demographics, political orientation, gender, occupation, and much more, with a high degree of accuracy in most cases, even if no information of the sort were provided on the profile directly. This elementary level of analysis can be implemented in 5 minutes with a few lines of code using modern tools and libraries. Needless to say, national signals intelligence (SIGINT) agencies have somewhat greater resources at their disposal than to hire a summer student for half an hour.
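In that spirit, here is roughly what such a ‘few lines of code’ could look like (a hedged sketch using networkx’s built-in Zachary karate-club graph as a stand-in for a friendship network; the ‘club’ attribute plays the role of a demographic label we pretend not to know):

    # Infer a hidden attribute of one person from their friends' attributes alone.
    import networkx as nx
    from collections import Counter

    G = nx.karate_club_graph()   # 34 people, each labelled "Mr. Hi" or "Officer"
    target = 0                   # the person whose label we pretend not to know

    # Majority vote over the labels of the target's friends.
    friend_labels = [G.nodes[n]["club"] for n in G.neighbors(target)]
    guess = Counter(friend_labels).most_common(1)[0][0]

    print("guessed:", guess, "| actual:", G.nodes[target]["club"])

Real systems generalise this with far richer features and models, but the underlying idea is the same: your friends’ data leaks yours.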

The Russian influence campaign prior to the Trump election is alleged to have actively exploited this exact kind of analysis, with an especial focus on demographic analysis (including geographic, gender, voting, ideological, group membership, and racial profiling), to create a highly individually targeted advertising campaign, whereby the political ads targeted against you may be of an entirely different nature to the ones received by the guy next door, individually calculated to maximise psychological response accordingly.

This technology demonstrably provides direct avenues for population control, in particular via psychological operations (PSYOPs). If it can be exploited against us by adversarial states, it can be exploited against us by our own. I work off the assumption that all states, including our own, are adversarial.

The political chaos this has created in the United States (regardless of whether or not the Russian influence campaign actually changed the election outcome) is testament to the legitimacy of the power and efficacy of these computational techniques. There would be little uproar over the incident, were it not regarded as being entirely plausible.

Similarly, it’s no secret at all that political parties in democratic countries rely on these kinds of analytic techniques for the purposes of vote optimisation, policy targeting, and gerrymandering. Indeed, commercial companies license specialised software optimisation tools to political operatives for exactly this purpose — it’s an entire highly successful business model. If politicians utilise this before entering office, you can be sure they’ll continue to do so upon entering office — with astronomically enhanced resources at their disposal, backed by the full power of the State, and the information it has access to.

It goes without saying that the tools for performing these kinds of analyses are available to everybody, anywhere in the world, as freely downloadable software packages, and that the entire business model of corporations like Google and Facebook is built upon them (and to be clear, that is the business model, into which astronomical resources are invested). What major nation state intelligence apparatuses have at their disposal (especially the joint power of the Five Eyes network, to which my home country of Australia belongs) extends these capabilities into unimaginable territory, given their resources, both computational and in terms of access to information.

The National Security Agency (NSA) of the United States, their primary signals intelligence agency, is allegedly the world’s largest employer of mathematicians, and also possesses incredible computing infrastructure. Historically, cryptography (a highly specialised mathematical discipline) has been the primary focus of this. In the era of machine learning, and what can be gained from it from an intelligence perspective, you can place bets this now forms a major component of their interest in advanced mathematics.

But these capabilities are not only applicable to catching terrorists (how many terrorist attacks have actually taken place on American soil to justify investments of this magnitude?). There is good reason that China is now a leading investor in AI technology, given their highly integrated social credit scheme, which has very little to do with terrorism, and far more to do with population control.

We cannot become a political replica of the People’s Republic of China. But we probably already are. In the same way that the Chinese people are in denial, so are we.

“But this couldn’t happen here! We are a ‘democracy’ after all?”

It’s now publicly well known that the NSA has a history of illegally obtaining and storing information on American citizens within their systems, in direct violation of the United States Constitution. Owing to national security secrecy laws, it’s impossible to know in what capacity this information has been utilised, but the potential for misuse has utterly devastating implications for American citizens and the constitutional rights they believe in.

When applied from the nation state’s point of view (democracy or dictatorship, regardless), where the overriding objective is population control and the centralisation of power, the primary tool at their disposal is, and always has been, to manipulate and subjugate the people. With this kind of advanced analytic power at their disposal, their ability to do so is in fantastically post-Hitlerian territory. If Stalin were alive today, he would not subscribe to pornography magazines, he would subscribe to the Journal of Machine Learning, and spend evenings when his wife was absent, wearing nothing but underwear in front of a computer screen, salivating over the implementation of China’s social credit scheme, and its highly-integrated nationwide information feed, built upon a massive-scale computational backend, employing the techniques described above with almost certainty. He could have expanded the gulags tenfold.

But any kind of computational analysis necessarily requires input data upon which to perform the analysis. There are few computations one can perform upon nothing.

We have a responsibility to provide the State with nothing!
Let them eat cake.
Better yet, let them starve.
And may their graphs become disjoint.

When the State obtains information about your interactions, it adds information to their social graph. What Google and Facebook have on you, which many express grave concern about, is nothing by comparison. The enhancement of this data structure does not just compromise you, but all of society. It compromises our freedom itself.

Having prosperity and a quality of life is not freedom — Hitler achieved overwhelming support because the people thought so. Freedom means that at no point in time, under any circumstance, can they take it all away from us, or make the threat to do so. This is not the case, it never was, it likely never will be, yet it must be so.

In the interests of the formation of a Free State, and inhibiting the ever-increasing expansion of the police state, the extrapolation of which is complete subservience, we have a collective responsibility to:

  • Understand and inform ourselves about digital security, including encryption and anonymity.
  • Ensure full utilisation of these technologies by default.
  • Be aware of the extent of what modern machine learning techniques can reveal about ourselves, others, and all of society.
  • Be aware of how such collective information can be used against us by the State, assuming the worst case scenario by default.
  • Employ reliable and strong end-to-end encryption technologies, where possible, no matter how obscure our communications.
  • Conceal our identities during communications where possible.
  • Provide the State with nothing beyond what is reasonably necessary.
  • Oppose unnecessary forms of government data acquisition.
  • Oppose the integration of government databases and information systems.
  • Enforce ethical separations in data-sharing between government departments.
  • Legislate criminal offences for government entities misusing or falsely obtaining personal data.
  • Offer full support — financial, via contribution to development, or otherwise — to worthy open-source initiatives seeking to facilitate these objectives.
  • Do not trust ‘closed’ (e.g commercial or state-backed) security, relying instead on reputable open-source options. Commercial enterprises necessarily comply with the regulations and laws of the State (regardless which). To the contrary, open-source development is inherently transparent, offering full disclosure and openness.
  • Work off the assumption that any attempts by the State to seek backdoors, prohibit, or in any way compromise the above, to be a direct attempt to subvert freedom and work towards the formation of a totalitarian state.
  • Metadata contributes to the State’s social graph. Even in the absence of content, it establishes identities, relationships, timestamps, and geographic locations. These contribute enormously to correlation analysis with other information in the graph.
  • Be absolutely clear that any political statements to the effect of “we’re only able to access metadata” are made with full knowledge and absolute understanding of the implications of the above, and are deeply cynical ploys. They would otherwise not seek access to it.
  • Any political statements seeking any kind of justification based on highly emotive words such as terrorism, safety, protection, national security, stopping drugs, or catching pedophiles, should be treated with contempt by default, and assumed to be calculated attempts, via emotional manipulation, to subvert a free society and centralise power.
  • Modern history should be made a mandatory subject throughout primary and secondary education, in which case nothing stated here is even mildly controversial or requires any further substantiation. For this reason, no references are provided in this post.

All technologies can be used for good or for evil. There has not been a technology in history to which this hasn’t applied. The establishment of the internet itself faced enormous political opposition for highly overlapping reasons. Needless to say, the internet has been one of the most positively influential technological advances in human history, and one of the most individually liberating and empowering tools ever invented — an overwhelming force for freedom of information, expression, and unprecedented prosperity and opportunity across the globe.

I genuinely believe that the biggest threat humanity faces — far beyond terrorism, drugs, or pedophiles — is the power of the State.

History overwhelmingly substantiates this belief, and despite acknowledging the downsides, I make no apologies for offering my unwavering support to the crypto movement and what it seeks to achieve. The power asymmetry in the world has never been tilted in favour of terrorists — it always was, and always will be, in favour of the State, historically the greatest terrorists of them all.

One of the deepest and most eternally meaningful statements ever made in the history of political philosophy, came from one of its most nefarious actors,

“Voice or no voice, the people can always be brought to the bidding of the leaders. That is easy. All you have to do is tell them they are being attacked, and denounce the pacifists for lack of patriotism and exposing the country to danger. It works the same in any country.”

— Hermann Göring

August 28, 2019

Matt StrasslerThe New York Times Remembers A Great Physicist

The untimely and sudden deaths of Steve Gubser and Ann Nelson, two of the United States’ greatest talents in the theoretical physics of particles, fields and strings, has cast a pall over my summer and that of many of my colleagues.

I have not been finding it easy to write a proper memorial post for Ann, who was by turns my teacher, mentor, co-author, and faculty colleague.  I would hope to convey to those who never met her what an extraordinary scientist and person she was, but my spotty memory banks aren’t helping. Eventually I’ll get it done, I’m sure.

(Meanwhile I am afraid I cannot write something similar for Steve, as I really didn’t know him all that well. I hope someone who knew him better will write about his astonishing capabilities and his unique personality, and I’d be more than happy to link to it from here.)

In this context, I’m gratified to see that the New York Times has given Ann a substantive obituary, https://www.nytimes.com/2019/08/26/science/ann-nelson-dies.html, and appearing in the August 28th print edition, I’m told. It contains a striking (but, to those of us who knew her, not surprising) quotation from Howard Georgi.  Georgi is a professor at Harvard who is justifiably famous as the co-inventor, with Nobel-winner Sheldon Glashow, of Grand Unified Theories (in which the electromagnetic, weak nuclear, and strong nuclear force all emerge from a single force.) He describes Ann, his former student, as being able to best him at his own game.

  • “I have had many fabulous students who are better than I am at many things. Ann was the only student I ever had who was better than I am at what I do best, and I learned more from her than she learned from me.”

He’s being a little modest, perhaps. But not much. There’s no question that Ann was an all-star.

And for that reason, I do have to complain about one thing in the Times obituary. It says “Dr. Nelson stood out in the world of physics not only because she was a woman, but also because of her brilliance.”

Really, NYTimes, really?!?

Any scientist who knew Ann would have said this instead: that Professor Nelson stood out in the world of physics for exceptional brilliance — lightning-fast, sharp, creative and careful, in the same league as humanity’s finest thinkers — and for remarkable character — kind, thoughtful, even-keeled, rigorous, funny, quirky, dogged, supportive, generous. Like most of us, Professor Nelson had a gender, too, which was female. There are dozens of female theoretical physicists in the United States; they are a too-small minority, but they aren’t rare. By contrast, a physicist and person like Ann Nelson, of any gender? They are extremely few in number across the entire planet, and they certainly do stand out.

But with that off my chest, I have no other complaints. (Well, admittedly the physics in the obit is rather garbled, but we can get that straight another time.) Mainly I am grateful that the Times gave Ann fitting public recognition, something that she did not actively seek in life. Her death is an enormous loss for theoretical physics, for many theoretical physicists, and of course for many other people. I join all my colleagues in extending my condolences to her husband, our friend and colleague David B. Kaplan, and to the rest of her family.

August 25, 2019

John PreskillQuantum conflict resolution

If only my coauthors and I had quarreled.

I was working with Tony Bartolotta, a PhD student in theoretical physics at Caltech, and Jason Pollack, a postdoc in cosmology at the University of British Columbia. They acted as the souls of consideration. We missed out on dozens of opportunities to bicker—about the paper’s focus, who undertook which tasks, which journal to submit to, and more. Bickering would have spiced up the story behind our paper, because the paper concerns disagreement.

Quantum observables can disagree. Observables are measurable properties, such as position and momentum. Suppose that you’ve measured a quantum particle’s position and obtained an outcome x. If you measure the position immediately afterward, you’ll obtain x again. Suppose that, instead of measuring the position again, you measure the momentum. All the possible outcomes have equal probabilities of obtaining. You can’t predict the outcome.

The particle’s position can have a well-defined value, or the momentum can have a well-defined value, but the observables can’t have well-defined values simultaneously. Furthermore, if you measure the position, you randomize the outcome of a momentum measurement. Position and momentum disagree.


How should we quantify the disagreement of two quantum observables, \hat{A} and \hat{B}? The question splits physicists into two camps. Pure quantum information (QI) theorists use uncertainty relations, whereas condensed-matter and high-energy physicists prefer out-of-time-ordered correlators. Let’s meet the camps in turn.

Heisenberg intuited an uncertainty relation that Robertson formalized during the 1920s,

\Delta \hat{A} \, \Delta \hat{B} \geq \frac{1}{2} \left| \langle [\hat{A}, \hat{B}] \rangle \right|.

Imagine preparing a quantum state | \psi \rangle and measuring \hat{A}, then repeating this protocol in many trials. Each trial has some probability p_a of yielding the outcome a. Different trials will yield different a’s. We quantify the spread in a values with the standard deviation \Delta \hat{A} = \sqrt{ \langle \psi | \hat{A}^2 | \psi \rangle - \langle \psi | \hat{A} | \psi \rangle^2 }. We define \Delta \hat{B} analogously. For position and momentum, the right-hand side works out to \hbar / 2, giving the familiar \Delta \hat{x} \, \Delta \hat{p} \geq \hbar / 2. \hbar denotes Planck’s constant, a number that characterizes our universe as the electron’s mass does. 

[\hat{A}, \hat{B}] denotes the observables’ commutator. The numbers that we use in daily life commute: 7 \times 5 = 5 \times 7. Quantum numbers, or operators, represent \hat{A} and \hat{B}. Operators don’t necessarily commute. The commutator [\hat{A}, \hat{B}] = \hat{A} \hat{B} - \hat{B} \hat{A} represents how little \hat{A} and \hat{B} resemble 7 and 5. 

Robertson’s uncertainty relation means, “If you can predict an \hat{A} measurement’s outcome precisely, you can’t predict a \hat{B} measurement’s outcome precisely, and vice versa. The uncertainties must multiply to at least some number. The number depends on how much \hat{A} fails to commute with \hat{B}.” The higher an uncertainty bound (the greater the inequality’s right-hand side), the more the operators disagree.


Heisenberg and Robertson explored operator disagreement during the 1920s. They wouldn’t have seen eye to eye with today’s QI theorists. For instance, QI theorists consider how we can apply quantum phenomena, such as operator disagreement, to information processing. Information processing includes cryptography. Quantum cryptography benefits from operator disagreement: An eavesdropper must observe, or measure, a message. The eavesdropper’s measurement of one observable can “disturb” a disagreeing observable. The message’s sender and intended recipient can detect the disturbance and so detect the eavesdropper.

How efficiently can one perform an information-processing task? The answer usually depends on an entropy H, a property of quantum states and of probability distributions. Uncertainty relations cry out for recasting in terms of entropies. So QI theorists have devised entropic uncertainty relations, such as

H (\hat{A}) + H( \hat{B} ) \geq - \log c. \qquad (^*)

The entropy H( \hat{A} ) quantifies the difficulty of predicting the outcome a of an \hat{A} measurement. H( \hat{B} ) is defined analogously. c is called the overlap. It quantifies your ability to predict what happens if you prepare your system with a well-defined \hat{A} value, then measure \hat{B}. For further analysis, check out this paper. Entropic uncertainty relations have blossomed within QI theory over the past few years. 
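As a quick numerical sanity check of Ineq. (^*) (my own sketch, not code from the paper): for a single qubit, let \hat{A} be a measurement in the Z basis and \hat{B} a measurement in the X basis, so the overlap is c = 1/2 and the bound is one bit.

    # Verify H(A) + H(B) >= -log2(c) for random qubit states, with A = Z basis,
    # B = X basis, so c = 1/2 and the bound equals 1 bit.
    import numpy as np

    def shannon(p):
        p = p[p > 1e-12]                 # drop zero-probability outcomes
        return -np.sum(p * np.log2(p))   # entropy in bits

    z_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    x_basis = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]

    # Overlap c = max |<a|b>|^2 over basis vectors, so the bound is -log2(c) = 1.
    c = max(abs(np.vdot(a, b)) ** 2 for a in z_basis for b in x_basis)

    rng = np.random.default_rng(0)
    for _ in range(5):
        psi = rng.normal(size=2) + 1j * rng.normal(size=2)   # random pure state
        psi /= np.linalg.norm(psi)
        H_A = shannon(np.array([abs(np.vdot(a, psi)) ** 2 for a in z_basis]))
        H_B = shannon(np.array([abs(np.vdot(b, psi)) ** 2 for b in x_basis]))
        print(f"H(A) + H(B) = {H_A + H_B:.3f}  >=  {-np.log2(c):.3f}")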


Pure QI theorists, we’ve seen, quantify operator disagreement with entropic uncertainty relations. Physicists at the intersection of condensed matter and high-energy physics prefer out-of-time-ordered correlators (OTOCs). I’ve blogged about OTOCs so many times, Quantum Frontiers regulars will be able to guess the next two paragraphs. 

Consider a quantum many-body system, such as a chain of qubits. Imagine poking one end of the system, such as by flipping the first qubit upside-down. Let the operator \hat{W} represent the poke. Suppose that the system evolves chaotically for a time t afterward, the qubits interacting. Information about the poke spreads through many-body entanglement, or scrambles.


Imagine measuring an observable \hat{V} of a few qubits far from the \hat{W} qubits. A little information about \hat{W} migrates into the \hat{V} qubits. But measuring \hat{V} reveals almost nothing about \hat{W}, because most of the information about \hat{W} has spread across the system. \hat{V} disagrees with \hat{W}, in a sense. Actually, \hat{V} disagrees with \hat{W}(t). The (t) represents the time evolution.
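For concreteness, a commonly used form of the OTOC (conventions for the ordering and normalization vary from paper to paper, so take this as representative rather than as the paper’s exact definition) is

F(t) = \langle \hat{W}^\dagger(t) \, \hat{V}^\dagger \, \hat{W}(t) \, \hat{V} \rangle, \qquad \hat{W}(t) = e^{i \hat{H} t / \hbar} \, \hat{W} \, e^{-i \hat{H} t / \hbar},

where \hat{H} is the system’s Hamiltonian.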

The OTOC’s smallness reflects how much \hat{W}(t) disagrees with \hat{V} at any instant t. At early times t \gtrsim 0, the operators agree, and the OTOC \approx 1. At late times, the operators disagree loads, and the OTOC \approx 0.


Different camps of physicists, we’ve seen, quantify operator disagreement with different measures: Today’s pure QI theorists use entropic uncertainty relations. Condensed-matter and high-energy physicists use OTOCs. Trust physicists to disagree about what “quantum operator disagreement” means.

I want peace on Earth. I conjectured, in 2016 or so, that one could reconcile the two notions of quantum operator disagreement. One must be able to prove an entropic uncertainty relation for scrambling, wouldn’t you think?

You might try substituting \hat{W}(t) for the \hat{A} in Ineq. {(^*)}, and \hat{V} for the \hat{B}. You’d expect the uncertainty bound to tighten—the inequality’s right-hand side to grow—when the system scrambles. Scrambling—the condensed-matter and high-energy-physics notion of disagreement—would coincide with a high uncertainty bound—the pure-QI-theory notion of disagreement. The two notions of operator disagreement would agree. But the bound I’ve described doesn’t reflect scrambling. Nor do similar bounds that I tried constructing. I banged my head against the problem for about a year.

The sky brightened when Jason and Tony developed an interest in the conjecture. Their energy and conversation enabled us to prove an entropic uncertainty relation for scrambling, published this month.1 We tested the relation in computer simulations of a qubit chain. Our bound tightens when the system scrambles, as expected: The uncertainty relation reflects the same operator disagreement as the OTOC. We reconciled two notions of quantum operator disagreement.

As Quantum Frontiers regulars will anticipate, our uncertainty relation involves weak measurements and quasiprobability distributions: I’ve been studying their roles in scrambling over the past three years, with colleagues for whose collaborations I have the utmost gratitude. I’m grateful to have collaborated with Tony and Jason. Harmony helps when you’re tackling (quantum operator) disagreement—even if squabbling would spice up your paper’s backstory.

 

1Thanks to Communications Physics for publishing the paper. For pedagogical formatting, read the arXiv version. 

August 23, 2019

Terence TaoQuantitative bounds for critically bounded solutions to the Navier-Stokes equations

I’ve just uploaded to the arXiv my paper “Quantitative bounds for critically bounded solutions to the Navier-Stokes equations“, submitted to the proceedings of the Linde Hall Inaugural Math Symposium. (I unfortunately had to cancel my physical attendance at this symposium for personal reasons, but was still able to contribute to the proceedings.) In recent years I have been interested in working towards establishing the existence of classical solutions for the Navier-Stokes equations

\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p

\displaystyle \nabla \cdot u = 0

that blow up in finite time, but this time for a change I took a look at the other side of the theory, namely the conditional regularity results for this equation. There are several such results that assert that if a certain norm of the solution stays bounded (or grows at a controlled rate), then the solution stays regular; taken in the contrapositive, they assert that if a solution blows up at a certain finite time {T_*}, then certain norms of the solution must also go to infinity. Here are some examples (not an exhaustive list) of such blowup criteria:

  • (Leray blowup criterion, 1934) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\liminf_{t \rightarrow T_*} (T_* - t)^{\frac{1}{2}-\frac{3}{2p}} \| u(t) \|_{L^p_x({\bf R}^3)} \geq c} for an absolute constant {c>0}.
  • (Prodi-Serrin-Ladyzhenskaya blowup criterion, 1959-1967) If {u} blows up at a finite time {T_*}, and {3 < p \leq \infty}, then {\| u \|_{L^q_t L^p_x([0,T_*) \times {\bf R}^3)} =+\infty}, where {\frac{1}{q} := \frac{1}{2} - \frac{3}{2p}}.
  • (Beale-Kato-Majda blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\| \omega \|_{L^1_t L^\infty_x([0,T_*) \times {\bf R}^3)} = +\infty}, where {\omega := \nabla \times u} is the vorticity.
  • (Kato blowup criterion, 1984) If {u} blows up at a finite time {T_*}, then {\liminf_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} \geq c} for some absolute constant {c>0}.
  • (Escauriaza-Seregin-Sverak blowup criterion, 2003) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Seregin blowup criterion, 2012) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{L^3_x({\bf R}^3)} = +\infty}.
  • (Phuc blowup criterion, 2015) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{L^{3,q}_x({\bf R}^3)} = +\infty} for any {q < \infty}.
  • (Gallagher-Koch-Planchon blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\limsup_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.
  • (Albritton blowup criterion, 2016) If {u} blows up at a finite time {T_*}, then {\lim_{t \rightarrow T_*} \|u(t) \|_{\dot B_{p,q}^{-1+3/p}({\bf R}^3)} = +\infty} for any {3 < p, q < \infty}.

My current paper is most closely related to the Escauriaza-Seregin-Sverak blowup criterion, which was the first to show a critical (i.e., scale-invariant, or dimensionless) spatial norm, namely {L^3_x({\bf R}^3)}, had to become large. This result now has many proofs; for instance, many of the subsequent blowup criterion results imply the Escauriaza-Seregin-Sverak one as a special case, and there are also additional proofs by Gallagher-Koch-Planchon (building on ideas of Kenig-Koch), and by Dong-Du. However, all of these proofs rely on some form of a compactness argument: given a finite time blowup, one extracts some suitable family of rescaled solutions that converges in some weak sense to a limiting solution that has some additional good properties (such as almost periodicity modulo symmetries), which one can then rule out using additional qualitative tools, such as unique continuation and backwards uniqueness theorems for parabolic heat equations. In particular, all known proofs use some version of the backwards uniqueness theorem of Escauriaza, Seregin, and Sverak. Because of this reliance on compactness, the existing proofs of the Escauriaza-Seregin-Sverak blowup criterion are qualitative, in that they do not provide any quantitative information on how fast the {\|u(t)\|_{L^3_x({\bf R}^3)}} norm will go to infinity (along a subsequence of times).

On the other hand, it is a general principle that qualitative arguments established using compactness methods ought to have quantitative analogues that replace the use of compactness by more complicated substitutes that give effective bounds; see for instance these previous blog posts for more discussion. I therefore was interested in trying to obtain a quantitative version of this blowup criterion that gave reasonably good effective bounds (in particular, my objective was to avoid truly enormous bounds such as tower-exponential or Ackermann function bounds, which often arise if one “naively” tries to make a compactness argument effective). In particular, I obtained the following triple-exponential quantitative regularity bounds:

Theorem 1 If {u} is a classical solution to Navier-Stokes on {[0,T) \times {\bf R}^3} with

\displaystyle \| u \|_{L^\infty_t L^3_x([0,T) \times {\bf R}^3)} \leq A \ \ \ \ \ (1)

 

for some {A \geq 2}, then

\displaystyle | \nabla^j u(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+1}{2}}

and

\displaystyle | \nabla^j \omega(t,x)| \leq \exp\exp\exp(A^{O(1)}) t^{-\frac{j+2}{2}}

for {(t,x) \in [0,T) \times {\bf R}^3} and {j=0,1}.

As a corollary, one can now improve the Escauriaza-Seregin-Sverak blowup criterion to

\displaystyle \limsup_{t \rightarrow T_*} \frac{\|u(t) \|_{L^3_x({\bf R}^3)}}{(\log\log\log \frac{1}{T_*-t})^c} = +\infty

for some absolute constant {c>0}, which to my knowledge is the first (very slightly) supercritical blowup criterion for Navier-Stokes in the literature.
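
Heuristically, one can see why a triple-exponential regularity bound corresponds to a triple-logarithmic blowup rate (the following back-of-the-envelope comparison glosses over the time translations and the precise local theory used in the actual argument). Blowup at {T_*} forces {\|u(t)\|_{L^\infty_x({\bf R}^3)} \gtrsim (T_*-t)^{-1/2}} as {t \rightarrow T_*} (this is the {p=\infty} case of the Leray criterion above), while Theorem 1, applied with {A} a bound for the {L^\infty_t L^3_x} norm up to time {t}, gives {\|u(t)\|_{L^\infty_x({\bf R}^3)} \lesssim \exp\exp\exp(A^{O(1)}) t^{-1/2}}. Comparing the two as {t \rightarrow T_*} (with the elapsed time {t} comparable to {T_*}) forces

\displaystyle \exp\exp\exp(A^{O(1)}) \gtrsim \Big(\frac{t}{T_*-t}\Big)^{1/2}

and hence, after taking logarithms three times,

\displaystyle A \gtrsim (\log\log\log \frac{1}{T_*-t})^{c}

for some small absolute constant {c>0}, which is the shape of the improved criterion; the rigorous argument is of course more careful about where the supremum defining {A} is taken.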

The proof uses many of the same quantitative inputs as previous arguments, most notably the Carleman inequalities used to establish unique continuation and backwards uniqueness theorems for backwards heat equations, but also some additional techniques that make the quantitative bounds more efficient. The proof focuses initially on points of concentration of the solution, which we define as points {(t_0,x_0)} where there is a frequency {N_0} for which one has the bound

\displaystyle |N_0^{-1} P_{N_0} u(t_0,x_0)| \geq A^{-C_0} \ \ \ \ \ (2)

 

for a large absolute constant {C_0}, where {P_{N_0}} is a Littlewood-Paley projection to frequencies {\sim N_0}. (This can be compared with the upper bound of {O(A)} for the quantity on the left-hand side that follows from (1).) The factor of {N_0^{-1}} normalises the left-hand side of (2) to be dimensionless (i.e., critical). The main task is to show that the dimensionless quantity {t_0 N_0^2} cannot get too large; in particular, we end up establishing a bound of the form

\displaystyle t_0 N_0^2 \lesssim \exp\exp\exp A^{O(C_0^6)}

from which the above theorem ends up following from a routine adaptation of the local well-posedness and regularity theory for Navier-Stokes.

The strategy is to show that any concentration such as (2) when {t_0 N_0^2} is large must force a significant component of the {L^3_x} norm of {u(t_0)} to also show up at many other locations than {x_0}, which eventually contradicts (1) if one can produce enough such regions of non-trivial {L^3_x} norm. (This can be viewed as a quantitative variant of the “rigidity” theorems in some of the previous proofs of the Escauriaza-Seregin-Sverak theorem that rule out solutions that exhibit too much “compactness” or “almost periodicity” in the {L^3_x} topology.) The chain of causality that leads from a concentration (2) at {(t_0,x_0)} to significant {L^3_x} norm at other regions of the time slice {t_0 \times {\bf R}^3} is somewhat involved (though simpler than the much more convoluted schemes I initially envisaged for this argument):

  1. Firstly, by using Duhamel’s formula, one can show that a concentration (2) can only occur (with {t_0 N_0^2} large) if there was also a preceding concentration

    \displaystyle |N_1^{-1} P_{N_1} u(t_1,x_1)| \geq A^{-C_0} \ \ \ \ \ (3)

     

    at some slightly previous point {(t_1,x_1)} in spacetime, with {N_1} also close to {N_0} (more precisely, we have {t_1 = t_0 - A^{-O(C_0^3)} N_0^{-2}}, {N_1 = A^{O(C_0^2)} N_0}, and {x_1 = x_0 + O( A^{O(C_0^4)} N_0^{-1})}). This can be viewed as a sort of contrapositive of a “local regularity theorem”, such as the ones established by Caffarelli, Kohn, and Nirenberg. A key point here is that the lower bound {A^{-C_0}} in the conclusion (3) is precisely the same as the lower bound in (2), so that this backwards propagation of concentration can be iterated.

  2. Iterating the previous step, one can find a sequence of concentration points

    \displaystyle |N_n^{-1} P_{N_n} u(t_n,x_n)| \geq A^{-C_0} \ \ \ \ \ (4)

     

    with the {(t_n,x_n)} propagating backwards in time; by using estimates ultimately resulting from the dissipative term in the energy identity, one can extract such a sequence in which the {t_0-t_n} increase geometrically with time, the {N_n} are comparable (up to polynomial factors in {A}) to the natural frequency scale {(t_0-t_n)^{-1/2}}, and one has {x_n = x_0 + O( |t_0-t_n|^{1/2} )}. Using the “epochs of regularity” theory that ultimately dates back to Leray, and tweaking the {t_n} slightly, one can also place the times {t_n} in intervals {I_n} (of length comparable to a small multiple of {|t_0-t_n|}) in which the solution is quite regular (in particular, {u, \nabla u, \omega, \nabla \omega} enjoy good {L^\infty_t L^\infty_x} bounds on {I_n \times {\bf R}^3}).

  3. The concentration (4) can be used to establish a lower bound for the {L^2_t L^2_x} norm of the vorticity {\omega} near {(t_n,x_n)}. As is well known, the vorticity obeys the vorticity equation

    \displaystyle \partial_t \omega = \Delta \omega - (u \cdot \nabla) \omega + (\omega \cdot \nabla) u.

    In the epoch of regularity {I_n \times {\bf R}^3}, the coefficients {u, \nabla u} of this equation obey good {L^\infty_x} bounds, allowing the machinery of Carleman estimates to come into play. Using a Carleman estimate that is used to establish unique continuation results for backwards heat equations, one can propagate this lower bound to also give lower {L^2} bounds on the vorticity (and its first derivative) in annuli of the form {\{ (t,x) \in I_n \times {\bf R}^3: R \leq |x-x_n| \leq R' \}} for various radii {R,R'}, although the lower bounds decay at a gaussian rate with {R}.

  4. Meanwhile, using an energy pigeonholing argument of Bourgain (which, in this Navier-Stokes context, is actually an enstrophy pigeonholing argument), one can locate some annuli {\{ x \in {\bf R}^3: R \leq |x-x_n| \leq R'\}} where (a slightly normalised form of) the enstrophy is small at time {t=t_n}; using a version of the localised enstrophy estimates from a previous paper of mine, one can then propagate this sort of control forward in time, obtaining an “annulus of regularity” of the form {\{ (t,x) \in [t_n,t_0] \times {\bf R}^3: R_n \leq |x-x_n| \leq R'_n\}} in which one has good estimates; in particular, one has {L^\infty_x} type bounds on {u, \nabla u, \omega, \nabla \omega} in this cylindrical annulus.
  5. By intersecting the previous epoch of regularity {I_n \times {\bf R}^3} with the above annulus of regularity, we have some lower bounds on the {L^2} norm of the vorticity (and its first derivative) in the annulus of regularity. Using a Carleman estimate first introduced by Escauriaza, Seregin, and Sverak, as well as a second application of the Carleman estimate used previously, one can then propagate this lower bound back up to time {t=t_0}, establishing a lower bound for the vorticity on the spatial annulus {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}}. By some basic Littlewood-Paley theory one can parlay this lower bound to a lower bound on the {L^3} norm of the velocity {u}; crucially, this lower bound is uniform in {n}.
  6. If {t_0 N_0^2} is very large (triple exponential in {A}!), one can then find enough scales {n} with disjoint {\{ (t_0,x): R_n \leq |x-x_n| \leq R'_n \}} annuli that the total lower bound on the {L^3_x} norm of {u(t_0)} provided by the above arguments is inconsistent with (1), thus establishing the claim.

The chain of causality is summarised in the following image:

[Figure: schematic of the chain of causality]

It seems natural to conjecture that similar triply logarithmic improvements can be made to several of the other blowup criteria listed above, but I have not attempted to pursue this question. It seems difficult to improve the triple logarithmic factor using only the techniques here; the Bourgain pigeonholing argument inevitably costs one exponential, the Carleman inequalities cost a second, and the stacking of scales at the end to contradict the {L^3} upper bound costs the third.

 

August 22, 2019

Terence TaoSymmetric functions in a fractional number of variables, and the multilinear Kakeya conjecture

Let {\Omega} be some domain (such as the real numbers). For any natural number {p}, let {L(\Omega^p)_{sym}} denote the space of symmetric real-valued functions {F^{(p)}: \Omega^p \rightarrow {\bf R}} on {p} variables {x_1,\dots,x_p \in \Omega}, thus

\displaystyle F^{(p)}(x_{\sigma(1)},\dots,x_{\sigma(p)}) = F^{(p)}(x_1,\dots,x_p)

for any permutation {\sigma: \{1,\dots,p\} \rightarrow \{1,\dots,p\}}. For instance, for any natural numbers {k,p}, the elementary symmetric polynomials

\displaystyle e_k^{(p)}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} x_{i_1} \dots x_{i_k}

will be an element of {L({\bf R}^p)_{sym}}. With the pointwise product operation, {L(\Omega^p)_{sym}} becomes a commutative real algebra. We include the case {p=0}, in which case {L(\Omega^0)_{sym}} consists solely of the real constants.

Given two natural numbers {k,p}, one can “lift” a symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} of {k} variables to a symmetric function {[F^{(k)}]_{k \rightarrow p} \in L(\Omega^p)_{sym}} of {p} variables by the formula

\displaystyle [F^{(k)}]_{k \rightarrow p}(x_1,\dots,x_p) = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq p} F^{(k)}(x_{i_1}, \dots, x_{i_k})

\displaystyle = \frac{1}{k!} \sum_\pi F^{(k)}( x_{\pi(1)}, \dots, x_{\pi(k)} )

where {\pi} ranges over all injections from {\{1,\dots,k\}} to {\{1,\dots,p\}} (the latter formula making it clearer that {[F^{(k)}]_{k \rightarrow p}} is symmetric). Thus for instance

\displaystyle [F^{(1)}(x_1)]_{1 \rightarrow p} = \sum_{i=1}^p F^{(1)}(x_i)

\displaystyle [F^{(2)}(x_1,x_2)]_{2 \rightarrow p} = \sum_{1 \leq i < j \leq p} F^{(2)}(x_i,x_j)

and

\displaystyle e_k^{(p)}(x_1,\dots,x_p) = [x_1 \dots x_k]_{k \rightarrow p}.

Also we have

\displaystyle [1]_{k \rightarrow p} = \binom{p}{k} = \frac{p(p-1)\dots(p-k+1)}{k!}.

With these conventions, we see that {[F^{(k)}]_{k \rightarrow p}} vanishes for {p=0,\dots,k-1}, and is equal to {F} if {k=p}. We also have the transitivity

\displaystyle [F^{(k)}]_{k \rightarrow p} = \frac{1}{\binom{p-k}{p-l}} [[F^{(k)}]_{k \rightarrow l}]_{l \rightarrow p}

if {k \leq l \leq p}.
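
As a quick consistency check of this transitivity identity (for natural numbers {k \leq l \leq p}; this is merely a verification and is not needed later), one can take {F^{(k)} = 1}:

\displaystyle [[1]_{k \rightarrow l}]_{l \rightarrow p} = [\binom{l}{k}]_{l \rightarrow p} = \binom{l}{k} \binom{p}{l} = \binom{p}{k} \binom{p-k}{l-k} = \binom{p-k}{p-l} [1]_{k \rightarrow p},

in agreement with the displayed formula.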

The lifting map {[]_{k \rightarrow p}} is a linear map from {L(\Omega^k)_{sym}} to {L(\Omega^p)_{sym}}, but it is not a ring homomorphism. For instance, when {\Omega={\bf R}}, one has

\displaystyle [x_1]_{1 \rightarrow p} [x_1]_{1 \rightarrow p} = (\sum_{i=1}^p x_i)^2 \ \ \ \ \ (1)

 

\displaystyle = \sum_{i=1}^p x_i^2 + 2 \sum_{1 \leq i < j \leq p} x_i x_j

\displaystyle = [x_1^2]_{1 \rightarrow p} + 2 [x_1 x_2]_{1 \rightarrow p}

\displaystyle \neq [x_1^2]_{1 \rightarrow p}.

In general, one has the identity

\displaystyle [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow p} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow p} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (2)

 

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow p}

for all natural numbers {k,l,p} and {F^{(k)} \in L(\Omega^k)_{sym}}, {G^{(l)} \in L(\Omega^l)_{sym}}, where {\pi, \rho} range over all injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}} with {\pi(\{1,\dots,k\}) \cup \rho(\{1,\dots,l\}) = \{1,\dots,m\}}. Combinatorially, the identity (2) follows from the fact that given any injections {\tilde \pi: \{1,\dots,k\} \rightarrow \{1,\dots,p\}} and {\tilde \rho: \{1,\dots,l\} \rightarrow \{1,\dots,p\}} with total image {\tilde \pi(\{1,\dots,k\}) \cup \tilde \rho(\{1,\dots,l\})} of cardinality {m}, one has {k,l \leq m \leq k+l}, and furthermore there exist precisely {m!} triples {(\pi, \rho, \sigma)} of injections {\pi: \{1,\dots,k\} \rightarrow \{1,\dots,m\}}, {\rho: \{1,\dots,l\} \rightarrow \{1,\dots,m\}}, {\sigma: \{1,\dots,m\} \rightarrow \{1,\dots,p\}} such that {\tilde \pi = \sigma \circ \pi} and {\tilde \rho = \sigma \circ \rho}.

Example 1 When {\Omega = {\bf R}}, one has

\displaystyle [x_1 x_2]_{2 \rightarrow p} [x_1]_{1 \rightarrow p} = [\frac{1}{2! 1!}( 2 x_1^2 x_2 + 2 x_1 x_2^2 )]_{2 \rightarrow p} + [\frac{1}{2! 1!} 6 x_1 x_2 x_3]_{3 \rightarrow p}

\displaystyle = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow p} + [3x_1 x_2 x_3]_{3 \rightarrow p}

which is just a restatement of the identity

\displaystyle (\sum_{i < j} x_i x_j) (\sum_k x_k) = \sum_{i<j} (x_i^2 x_j + x_i x_j^2) + \sum_{i < j < k} 3 x_i x_j x_k.
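
(Identities of this sort are also easy to double-check mechanically; the following few lines of sympy, which verify the identity just displayed in the case of {p=4} variables, are purely illustrative.)

    from itertools import combinations
    import sympy as sp

    p = 4
    x = sp.symbols(f"x1:{p+1}")   # x1, ..., x4

    lhs = sum(x[i] * x[j] for i, j in combinations(range(p), 2)) * sum(x)
    rhs = (sum(x[i]**2 * x[j] + x[i] * x[j]**2 for i, j in combinations(range(p), 2))
           + sum(3 * x[i] * x[j] * x[k] for i, j, k in combinations(range(p), 3)))

    print(sp.expand(lhs - rhs) == 0)   # prints True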

Note that the coefficients appearing in (2) do not depend on the final number of variables {p}. We may therefore abstract the role of {p} from the law (2) by introducing the real algebra {L(\Omega^*)_{sym}} of formal sums

\displaystyle F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

where for each {k}, {F^{(k)}} is an element of {L(\Omega^k)_{sym}} (with only finitely many of the {F^{(k)}} being non-zero), and with the formal symbol {[]_{k \rightarrow *}} being formally linear, thus

\displaystyle [F^{(k)}]_{k \rightarrow *} + [G^{(k)}]_{k \rightarrow *} := [F^{(k)} + G^{(k)}]_{k \rightarrow *}

and

\displaystyle c [F^{(k)}]_{k \rightarrow *} := [cF^{(k)}]_{k \rightarrow *}

for {F^{(k)}, G^{(k)} \in L(\Omega^k)_{sym}} and scalars {c \in {\bf R}}, and with multiplication given by the analogue

\displaystyle [F^{(k)}(x_1,\dots,x_k)]_{k \rightarrow *} [G^{(l)}(x_1,\dots,x_l)]_{l \rightarrow *} = \sum_{k,l \leq m \leq k+l} \frac{1}{k! l!} \ \ \ \ \ (3)

 

\displaystyle [\sum_{\pi, \rho} F^{(k)}(x_{\pi(1)},\dots,x_{\pi(k)}) G^{(l)}(x_{\rho(1)},\dots,x_{\rho(l)})]_{m \rightarrow *}

of (2). Thus for instance, in this algebra {L(\Omega^*)_{sym}} we have

\displaystyle [x_1]_{1 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2]_{1 \rightarrow *} + 2 [x_1 x_2]_{2 \rightarrow *}

and

\displaystyle [x_1 x_2]_{2 \rightarrow *} [x_1]_{1 \rightarrow *} = [x_1^2 x_2 + x_1 x_2^2]_{2 \rightarrow *} + [3 x_1 x_2 x_3]_{3 \rightarrow *}.

Informally, {L(\Omega^*)_{sym}} is an abstraction (or “inverse limit”) of the concept of a symmetric function of an unspecified number of variables, which are formed by summing terms that each involve only a bounded number of these variables at a time. One can check (somewhat tediously) that {L(\Omega^*)_{sym}} is indeed a commutative real algebra, with a unit {[1]_{0 \rightarrow *}}. (I do not know if this algebra has previously been studied in the literature; it is somewhat analogous to the abstract algebra of finite linear combinations of Schur polynomials, with multiplication given by a Littlewood-Richardson rule. )

For natural numbers {p}, there is an obvious specialisation map {[]_{* \rightarrow p}} from {L(\Omega^*)_{sym}} to {L(\Omega^p)_{sym}}, defined by the formula

\displaystyle [\sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}]_{* \rightarrow p} := \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}.

Thus, for instance, {[]_{* \rightarrow p}} maps {[x_1]_{1 \rightarrow *}} to {[x_1]_{1 \rightarrow p}} and {[x_1 x_2]_{2 \rightarrow *}} to {[x_1 x_2]_{2 \rightarrow p}}. From (2) and (3) we see that this map {[]_{* \rightarrow p}: L(\Omega^*)_{sym} \rightarrow L(\Omega^p)_{sym}} is an algebra homomorphism, even though the maps {[]_{k \rightarrow *}: L(\Omega^k)_{sym} \rightarrow L(\Omega^*)_{sym}} and {[]_{k \rightarrow p}: L(\Omega^k)_{sym} \rightarrow L(\Omega^p)_{sym}} are not homomorphisms. By inspecting the {p^{th}} component of {L(\Omega^*)_{sym}} we see that the homomorphism {[]_{* \rightarrow p}} is in fact surjective.

Now suppose that we have a measure {\mu} on the space {\Omega}, which then induces a product measure {\mu^p} on every product space {\Omega^p}. To avoid degeneracies we will assume that the integral {\int_\Omega \mu} is strictly positive. Assuming suitable measurability and integrability hypotheses, a function {F \in L(\Omega^p)_{sym}} can then be integrated against this product measure to produce a number

\displaystyle \int_{\Omega^p} F\ d\mu^p.

In the event that {F} arises as a lift {[F^{(k)}]_{k \rightarrow p}} of another function {F^{(k)} \in L(\Omega^k)_{sym}}, then from Fubini’s theorem we obtain the formula

\displaystyle \int_{\Omega^p} F\ d\mu^p = \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}.

Thus for instance, if {\Omega={\bf R}},

\displaystyle \int_{{\bf R}^p} [x_1]_{1 \rightarrow p}\ d\mu^p = p (\int_{\bf R} x\ d\mu(x)) (\int_{\bf R} \mu)^{p-1} \ \ \ \ \ (4)

 

and

\displaystyle \int_{{\bf R}^p} [x_1 x_2]_{2 \rightarrow p}\ d\mu^p = \binom{p}{2} (\int_{{\bf R}^2} x_1 x_2\ d\mu(x_1) d\mu(x_2)) (\int_{\bf R} \mu)^{p-2}. \ \ \ \ \ (5)

 

On summing, we see that if

\displaystyle F^{(*)} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow *}

is an element of the formal algebra {L(\Omega^*)_{sym}}, then

\displaystyle \int_{\Omega^p} [F^{(*)}]_{* \rightarrow p}\ d\mu^p = \sum_{k=0}^\infty \binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ d\mu)^{p-k}. \ \ \ \ \ (6)

 

Note that by hypothesis, only finitely many terms on the right-hand side are non-zero.

Now for a key observation: whereas the left-hand side of (6) only makes sense when {p} is a natural number, the right-hand side is meaningful when {p} takes a fractional value (or even when it takes negative or complex values!), interpreting the binomial coefficient {\binom{p}{k}} as a polynomial {\frac{p(p-1) \dots (p-k+1)}{k!}} in {p}. As such, this suggests a way to introduce a “virtual” concept of a symmetric function on a fractional power space {\Omega^p} for such values of {p}, and even to integrate such functions against product measures {\mu^p}, even if the fractional power {\Omega^p} does not exist in the usual set-theoretic sense (and {\mu^p} similarly does not exist in the usual measure-theoretic sense). More precisely, for arbitrary real or complex {p}, we now define {L(\Omega^p)_{sym}} to be the space of abstract objects

\displaystyle F^{(p)} = [F^{(*)}]_{* \rightarrow p} = \sum_{k=0}^\infty [F^{(k)}]_{k \rightarrow p}

with {F^{(*)} \in L(\Omega^*)_{sym}}, and with {[]_{* \rightarrow p}} (and {[]_{k \rightarrow p}}) now interpreted as formal symbols, with the structure of a commutative real algebra inherited from {L(\Omega^*)_{sym}}, thus

\displaystyle [F^{(*)}]_{* \rightarrow p} + [G^{(*)}]_{* \rightarrow p} := [F^{(*)} + G^{(*)}]_{* \rightarrow p}

\displaystyle c [F^{(*)}]_{* \rightarrow p} := [c F^{(*)}]_{* \rightarrow p}

\displaystyle [F^{(*)}]_{* \rightarrow p} [G^{(*)}]_{* \rightarrow p} := [F^{(*)} G^{(*)}]_{* \rightarrow p}.

In particular, the multiplication law (2) continues to hold for such values of {p}, thanks to (3). Given any measure {\mu} on {\Omega}, we formally define a measure {\mu^p} on {\Omega^p} with regards to which we can integrate elements {F^{(p)}} of {L(\Omega^p)_{sym}} by the formula (6) (providing one has sufficient measurability and integrability to make sense of this formula), thus providing a sort of “fractional dimensional integral” for symmetric functions. Thus, for instance, with this formalism the identities (4), (5) now hold for fractional values of {p}, even though the formal space {{\bf R}^p} no longer makes sense as a set, and the formal measure {\mu^p} no longer makes sense as a measure. (The formalism here is somewhat reminiscent of the technique of dimensional regularisation employed in the physical literature in order to assign values to otherwise divergent integrals. See also this post for an unrelated abstraction of the integration concept involving integration over supercommutative variables (and in particular over fermionic variables).)
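
As a concrete, purely illustrative example of how (6) is used as a definition: take {\Omega = [0,1]} with Lebesgue measure, so that (4) and (5) reduce to {p/2} and {\binom{p}{2}/4} respectively. The snippet below (numpy; all the specific choices are mine) checks this against direct Monte Carlo sampling for natural values of {p}, and then simply evaluates the same right-hand sides at fractional {p}, where no genuine product space {\Omega^p} is available to sample from.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)

    def gbinom(p, k):
        """Generalised binomial coefficient p(p-1)...(p-k+1)/k!, valid for fractional p."""
        out = 1.0
        for i in range(k):
            out *= (p - i) / (i + 1)
        return out

    # Natural p: the right-hand sides of (4), (5) against direct Monte Carlo on [0,1]^p.
    for p in [3, 4]:
        xs = rng.random((100_000, p))
        mc1 = xs.sum(axis=1).mean()                                    # E [x_1]_{1->p}
        mc2 = np.mean([sum(r[i] * r[j] for i, j in combinations(range(p), 2))
                       for r in xs[:20_000]])                          # E [x_1 x_2]_{2->p}
        print(p, (mc1, p * 0.5), (mc2, gbinom(p, 2) * 0.25))

    # Fractional p: only the right-hand sides remain meaningful.
    for p in [2.5, 0.5]:
        print(p, gbinom(p, 1) * 0.5, gbinom(p, 2) * 0.25)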

Example 2 Suppose {\mu} is a probability measure on {\Omega}, and {X: \Omega \rightarrow {\bf R}} is a random variable; on any power {\Omega^k}, we let {X_1,\dots,X_k: \Omega^k \rightarrow {\bf R}} be the usual independent copies of {X} on {\Omega^k}, thus {X_j(\omega_1,\dots,\omega_k) := X(\omega_j)} for {(\omega_1,\dots,\omega_k) \in \Omega^k}. Then for any real or complex {p}, the formal integral

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p

can be evaluated by first using the identity

\displaystyle [X_1]_{1 \rightarrow p}^2 = [X_1^2]_{1 \rightarrow p} + 2[X_1 X_2]_{2 \rightarrow p}

(cf. (1)) and then using (6) and the probability measure hypothesis {\int_\Omega\ d\mu = 1} to conclude that

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = \binom{p}{1} \int_{\Omega} X^2\ d\mu + 2 \binom{p}{2} \int_{\Omega^2} X_1 X_2\ d\mu^2

\displaystyle = p (\int_\Omega X^2\ d\mu - (\int_\Omega X\ d\mu)^2) + p^2 (\int_\Omega X\ d\mu)^2

or in probabilistic notation

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^2\ d\mu^p = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2. \ \ \ \ \ (7)

 

For {p} a natural number, this identity has the probabilistic interpretation

\displaystyle \mathbf{E}( X_1 + \dots + X_p)^2 = p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2 \ \ \ \ \ (8)

 

whenever {X_1,\dots,X_p} are jointly independent copies of {X}, which reflects the well known fact that the sum {X_1 + \dots + X_p} has expectation {p \mathbf{E} X} and variance {p \mathbf{Var}(X)}. One can thus view (7) as an abstract generalisation of (8) to the case when {p} is fractional, negative, or even complex, despite the fact that there is no sensible way in this case to talk about {p} independent copies {X_1,\dots,X_p} of {X} in the standard framework of probability theory.
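
(A quick simulation makes this concrete; the snippet below is illustrative only, with {X} exponentially distributed as an arbitrary choice. For the natural number {p=5} the simulated second moment of the sum matches {p \mathbf{Var}(X) + p^2 \mathbf{E}(X)^2}, and the same expression can then simply be evaluated at a fractional value of {p}, for which no simulation is available.)

    import numpy as np

    rng = np.random.default_rng(1)
    mean, var = 1.0, 1.0            # X ~ Exponential(1): E X = 1, Var X = 1

    def rhs(p):
        """Right-hand side of (7): p Var(X) + p^2 E(X)^2, meaningful for any p."""
        return p * var + p ** 2 * mean ** 2

    p = 5
    sums = rng.exponential(1.0, size=(500_000, p)).sum(axis=1)
    print("simulated E[(X_1+...+X_5)^2]:", np.mean(sums ** 2))   # ~ 30
    print("p Var(X) + p^2 E(X)^2       :", rhs(p))               # = 30
    print("same formula at p = 2.5     :", rhs(2.5))             # = 8.75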

In this particular case, the quantity (7) is non-negative for every nonnegative {p}, which looks plausible given the form of the left-hand side. Unfortunately, this sort of non-negativity does not always hold; for instance, if {X} has mean zero, one can check that

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^4\ d\mu^p = p \mathbf{Var}(X^2) + p(3p-2) (\mathbf{E}(X^2))^2

and the right-hand side can become negative for {p < 2/3}. This is a shame, because otherwise one could hope to start endowing {L(\Omega^p)_{sym}} with some sort of commutative von Neumann algebra type structure (or the abstract probability structure discussed in this previous post) and then interpret it as a genuine measure space rather than as a virtual one. (This failure of positivity is related to the fact that the characteristic function of a random variable, when raised to the {p^{th}} power, need not be a characteristic function of any random variable once {p} is no longer a natural number: “fractional convolution” does not preserve positivity!) However, one vestige of positivity remains: if {F: \Omega \rightarrow {\bf R}} is non-negative, then so is

\displaystyle \int_{\Omega^p} [F]_{1 \rightarrow p}\ d\mu^p = p (\int_\Omega F\ d\mu) (\int_\Omega\ d\mu)^{p-1}.
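
(To make the sign failure mentioned above completely concrete: if {X} takes the values {\pm 1} with equal probability, then {\mathbf{E} X = 0}, {\mathbf{E}(X^2) = 1} and {\mathbf{Var}(X^2) = 0}, so the fourth moment identity above becomes

\displaystyle \int_{\Omega^p} [X_1]_{1 \rightarrow p}^4\ d\mu^p = p(3p-2),

which is indeed negative for {0 < p < 2/3}.)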

One can wonder what the point is to all of this abstract formalism and how it relates to the rest of mathematics. For me, this formalism originated implicitly in an old paper I wrote with Jon Bennett and Tony Carbery on the multilinear restriction and Kakeya conjectures, though we did not have a good language for working with it at the time, instead working first with the case of natural number exponents {p} and appealing to a general extrapolation theorem to then obtain various identities in the fractional {p} case. The connection between these fractional dimensional integrals and more traditional integrals ultimately arises from the simple identity

\displaystyle (\int_\Omega\ d\mu)^p = \int_{\Omega^p}\ d\mu^p

(where the right-hand side should be viewed as the fractional dimensional integral of the unit {[1]_{0 \rightarrow p}} against {\mu^p}). As such, one can manipulate {p^{th}} powers of ordinary integrals using the machinery of fractional dimensional integrals. A key lemma in this regard is

Lemma 3 (Differentiation formula) Suppose that a positive measure {\mu = \mu(t)} on {\Omega} depends on some parameter {t} and varies by the formula

\displaystyle \frac{d}{dt} \mu(t) = a(t) \mu(t) \ \ \ \ \ (9)

 

for some function {a(t): \Omega \rightarrow {\bf R}}. Let {p} be any real or complex number. Then, assuming sufficient smoothness and integrability of all quantities involved, we have

\displaystyle \frac{d}{dt} \int_{\Omega^p} F^{(p)}\ d\mu(t)^p = \int_{\Omega^p} F^{(p)} [a(t)]_{1 \rightarrow p}\ d\mu(t)^p \ \ \ \ \ (10)

 

for all {F^{(p)} \in L(\Omega^p)_{sym}} that are independent of {t}. If we allow {F^{(p)}(t)} to now depend on {t} also, then we have the more general total derivative formula

\displaystyle \frac{d}{dt} \int_{\Omega^p} F^{(p)}(t)\ d\mu(t)^p \ \ \ \ \ (11)

 

\displaystyle = \int_{\Omega^p} \frac{d}{dt} F^{(p)}(t) + F^{(p)}(t) [a(t)]_{1 \rightarrow p}\ d\mu(t)^p,

again assuming sufficient amounts of smoothness and regularity.

Proof: We just prove (10), as (11) then follows by the same argument used to prove the usual product rule. By linearity it suffices to verify this identity in the case {F^{(p)} = [F^{(k)}]_{k \rightarrow p}} for some symmetric function {F^{(k)} \in L(\Omega^k)_{sym}} for a natural number {k}. By (6), the left-hand side of (10) is then

\displaystyle \frac{d}{dt} [\binom{p}{k} (\int_{\Omega^k} F^{(k)}\ d\mu(t)^k) (\int_\Omega\ d\mu(t))^{p-k}]. \ \ \ \ \ (12)

 

Differentiating under the integral sign using (9) we have

\displaystyle \frac{d}{dt} \int_\Omega\ d\mu(t) = \int_\Omega\ a(t)\ d\mu(t)

and similarly

\displaystyle \frac{d}{dt} \int_{\Omega^k} F^{(k)}\ d\mu(t)^k = \int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu(t)^k

where {a_1,\dots,a_k} are the standard {k} copies of {a = a(t)} on {\Omega^k}:

\displaystyle a_j(\omega_1,\dots,\omega_k) := a(\omega_j).

By the product rule, we can thus expand (12) as

\displaystyle \binom{p}{k} (\int_{\Omega^k} F^{(k)}(a_1+\dots+a_k)\ d\mu^k ) (\int_\Omega\ d\mu)^{p-k}

\displaystyle + \binom{p}{k} (p-k) (\int_{\Omega^k} F^{(k)}\ d\mu^k) (\int_\Omega\ a\ d\mu) (\int_\Omega\ d\mu)^{p-k-1}

where we have suppressed the dependence on {t} for brevity. Since {\binom{p}{k} (p-k) = \binom{p}{k+1} (k+1)}, we can write this expression using (6) as

\displaystyle \int_{\Omega^p} [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p}\ d\mu^p

where {F^{(k)} \ast a \in L(\Omega^{k+1})_{sym}} is the symmetric function

\displaystyle F^{(k)} \ast a(\omega_1,\dots,\omega_{k+1}) := \sum_{j=1}^{k+1} F^{(k)}(\omega_1,\dots,\omega_{j-1},\omega_{j+1} \dots \omega_{k+1}) a(\omega_j).

But from (2) one has

\displaystyle [F^{(k)} (a_1 + \dots + a_k)]_{k \rightarrow p} + [ F^{(k)} \ast a ]_{k+1 \rightarrow p} = [F^{(k)}]_{k \rightarrow p} [a]_{1 \rightarrow p}

and the claim follows. \Box

Remark 4 It is also instructive to prove this lemma in the special case when {p} is a natural number, in which case the fractional dimensional integral {\int_{\Omega^p} F^{(p)}\ d\mu(t)^p} can be interpreted as a classical integral. In this case, the identity (10) is immediate from applying the product rule to (9) to conclude that

\displaystyle \frac{d}{dt} d\mu(t)^p = [a(t)]_{1 \rightarrow p} d\mu(t)^p.

One could in fact derive (10) for arbitrary real or complex {p} from the case when {p} is a natural number by an extrapolation argument; see the appendix of my paper with Bennett and Carbery for details.
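
Before turning to the PDE application below, here is a brute-force numerical check of (10) in the simplest possible setting: a three-point space {\Omega}, a natural number {p}, and arbitrary choices of {\mu_0}, {a} and {F^{(1)}}. It is only a sanity check of the formalism, not part of the argument.

    import numpy as np
    from itertools import product

    # Omega = {0,1,2}; mu(t) = mu0 * exp(a t) pointwise, so that d/dt mu(t) = a mu(t) as in (9).
    mu0 = np.array([0.7, 1.3, 0.4])
    a   = np.array([0.2, -0.5, 1.1])
    F1  = np.array([1.0, 2.0, -1.0])    # F^{(1)}; below we integrate F^{(p)} = [F^{(1)}]_{1->p}
    p   = 3

    def integral(t, with_a=False):
        """Brute-force int_{Omega^p} [F1]_{1->p} ([a]_{1->p} if with_a) dmu(t)^p."""
        mu = mu0 * np.exp(a * t)
        total = 0.0
        for omega in product(range(len(mu0)), repeat=p):
            weight = np.prod(mu[list(omega)])
            lift_F = sum(F1[i] for i in omega)
            lift_a = sum(a[i] for i in omega) if with_a else 1.0
            total += lift_F * lift_a * weight
        return total

    t, h = 0.3, 1e-5
    print((integral(t + h) - integral(t - h)) / (2 * h))   # left-hand side of (10)
    print(integral(t, with_a=True))                        # right-hand side of (10)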

Let us give a simple PDE application of this lemma as illustration:

Proposition 5 (Heat flow monotonicity) Let {u: [0,+\infty) \times {\bf R}^d \rightarrow {\bf R}} be a solution to the heat equation {u_t = \Delta u} with initial data {\mu_0} a rapidly decreasing finite non-negative Radon measure, or more explicitly

\displaystyle u(t,x) = \frac{1}{(4\pi t)^{d/2}} \int_{{\bf R}^d} e^{-|x-y|^2/4t}\ d\mu_0(y)

for all {t>0}. Then for any {p>0}, the quantity

\displaystyle Q_p(t) := t^{\frac{d}{2} (p-1)} \int_{{\bf R}^d} u(t,x)^p\ dx

is monotone non-decreasing in {t \in (0,+\infty)} for {1 < p < \infty}, constant for {p=1}, and monotone non-increasing for {0 < p < 1}.

Proof: By a limiting argument we may assume that {d\mu_0} is absolutely continuous, with Radon-Nikodym derivative a test function; this is more than enough regularity to justify the arguments below.

For any {(t,x) \in (0,+\infty) \times {\bf R}^d}, let {\mu(t,x)} denote the Radon measure

\displaystyle d\mu(t,x)(y) := \frac{1}{(4\pi)^{d/2}} e^{-|x-y|^2/4t}\ d\mu_0(y).

Then the quantity {Q_p(t)} can be written as a fractional dimensional integral

\displaystyle Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p\ dx.

Observe that

\displaystyle \frac{\partial}{\partial t} d\mu(t,x) = \frac{|x-y|^2}{4t^2} d\mu(t,x)

and thus by Lemma 3 and the product rule

\displaystyle \frac{d}{dt} Q_p(t) = -\frac{d}{2t} Q_p(t) + t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [\frac{|x-y|^2}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx \ \ \ \ \ (13)

 

where we use {y} for the variable of integration in the factor space {{\bf R}^d} of {({\bf R}^d)^p}.

To simplify this expression we will take advantage of integration by parts in the {x} variable. Specifically, in any direction {x_j}, we have

\displaystyle \frac{\partial}{\partial x_j} d\mu(t,x) = -\frac{x_j-y_j}{2t} d\mu(t,x)

and hence by Lemma 3

\displaystyle \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}\ d\mu(t,x)^p = - \int_{({\bf R}^d)^p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p.

Multiplying by {x_j} and integrating by parts, we see that

\displaystyle d Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} x_j [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx

where we use the Einstein summation convention in {j}. Similarly, if {F_j(y)} is any reasonable function depending only on {y}, we have

\displaystyle \frac{\partial}{\partial x_j} \int_{({\bf R}^d)^p}[F_j(y)]_{1 \rightarrow p}\ d\mu(t,x)^p

\displaystyle = - \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p

and hence on integration by parts

\displaystyle 0 = \int_{{\bf R}^d} \int_{({\bf R}^d)^p} [F_j(y)]_{1 \rightarrow p} [\frac{x_j-y_j}{2t}]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

We conclude that

\displaystyle \frac{d}{2t} Q_p(t) = t^{-d/2} \int_{{\bf R}^d} \int_{({\bf R}^d)^p} (x_j - [F_j(y)]_{1 \rightarrow p}) [\frac{x_j-y_j}{4t^2}]_{1 \rightarrow p} d\mu(t,x)^p\ dx

and thus by (13)

\displaystyle \frac{d}{dt} Q_p(t) = \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{({\bf R}^d)^p}

\displaystyle [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p} - (x_j - [F_j(y)]_{1 \rightarrow p}) [x_j - y_j]_{1 \rightarrow p}\ d\mu(t,x)^p\ dx.

The choice of {F_j} that then achieves the most cancellation turns out to be {F_j(y) = \frac{1}{p} y_j} (this cancels the terms that are linear or quadratic in the {x_j}), so that {x_j - [F_j(y)]_{1 \rightarrow p} = \frac{1}{p} [x_j - y_j]_{1 \rightarrow p}}. Repeating the calculations establishing (7), one has

\displaystyle \int_{({\bf R}^d)^p} [(x_j-y_j)(x_j-y_j)]_{1 \rightarrow p}\ d\mu^p = p \mathop{\bf E} |x-Y|^2 (\int_{{\bf R}^d}\ d\mu)^{p}

and

\displaystyle \int_{({\bf R}^d)^p} [x_j-y_j]_{1 \rightarrow p} [x_j-y_j]_{1 \rightarrow p}\ d\mu^p

\displaystyle = (p \mathbf{Var}(x-Y) + p^2 |\mathop{\bf E} x-Y|^2) (\int_{{\bf R}^d}\ d\mu)^{p}

where {Y} is the random variable drawn from {{\bf R}^d} with the normalised probability measure {\mu / \int_{{\bf R}^d}\ d\mu}. Since {\mathop{\bf E} |x-Y|^2 = \mathbf{Var}(x-Y) + |\mathop{\bf E} x-Y|^2}, one thus has

\displaystyle \frac{d}{dt} Q_p(t) = (p-1) \frac{1}{4t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \mathbf{Var}(x-Y) (\int_{{\bf R}^d}\ d\mu)^{p}\ dx. \ \ \ \ \ (14)

 

This expression is clearly non-negative for {p>1}, equal to zero for {p=1}, and non-positive for {0 < p < 1}, giving the claim. (One could simplify {\mathbf{Var}(x-Y)} here as {\mathbf{Var}(Y)} if desired, though it is not strictly necessary to do so for the proof.) \Box

Remark 6 As with Remark 4, one can also establish the identity (14) first for natural numbers {p} by direct computation avoiding the theory of fractional dimensional integrals, and then extrapolate to the case of more general values of {p}. This particular identity is also simple enough that it can be directly established by integration by parts without much difficulty, even for fractional values of {p}.
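
Proposition 5 is also easy to test numerically. The crude one-dimensional experiment below (with {\mu_0} a sum of two point masses; all choices, including the unsophisticated grid quadrature, are illustrative) prints {Q_p(t)} at a few times and exhibits the expected increasing, constant, and decreasing behaviour for {p=2}, {p=1} and {p=1/2} respectively.

    import numpy as np

    # d = 1; mu_0 = delta_{-1} + 2 delta_{+1}
    y_k = np.array([-1.0, 1.0])
    w_k = np.array([1.0, 2.0])

    x = np.linspace(-40.0, 40.0, 40001)      # grid wide enough to capture the tails
    dx = x[1] - x[0]

    def u(t):
        """Heat extension of mu_0 at time t (the d = 1 case of the formula in Proposition 5)."""
        return (w_k * np.exp(-(x[:, None] - y_k) ** 2 / (4.0 * t))).sum(axis=1) / np.sqrt(4.0 * np.pi * t)

    def Q(p, t):
        return t ** (0.5 * (p - 1.0)) * (u(t) ** p).sum() * dx

    for p in [2.0, 1.0, 0.5]:
        print(p, [round(Q(p, t), 4) for t in [0.5, 1.0, 2.0, 4.0, 8.0]])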

A more complicated version of this argument establishes the non-endpoint multilinear Kakeya inequality (without any logarithmic loss in a scale parameter {R}); this was established in my previous paper with Jon Bennett and Tony Carbery, but using the “natural number {p} first” approach rather than using the current formalism of fractional dimensional integration. However, the arguments can be translated into this formalism without much difficulty; we do so below the fold. (To simplify the exposition slightly we will not address issues of establishing enough regularity and integrability to justify all the manipulations, though in practice this can be done by standard limiting arguments.)

— 1. Multilinear heat flow monotonicity —

Before we give a multilinear variant of Proposition 5 of relevance to the multilinear Kakeya inequality, we first need to briefly set up the theory of finite products

\displaystyle \Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}

of fractional powers of spaces {\Omega_1,\dots,\Omega_k}, where {p_1,\dots,p_k} are real or complex numbers. The functions {F^{(p_1,\dots,p_k)}} to integrate here lie in the tensor product space

\displaystyle L(\Omega_1^{p_1})_{sym} \otimes \dots \otimes L(\Omega_k^{p_k})_{sym}, \ \ \ \ \ (15)

 

which is generated by tensor powers

\displaystyle F^{(p_1,\dots,p_k)} = F_1^{(p_1)} \otimes \dots \otimes F_k^{(p_k)}

with {F_j^{(p_j)} \in L(\Omega_j^{p_j})_{sym}}, with the usual tensor product identifications and algebra operations. One can evaluate fractional dimensional integrals of such functions against “virtual product measures” {d\mu_1^{p_1} \dots d\mu_k^{p_k}}, with {\mu_j} a measure on {\Omega_j}, by the natural formula

\displaystyle \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} F_1^{(p_1)} \otimes \dots \otimes F_k^{(p_k)} d\mu_1^{p_1} \dots d\mu_k^{p_k} := \prod_{j=1}^k ( \int_{\Omega_j^{p_j}} F_j^{(p_j)}\ d\mu_j^{p_j} )

assuming sufficient measurability and integrability hypotheses. We can lift functions {F_j^{(m)} \in L(\Omega_j^m)_{sym}} to an element {[F_j^{(m)}]_{m \rightarrow p; j}} of the space (15) by the formula

\displaystyle [F_j^{(m)}]_{m \rightarrow p; j} := 1^{\otimes j-1} \otimes [F_j^{(m)}]_{m \rightarrow p} \otimes 1^{\otimes k-j}.

This is easily seen to be an algebra homomorphism.

Example 7 If {F_1: \Omega_1 \rightarrow {\bf R}} and {F_2: \Omega_2 \rightarrow {\bf R}} are functions and {\mu_1, \mu_2} are measures on {\Omega_1, \Omega_2} respectively, then (assuming sufficient measurability and integrability) the multiple fractional dimensional integral

\displaystyle \int_{\Omega_1^{p_1} \times \Omega_2^{p_2}} [F_1]_{1 \rightarrow p_1; 1} [F_2]_{1 \rightarrow p_2;2}\ d\mu_1^{p_1} d\mu_2^{p_2}

is equal to

\displaystyle p_1 (\int_{\Omega_1} F_1\ d\mu_1) (\int_{\Omega_1}\ d\mu_1)^{p_1-1} p_2 (\int_{\Omega_2} F_2\ d\mu_2) (\int_{\Omega_2}\ d\mu_2)^{p_2-1}.

In the case that {p_1,p_2} are natural numbers, one can view the “virtual” integrand {[F_1]_{1 \rightarrow p_1; 1} [F_2]_{1 \rightarrow p_2;2}} here as an actual function on {\Omega_1^{p_1} \times \Omega_2^{p_2}}, namely

\displaystyle (\omega_{1;1},\dots,\omega_{p_1;1}), (\omega_{1;2},\dots,\omega_{p_2;2}) \mapsto \sum_{i_1=1}^{p_1} F_1(\omega_{i_1;1}) \sum_{i_2=1}^{p_2} F_2(\omega_{i_2;2})

in which case the above evaluation of the integral can be achieved classically.

From a routine application of Lemma 3 and various forms of the product rule, we see that if each {\mu_j(t)} varies with respect to a time parameter {t} by the formula

\displaystyle \frac{d}{dt} \mu_j(t) = a_j(t) \mu_j(t)

and {F^{(p_1,\dots,p_k)}(t)} is a time-varying function in (15), then (assuming sufficient regularity and integrability), the time derivative

\displaystyle \frac{d}{dt} \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} F^{(p_1,\dots,p_k)}(t)\ d\mu_1(t)^{p_1} \dots d\mu_k(t)^{p_k}

is equal to

\displaystyle \int_{\Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}} \frac{d}{dt} F^{(p_1,\dots,p_k)}(t) + F^{(p_1,\dots,p_k)}(t) \sum_{j=1}^k [a_j(t)]_{1 \rightarrow p_j; j}\ d\mu_1(t)^{p_1} \dots d\mu_k(t)^{p_k}. \ \ \ \ \ (16)

 

Now suppose that for each space {\Omega_j} one has a non-negative measure {\mu_j^0}, a vector-valued function {y_j: \Omega_j \rightarrow {\bf R}^d}, and a matrix-valued function {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} taking values in real symmetric positive semi-definite {d \times d} matrices. Let {p_1,\dots,p_k} be positive real numbers; we make the abbreviations

\displaystyle \vec p := (p_1,\dots,p_k)

\displaystyle \Omega^{\vec p} := \Omega_1^{p_1} \times \dots \times \Omega_k^{p_k}.

For any {t > 0} and {x \in {\bf R}^d}, we define the modified measures

\displaystyle d\mu_j(t,x) := e^{-\pi \langle A_j(x-y_j), (x-y_j) \rangle/t}\ d\mu_j^0

and then the product fractional power measure

\displaystyle d\mu(t,x)^{\vec p} := d\mu_1(t,x)^{p_1} \dots d\mu_k(t,x)^{p_k}.

If we then define the heat-type functions

\displaystyle u_j(t,x) := \int_{{\bf R}^d}\ d\mu_j(t,x) = \int_{{\bf R}^d} e^{-\pi \langle A_j(x-y_j), (x-y_j) \rangle/t}\ d\mu_j^0

(where we drop the normalising power of {t} for simplicity) we see in particular that

\displaystyle \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx = \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\ d\mu(t,x)^{\vec p}\ dx \ \ \ \ \ (17)

 

hence we can interpret the multilinear integral in the left-hand side of (17) as a product fractional dimensional integral. (We remark that in my paper with Bennett and Carbery, a slightly different parameterisation is used, replacing {x} with {t x}, and also replacing {t} with {1/t}.)

If the functions {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} were constant in {\Omega_j}, then the functions {u_j(t,x)} would obey some heat-type partial differential equation, and the situation is now very analogous to Proposition 5 (and is also closely related to Brascamp-Lieb inequalities, as discussed for instance in this paper of Carlen, Lieb, and Loss, or this paper of mine with Bennett, Carbery, and Christ). However, for applications to the multilinear Kakeya inequality, we permit {A_j} to vary slightly in the {\Omega_j} variable, and now the {u_j} do not directly obey any PDE.

A naive extension of Proposition 5 would then seek to establish monotonicity of the quantity (17). While such monotonicity is available in the “Brascamp-Lieb case” of constant {A_j}, as discussed in the above papers, this does not quite seem to be true for variable {A_j}. To fix this problem, a weight is introduced in order to avoid having to take matrix inverses (which are not always available in this algebra). On the product fractional dimensional space {\Omega^{\vec p}}, we have a matrix-valued function {A_*} defined by

\displaystyle A_* := \sum_{j=1}^k [A_j]_{1 \rightarrow p_j; j}.

The determinant {\mathrm{det}(A_*)} is then a scalar element of the algebra (15). We then define the quantity

\displaystyle Q_{\vec p}(t) := t^{-d/2} \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\mathrm{det}(A_*)\ d\mu(t,x)^{\vec p}\ dx. \ \ \ \ \ (18)

 

Example 8 Suppose we take {k=2} and let {p_1,p_2} be natural numbers. Then {A_*} can be viewed as the {2 \times 2}-matrix valued function

\displaystyle A_*(\omega_{1;1},\dots,\omega_{p_1;1},\omega_{1;2},\dots,\omega_{p_2;2}) = \sum_{i=1}^{p_1} A_1(\omega_{i;1}) + \sum_{i=1}^{p_2} A_2(\omega_{i;2}).

By slight abuse of notation, we write the determinant {\mathrm{det}(A)} of a {2 \times 2} matrix as {X \wedge Y}, where {X} and {Y} are the first and second rows of {A}. Then

\displaystyle \mathrm{det}(A_*) = \sum_{1 \leq i,i' \leq p_1} X_1(\omega_{i;1}) \wedge Y_1(\omega_{i';1})

\displaystyle + \sum_{i=1}^{p_1} \sum_{i'=1}^{p_2} X_1(\omega_{i;1}) \wedge Y_2(\omega_{i';2}) + X_2(\omega_{i';2}) \wedge Y_1(\omega_{i;1})

\displaystyle + \sum_{1 \leq i,i' \leq p_2} X_2(\omega_{i;2}) \wedge Y_2(\omega_{i';2})

and after some calculation, one can then write {Q_{\vec p}(t)} as

\displaystyle p_1 t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1} X_1 \wedge Y_1\ d\mu_1(t,x)) u_1(t,x)^{p_1-1} u_2(t,x)^{p_2}\ dx

\displaystyle + p_1(p_1-1)t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1^2} X_1(\omega_1) \wedge Y_1(\omega_2)\ d\mu^2_1(t,x)(\omega_1,\omega_2))

\displaystyle u_1(t,x)^{p_1-2} u_2(t,x)^{p_2}\ dx

\displaystyle + p_1 p_2 t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_1} X_1\ d\mu_1(t,x) \wedge \int_{\Omega_2} Y_2\ d\mu_2(t,x)

\displaystyle + \int_{\Omega_2} X_2\ d\mu_2(t,x) \wedge \int_{\Omega_1} Y_1\ d\mu_1(t,x)) u_1(t,x)^{p_1-1} u_2(t,x)^{p_2-1}\ dx

\displaystyle + p_2 t^{-d/2}\int_{{\bf R}^d} (\int_{\Omega_2} X_2 \wedge Y_2\ d\mu_2(t,x)) u_1(t,x)^{p_1} u_2(t,x)^{p_2-1}\ dx

\displaystyle + p_2(p_2-1)t^{-d/2} \int_{{\bf R}^d} (\int_{\Omega_2^2} X_2(\omega_1) \wedge Y_2(\omega_2)\ d\mu^2_2(t,x)(\omega_1,\omega_2))

\displaystyle u_1(t,x)^{p_1} u_2(t,x)^{p_2-2}\ dx.

By a polynomial extrapolation argument, this formula is then also valid for fractional values of {p_1,p_2}; this can also be checked directly from the definitions after some tedious computation. Thus we see that while the compact-looking fractional dimensional integral (18) can be expressed in terms of more traditional integrals, the formulae get rather messy, even in the {d=2} case. As such, the fractional dimensional calculus (based heavily on derivative identities such as (16)) gives a more convenient framework to manipulate these otherwise quite complicated expressions.

Suppose the functions {A_j: \Omega_j \rightarrow {\bf R}^{d \times d}} are close to constant {d \times d} matrices {A_j^0 \in {\bf R}^{d \times d}}, in the sense that

\displaystyle A_j = A_j^0 + O(\varepsilon) \ \ \ \ \ (19)

 

uniformly on {\Omega_j} for some small {\varepsilon>0} (where we use for instance the operator norm to measure the size of matrices, and we allow implied constants in the {O()} notation to depend on {d, \vec p}, and the {A_j^0}). Then we can write {A_j = A_j^0 + \varepsilon C_j} for some bounded matrix {C_j}, and then we can write

\displaystyle A_* = \sum_{j=1}^k [A_j^0 + \varepsilon C_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k p_j A_j^0 + \varepsilon \sum_{j=1}^k [C_j]_{1 \rightarrow p_j;j}.

We can therefore write

\displaystyle \mathrm{det}(A_*) = \mathrm{det}(A_*^0) + \varepsilon C_*

where {A_*^0 := \sum_{j=1}^k p_j A_j^0} and {C_*} is some polynomial combination of the coefficients of the {[C_j]_{1 \rightarrow p_j;j}}, with all coefficients in this polynomial of bounded size. As a consequence, and on expanding out all the fractional dimensional integrals, one obtains a formula of the form

\displaystyle Q_{\vec p}(t) = t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \ d\mu(t,x)^{\vec p}\ dx

\displaystyle = t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx .

Thus, as long as {A_*^0} is strictly positive definite and {\varepsilon} is small enough, this quantity {Q_{\vec p}(t)} is comparable to the classical integral

\displaystyle t^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx.

Now we compute the time derivative of {Q_{\vec p}(t)}. We have

\displaystyle \frac{\partial}{\partial t} \mu_j(t,x) = \frac{\pi}{t^2} \langle A_j(x-y_j),(x-y_j) \rangle \mu_j(t,x)

so by (16), one can write {\frac{d}{dt} Q_{\vec p}(t)} as

\displaystyle -\frac{d}{2t} Q_{\vec p}(t) + \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \ \ \ \ \ (20)

 

\displaystyle \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(x-y_j),(x-y_j) \rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx

where we use {y_j} as the coordinate for the copy of {{\bf R}^d} that is being lifted to {({\bf R}^d)^{p_j}}.

As before, we can take advantage of some cancellation in this expression using integration by parts. Since

\displaystyle \frac{\partial}{\partial x_i} \mu_j(t,x) = -\frac{2\pi}{t} \langle A_j(x-y_j), e_i\rangle \mu_j(t,x)

where {e_1,\dots,e_d} are the standard basis for {{\bf R}^d}, we see from (16) and integration by parts that

\displaystyle d Q_{\vec p}(t) = \frac{2\pi}{t^{\frac{d}{2}+1}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}}\mathrm{det}(A_*) \sum_{j=1}^k x_i [\langle A_j(x-y_j), e_i\rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx

with the usual summation conventions on the index {i}. Also, similarly to before, if we have an element {F_i} of (15) for each {i} that does not depend on {x}, then by (16) and integration by parts

\displaystyle \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k F_i [\langle A_j(x-y_j), e_i\rangle]_{1 \rightarrow p_j;j}\ d\mu(t,x)^{\vec p}\ dx = 0

or, writing {F = (F_1,\dots,F_d)},

\displaystyle \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k \langle [A_j(x-y_j)]_{1 \rightarrow p_j;j}, F \rangle\ d\mu(t,x)^{\vec p}\ dx = 0.

We can thus write (20) as

\displaystyle \frac{d}{dt} Q_{\vec p}(t) = \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} G\ d\mu(t,x)^{\vec p}\ dx \ \ \ \ \ (21)

 

where {G = G(x)} is the element of (15) given by

\displaystyle G := \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(x-y_j),(x-y_j) \rangle]_{1 \rightarrow p_j;j} \ \ \ \ \ (22)

 

\displaystyle - \langle \sum_{j=1}^k [A_j(x-y_j)]_{1 \rightarrow p_j;j}, \mathrm{det}(A_*) x - F \rangle.

The terms in {G} that are quadratic in {x} cancel. The linear term can be rearranged as

\displaystyle \langle x, A_* F - \mathrm{det}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j} \rangle.

To cancel this, one would like to set {F} equal to

\displaystyle F = A_*^{-1}\mathrm{det}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j} .

Now in the commutative algebra (15), the inverse {A_*^{-1}} does not necessarily exist. However, because of the weight factor {\mathrm{det}(A_*)}, one can work instead with the adjugate matrix {\mathrm{adj}(A_*)}, which is such that {\mathrm{adj}(A_*) A_* = A_* \mathrm{adj}(A_*) = \mathrm{det}(A_*) I} where {I} is the identity matrix. We therefore set {F} equal to the expression

\displaystyle F := \mathrm{adj}(A_*) \sum_{j=1}^k [A_j y_j]_{1 \rightarrow p_j; j}

and now the expression in (22) does not contain any linear or quadratic terms in {x}. In particular it is completely independent of {x}, and thus we can write

\displaystyle G = \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(\overline{y}-y_j),(\overline{y}-y_j) \rangle]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k [A_j(\overline{y}-y_j)]_{1 \rightarrow p_j;j}, \mathrm{det}(A_*) \overline{y} - F \rangle

where {\overline{y} = \overline{y}(t,x)} is an arbitrary element of {{\bf R}^d} that we will select later to obtain a useful cancellation. We can rewrite this a little as

\displaystyle G = \mathrm{det}(A_*) \sum_{j=1}^k [\langle A_j(\overline{y}-y_j),(\overline{y}-y_j) \rangle]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k [A_j(\overline{y}-y_j)]_{1 \rightarrow p_j;j}, \mathrm{adj}(A_*) \sum_{j'=1}^k [A_{j'}(\overline{y} - y_{j'})]_{1 \rightarrow p_{j'}; j'} \rangle.

If we now introduce the matrix functions

\displaystyle B_j := A_j^{1/2}

and the vector functions

\displaystyle w_j := B_j( \overline{y} - y_j)

then this can be rewritten as

\displaystyle G = \mathrm{det}(A_*) \sum_{j=1}^k [\|w_j\|^2]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k [B_j w_j]_{1 \rightarrow p_j;j}, \mathrm{adj}(A_*) \sum_{j'=1}^k [B_{j'} w_{j'}]_{1 \rightarrow p_{j'}; j'} \rangle.

Similarly to (19), suppose that we have

\displaystyle B_j = B_j^0 + O(\varepsilon)

uniformly on {\Omega_j}, where {B_j^0 := (A_j^0)^{1/2}}, thus we can write

\displaystyle B_j = B_j^0 + \varepsilon D_j \ \ \ \ \ (23)

 

for some bounded matrix-valued functions {D_j}. Inserting this into the previous expression (and expanding out {A_*} appropriately) one can eventually write

\displaystyle G = G^0 + \varepsilon H

where

\displaystyle G^0 = \mathrm{det}(A^0_*) (\sum_{j=1}^k [\|w_j\|^2]_{1 \rightarrow p_j;j}

\displaystyle - \langle \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j;j}, (A^0_*)^{-1} \sum_{j'=1}^k B^0_{j'} [w_{j'}]_{1 \rightarrow p_{j'}; j'} \rangle )

and {H} is some polynomial combination of the {D_j} and {w_j} (or more precisely, of the quantities {[D_j]_{1 \rightarrow p_j;j}}, {[w_j]_{1 \rightarrow p_j;j}}, {[D_j w_j]_{1 \rightarrow p_j;j}}, {[\|w_j\|^2]_{1 \rightarrow p_j;j}}) that is quadratic in the {w_j} variables, with bounded coefficients. As a consequence, after expanding out the product fractional dimensional integrals and applying some Cauchy-Schwarz to control cross-terms, we have

\displaystyle \frac{d}{dt} Q_{\vec p}(t) = \frac{\pi}{t^{\frac{d}{2}+2}} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} G^0\ d\mu(t,x)^{\vec p}\ dx

\displaystyle + O( \varepsilon t^{-\frac{d}{2}-2} \int_{{\bf R}^d} \int_{\Omega^{\vec p}} \sum_{j=1}^k [\| w_j \|^2]_{1 \rightarrow p_j;j}d\mu(t,x)^{\vec p}\ dx).

Now we simplify {G^0}. We let

\displaystyle \overline{w_j} := \frac{\int_{\Omega_j} w_j\ d\mu_j}{\int_{\Omega_j}\ d\mu_j}

be the average value of {w_j}; for each {t,x} this is just a vector in {{\bf R}^d}. We then split {w_j = \overline{w_j} + (w_j - \overline{w_j})}, leading to the identities

\displaystyle [\|w_j\|^2]_{1 \rightarrow p_j;j} = p_j \|\overline{w_j}\|^2 + 2 \langle \overline{w_j}, [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}\rangle

\displaystyle + [\| w_j - \overline{w_j} \|^2]_{1 \rightarrow p_j;j}

and

\displaystyle \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k p_j B^0_j \overline{w_j} + \sum_{j=1}^k B^0_j [(w_j - \overline{w_j})]_{1 \rightarrow p_j;j}.

The term {\sum_{j=1}^k p_j B^0_j \overline{w_j}} is problematic, but we can eliminate it as follows. By construction one has (suppressing the dependence on {t,x})

\displaystyle \sum_{j=1}^k p_j B^0_j \overline{w_j} \int_{\Omega^p} \ d\mu^{\vec p} = \int_{\Omega^p} \sum_{j=1}^k B^0_j [w_j]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}

\displaystyle = \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j(\overline{y} - y_j)]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}

\displaystyle = \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j]_{1 \rightarrow p_j;j} \overline{y} - \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j y_j]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}.

By construction, one has

\displaystyle \int_{\Omega^p}\sum_{j=1}^k B^0_j [B_j]_{1 \rightarrow p_j;j}\ d\mu^{\vec p} = (\sum_{j=1}^k p_j B^0_j B^0_j + O(\varepsilon)) \int_{\Omega^p}\ d\mu^{\vec p}

\displaystyle = (A^0_* + O(\varepsilon)) \int_{\Omega^p}\ d\mu^{\vec p}.

Thus if {A^0_*} is positive definite and {\varepsilon} is small enough, this matrix is invertible, and we can choose {\overline{y}} so that the expression {\sum_{j=1}^k p_j B^0_j \overline{w_j} } vanishes. Making this choice, we then have

\displaystyle \sum_{j=1}^k [B^0_j w_j]_{1 \rightarrow p_j;j} = \sum_{j=1}^k [B^0_j (w_j - \overline{w_j})]_{1 \rightarrow p_j;j}.

Observe that the fractional dimensional integral of

\displaystyle \langle \overline{w_j}, [w_j - \overline{w_j}]_{1 \rightarrow p_j;j} \rangle

or

\displaystyle \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, M [w_{j'} - \overline{w_{j'}}]_{1 \rightarrow p_{j'};j'} \rangle

for {j \neq j'} and arbitrary constant matrices {M} against {d\mu^{\vec p}} vanishes, thanks to the mean zero nature of {w_j - \overline{w_j}} with respect to {\mu_j}. As a consequence, we can now simplify the integral

\displaystyle \int_{\Omega^{\vec p}} G^0\ d\mu^{\vec p} \ \ \ \ \ (24)

 

as

\displaystyle \mathrm{det}(A^0_*) \int_{\Omega^{\vec p}} \sum_{j=1}^k p_j \| \overline{w_j}\|^2

\displaystyle + \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) [w_j - \overline{w_j}]_{1 \rightarrow p_j; j} \rangle\ d\mu^{\vec p}.

Using (2), we can split

\displaystyle \langle [w_j - \overline{w_j}]_{1 \rightarrow p_j;j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) [w_j - \overline{w_j}]_{1 \rightarrow p_j; j} \rangle

as the sum of

\displaystyle [\langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle ]_{1 \rightarrow p_j; j}

and

\displaystyle 2[\langle w_j(\omega_1) - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j(\omega_2) - \overline{w_j}) \rangle ]_{2 \rightarrow p_j; j}.

The latter also integrates to zero by the mean zero nature of {w_j - \overline{w_j}}. Thus we have simplified (24) to

\displaystyle \mathrm{det}(A^0_*) \int_{\Omega^{\vec p}} \sum_{j=1}^k p_j \| \overline{w_j}\|^2

\displaystyle + [\langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle ]_{1 \rightarrow p_j; j} \ d\mu^{\vec p}.

Now let us make the key hypothesis that the matrix

\displaystyle 1 - B^0_j (A^0_*)^{-1} B^0_j

is strictly positive definite, or equivalently that

\displaystyle \sum_{j'=1}^k p_{j'} A_{j'}^0 > A_j^0

for all {j=1,\dots,k}, where the ordering is in the sense of positive definite matrices. Then we have the pointwise bound

\displaystyle \langle w_j - \overline{w_j}, (1 - B^0_j (A^0_*)^{-1} B^0_j) (w_j - \overline{w_j}) \rangle \gtrsim \| w_j - \overline{w_j} \|^2

and thus

\displaystyle \frac{d}{dt} Q_{\vec p}(t) \gtrsim t^{-\frac{d}{2}+2} \int_{{\bf R}^d} \int_{\Omega^{\vec p}}

\displaystyle \sum_{j=1}^k [\|\overline{w_j}\|^2 + \| w_j - \overline{w_j} \|^2 - O(\varepsilon) \|w_j\|^2 ]_{1 \rightarrow p_j;j}\ d\mu^{\vec p}\ dx.

For {\varepsilon} small enough, the expression inside the {[]_{1 \rightarrow p_j;j}} is non-negative (since {\|w_j\|^2 \leq 2 \|\overline{w_j}\|^2 + 2 \|w_j - \overline{w_j}\|^2}), and we conclude the monotonicity

\displaystyle \frac{d}{dt} Q_{\vec p}(t) \geq 0.
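For completeness, here is a sketch of why the two formulations of the key hypothesis above agree, at least when {A_j^0} is invertible (this is the standard congruence argument, not quoted verbatim from the original): the condition {1 - B^0_j (A^0_*)^{-1} B^0_j > 0} is the same as {B^0_j (A^0_*)^{-1} B^0_j < 1}; conjugating by {(B^0_j)^{-1}}, a congruence which preserves the positive definite ordering, this becomes

\displaystyle (A^0_*)^{-1} < (B^0_j)^{-2} = (A_j^0)^{-1},

and since matrix inversion reverses the ordering on positive definite matrices, this is equivalent to {A^0_* = \sum_{j'=1}^k p_{j'} A_{j'}^0 > A_j^0}.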

We have thus proven the following statement, which is essentially Proposition 4.1 of my paper with Bennett and Carbery:

Proposition 9 Let {d,k \geq 1}, let {A_1^0,\dots,A_k^0} be positive semi-definite real symmetric {d \times d} matrices, and let {p_1,\dots,p_k>0} be such that

\displaystyle p_1 A_1^0 + \dots + p_k A_k^0 > A_j^0 \ \ \ \ \ (25)

 

for {j=1,\dots,k}. Then for any positive measure spaces {\Omega_1,\dots,\Omega_k} with measures {\mu_1^0,\dots,\mu_k^0} and any functions {A_j, y_j} on {\Omega_j} with {A_j = A_j^0 + O(\varepsilon)} for a sufficiently small {\varepsilon>0}, the quantity {Q_{\vec p}(t)} is non-decreasing in {t \in (0,+\infty)}, and is also equal to

\displaystyle t^{-d/2} (\mathrm{det}(A_*^0) + O(\varepsilon)) \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx.

In particular, we have

\displaystyle t^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(t,x)^{p_j}\ dx \lesssim \limsup_{T \rightarrow \infty} T^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(T,x)^{p_j}\ dx

for any {t>0}.

A routine calculation shows that for reasonable choices of {\mu_j^0} (e.g. discrete measures of finite support), one has

\displaystyle \limsup_{T \rightarrow \infty} T^{-d/2} \int_{{\bf R}^d} \prod_{j=1}^k u_j(T,x)^{p_j}\ dx \lesssim \prod_{j=1}^k \mu_j^0({\bf R}^d)^{p_j}

and hence (setting {t=1}) we have

\displaystyle \int_{{\bf R}^d} \prod_{j=1}^k u_j(1,x)^{p_j}\ dx \lesssim \prod_{j=1}^k \mu_j^0({\bf R}^d)^{p_j}.

If we choose the {\mu_j^0} to be the sum of {N_j} Dirac masses, and each {A_j^0} to be the diagonal matrix {A_j^0 = I - e_j e_j^T}, then the key condition (25) is obeyed for {p_1=\dots=p_k = p > \frac{1}{d-1}}, and one arrives at the multilinear Kakeya inequality

\displaystyle \int_{{\bf R}^d} \prod_{j=1}^k (\sum_{i=1}^{N_j} 1_{T_{j,i}})^p \lesssim \prod_{j=1}^k N_j^{p_j}

whenever {T_{j,i}} are infinite tubes in {{\bf R}^d} of width {1} and oriented within {\varepsilon} of the basis vector {e_j}, for a sufficiently small absolute constant {\varepsilon}. (The hypothesis on the directions can then be relaxed to a transversality hypothesis by applying some linear transformations and the triangle inequality.)
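As a quick numerical sanity check of the key condition (25) for this choice of matrices (a minimal sketch in Python/NumPy, taking {k=d} families so that each coordinate direction is used exactly once; the script and the helper name are mine and not part of the original argument), one can test positive definiteness of {p \sum_{j} A_j^0 - A_i^0} directly:

import numpy as np

def kakeya_condition_holds(d, p, tol=1e-12):
    # A_j^0 = I - e_j e_j^T for the standard basis vectors e_1, ..., e_d.
    basis = np.eye(d)
    A = [np.eye(d) - np.outer(basis[j], basis[j]) for j in range(d)]
    A_star = p * sum(A)  # p_1 A_1^0 + ... + p_d A_d^0 with p_1 = ... = p_d = p
    # Condition (25): A_star - A_i^0 is strictly positive definite for every i.
    return all(np.linalg.eigvalsh(A_star - A_i).min() > tol for A_i in A)

d = 4
print(kakeya_condition_holds(d, p=1/(d-1) + 0.01))  # True: just above the threshold p = 1/(d-1)
print(kakeya_condition_holds(d, p=1/(d-1) - 0.01))  # False: just below the threshold

Here {p \sum_{j} A_j^0 - A_i^0 = p(d-1) (1 - e_i e_i^T) + p(d-1) e_i e_i^T - (1 - e_i e_i^T)} has eigenvalues {p(d-1)-1} and {p(d-1)}, so the check simply confirms the threshold {p > \frac{1}{d-1}}.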

August 11, 2019

Jordan EllenbergWe are a long way from such sentiments

All of us living at a certain time on this planet together, and together experiencing all its earthly joys and sorrows, seeing the same sky, loving and hating what are, after all, the same things, each and every one of us condemned to suffer the same sentence, the same disappearance off the face of the earth, should really nurture the greatest tenderness towards each other, a feeling of the most heart-rending closeness, and should be literally screaming with terror and pain whenever we are parted by a fate which at any moment is fully capable of transforming every one of our separations, even if only meant to last ten minutes, into an eternal one. But, as you know, for most of the time, we are a long way from such sentiments, and often take leave of even those closest to us in the most thoughtless manner imaginable.

Ivan Bunin, “Long Ago,” 1921 (Sophie Lund, trans.)

August 07, 2019

Cylindrical OnionOne unforgettable week with Nobel laureates

Guest post by Nadezda Chernyavskaya - ETH Zurich

Bavarian Evening at the Lindau Nobel Laureate Meetings - Picture Credit: Christian Flemming

August 06, 2019

Matt StrasslerA Catastrophic Weekend for Theoretical High Energy Physics

It is beyond belief that not only am I again writing a post about the premature death of a colleague whom I have known for decades, but that I am doing it about two of them.

Over the past weekend, two of the world’s most influential and brilliant theoretical high-energy physicists — Steve Gubser of Princeton University and Ann Nelson of the University of Washington — fell to their deaths in separate mountain accidents, one in the Alps and one in the Cascades.

Theoretical high energy physics is a small community, and within the United States itself the community is tiny.  Ann and Steve were both justifiably famous and highly respected as exceptionally bright lights in their areas of research. Even for those who had not met them personally, this is a stunning and irreplaceable loss of talent and of knowledge.

But most of us did know them personally.  For me, and for others with a personal connection to them, the news is devastating and tragic. I encountered Steve when he was a student and I was a postdoc in the Princeton area, and later helped bring him into a social group where he met his future wife (a great scientist in her own right, and a friend of mine going back decades).  As for Ann, she was one of my teachers at Stanford in graduate school, then my senior colleague on four long scientific papers, and then my colleague (along with her husband David B. Kaplan) for five years at the University of Washington, where she had the office next to mine. I cannot express what a privilege it always was to work with her, learn from her, and laugh with her.

I don’t have the heart or energy right now to write more about this, but I will try to do so at a later time. Right now I join their spouses and families, and my colleagues, in mourning.