Planet Musings

February 09, 2010

Secret Blogging Seminar: Grothendieck’s letter


Recently on meta.mathoverflow.net, Harry Gindi pointed out that Laszlo’s webpage for an edition of SGA 4 now contains the message

Alexandre Grothendieck has unfortunately asked that the work on re-editing SGA cease. The pages that were devoted to it are therefore closed.

It has since come to light that this request came in the form of a letter, which has been circulating in the French mathematical community for the last month. I include here a link to a typed version of that letter, vouching for neither its authenticity nor its accuracy, along with my pathetic attempt at translating it, for those few whose French is even worse than mine. Feel free to suggest better translations (I’ll incorporate them here).

Declaration of intent of non-publication

I do not intend to publish or republish any work or text of which I am the author, in any form whatsoever, printed or electronic, whether in full or in excerpts, pieces of personal nature or otherwise, or letters addressed to anybody, and any translation of texts of which I am the author. Any edition or dissemination of such texts which have been made in the past without my consent, or which is made in the future, is against my will expressly specified here and is illegal in my eyes. I will ask readers of such pirate editions or any other publication containing without my permission the texts of my hand (beyond citations of a few lines each) to cease trade of these works, and those responsible for libraries in possession of such works to remove these titles from their libraries.

If my intentions, clearly expressed here, should go unheeded, then the shame of it falls on those responsible for the illegal editions, and those responsible for the libraries concerned (some of the former and of the latter have been informed about my intention).

Written at my home, January 3, 2010,

Alexandre Grothendieck.

Chad Orzel: Radio DogPhysics: Northern Great Plains Edition

Just a reminder, I will be on KSOO radio Tuesday evening, 6:30 pm ET, if you'd like to hear about How to Teach Physics to Your Dog on the radio at the end of an extremely long day. If you're in broadcast range of Sioux Falls, SD, tune it in, or you can listen live via their web site.

I'll also be at Boskone this weekend, reading book-related stuff on Sunday morning. If you're in the Boston area, stop by. If you're not, well, there's still no way to experience a convention over the Internet. Sorry.

Cosmic Variance: From Eternity to Book Club: Chapters Four and Five

Welcome to this week’s installment of the From Eternity to Here book club. This week we’re tackling two chapters at once: Chapter Four, “Time is Personal,” and Chapter Five, “Time is Flexible.” That’s just because these chapters are relatively short; next time we’ll return to one chapter per week.

Excerpt:

Starting from a single event in Newtonian spacetime, we were able to define a surface of constant time that spread uniquely throughout the universe, splitting the set of all events into the past and the future (plus “simultaneous” events precisely on the surface). In relativity we can’t do that. Instead, the light cone associated with an event divides spacetime into the past of that event (events inside the past light cone), the future of that event (inside the future light cone), the light cone itself, and a bunch of points outside the light cone that are neither in the past nor the future.

It’s that last bit that really gets people. In our reflexively Newtonian way of thinking about the world, we insist that some far away event either happened in the past, the future, or at the same time as some event on our own world line. In relativity, for spacelike separated events (outside one another’s light cones), the answer is “none of the above.” We could choose to draw some surfaces that sliced through spacetime, and label them “surfaces of constant time,” if we really wanted to. That would be taking advantage of the definition of time as a coordinate on spacetime, as discussed in Chapter One. But the result reflects our personal choice, not a real feature of the universe. In relativity, the concept of “simultaneous faraway events” does not make sense.

These two chapters take on a task that is part of the responsibility of any good book on modern cosmology or gravity: explaining Einstein’s theory of relativity. Both special relativity and general relativity, hence two chapters. In retrospect they are pretty short, so an argument could be made that I should have just combined them into a single chapter.

The special challenge of these chapters is precisely that many readers — but not all — will already have read numerous other popular-level expositions of relativity. But you have to do it. Fortunately, my favorite way of talking about relativity is a little bit different from the standard one, and lines up well with the overarching goal of understanding the meaning of “time.” In particular, I try to make the point that the secret to relativity is to think locally — to compare things happening right next to each other in spacetime, not events that are widely separated. You’re allowed to compare separated events, of course, but the answers are necessarily dependent on arbitrary choices of coordinates, and that leads to endless confusion. So you won’t read a lot about “length contraction” or “time dilation,” but you will read a lot about the actual amount of time measured along a trajectory.

Unfortunately, a search for vivid examples of the maxim “freely-falling paths through spacetime experience the longest amount of proper time” led me directly to the most embarrassing mistake in the book. (At least, “most embarrassing mistake so far uncovered.”) Sordid details below the fold!

The mistake is the claim that a clock that sits stationary on a tower will experience less proper time than a clock that orbits the Earth at the same height above ground. That’s wrong: the orbiting clock will measure less time. This appears in the paragraph at the bottom of page 85 and top of 86, and is elevated from “unfortunate” to “a real doozy” by being illustrated in graphic detail by the figure on page 86. Not really any way I can claim it was just a typo.

The subtle issue underlying the mistake is illustrated in this figure, which shows two paths connecting two points on a sphere. Both paths are great circles. The shortest path between two points on a sphere runs along a great circle; but it certainly doesn’t follow that any path along a great circle gives us the shortest distance between two points. If you go more than half the way around the sphere, you end up with a pretty long path!
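The sphere picture is easy to reproduce numerically. This minimal sketch (an illustration, not taken from the book) computes the central angle θ between two points from the spherical law of cosines; the two arcs of the great circle through them have lengths Rθ and R(2π − θ). Both arcs are geodesics, but only the first is the shortest path.

```python
import math


def great_circle_arcs(lat1: float, lon1: float, lat2: float, lon2: float,
                      radius: float = 1.0) -> tuple[float, float]:
    """Lengths of the two great-circle arcs joining two points on a sphere.

    Latitudes and longitudes are in degrees. Returns (short_arc, long_arc):
    both lie on the same great circle and together go once around it.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Central angle from the spherical law of cosines, clamped against rounding.
    cos_theta = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return radius * theta, radius * (2.0 * math.pi - theta)


short_arc, long_arc = great_circle_arcs(0.0, 0.0, 0.0, 90.0)
print(f"short geodesic: {short_arc:.4f}, long geodesic: {long_arc:.4f}")
```

On a unit sphere, two points a quarter of the way around the equator are joined by geodesics of length π/2 and 3π/2: both are "straight" paths, but only one is shortest.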

The same kind of thing happens in spacetime. The trajectory of longest proper time between two events will always be a freely-falling trajectory (a geodesic). But not every freely-falling path gives us the longest time, and that’s exactly the case in this example. Given two events at the same position above the Earth, the actual path of longest time is a radial freely-falling orbit. If you want your clock to experience the longest time it can, you throw it straight up in the air to where the gravitational field is weaker (and clocks run more quickly with respect to time measured at infinity) and let it fall back down. A circular orbit actually loses time by staying at the same altitude but zipping around the Earth. I relied on my affection for the general underlying principle, and didn’t bother to sit down and work out the actual numbers in this case, so I never found the mistake. Pretty sure my membership in the general relativists’ guild is going to be permanently revoked for this one.

If you’re still not convinced of the wrongness of my example, here’s an equation, the line element along a circular trajectory in the equatorial plane in the Schwarzschild metric:

d\tau^2 = \left(1-\frac{2GM}{r}\right) dt^2 - r^2 d\phi^2\,.

On the left we have a small interval (squared) of the proper time τ, what a clock would measure along some path. (Units are chosen so that the speed of light c = 1, and the radial term of the full Schwarzschild metric drops out because r is constant along the trajectory.) The first term on the right is the contribution from our motion with respect to t, the time measured at infinity; for any given amount of t, we experience less proper time τ as our height r decreases and the coefficient (1-2GM/r) becomes smaller. The second term on the right is the contribution from our angular motion φ. Taking the square root of the whole thing and integrating along a path gives you the proper time.

We don’t have to go through the entire calculation to convince ourselves that staying stationary on the tower gives a longer proper time than the circular orbit does. Both trajectories get the same contribution from the first term on the right side, while the second term is zero for the clock on the tower (it’s not moving, so dφ = 0) but negative for the orbit. So the orbit definitely accumulates less proper time. To be corrected in the next printing.
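To put numbers on the corrected claim, here is a minimal Python sketch. It assumes a hypothetical (absurdly tall) 400 km tower on a non-rotating Earth, uses standard values for G, the Earth's mass and c, and restores the factors of c omitted in the line element above. It also assumes the standard result that a circular geodesic in the Schwarzschild metric has dφ/dt = √(GM/r³).

```python
import math

# Standard constants (SI units); the 400 km radius below is an illustrative choice.
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24       # mass of the Earth, kg
C = 299_792_458.0        # speed of light, m/s
GM = G * M_EARTH


def rate_stationary(r: float) -> float:
    """d(tau)/dt for a clock held at fixed radius r, so d(phi) = 0."""
    return math.sqrt(1.0 - 2.0 * GM / (r * C**2))


def rate_circular_orbit(r: float) -> float:
    """d(tau)/dt for a clock on a circular geodesic at radius r.

    Uses the equatorial line element with dr = 0 (factors of c restored)
    and the orbital angular velocity d(phi)/dt = sqrt(GM / r^3).
    """
    omega = math.sqrt(GM / r**3)
    return math.sqrt(1.0 - 2.0 * GM / (r * C**2) - (r * omega / C) ** 2)


r = 6.371e6 + 400e3  # hypothetical: Earth's radius plus a 400 km "tower"
lag_per_day = (rate_stationary(r) - rate_circular_orbit(r)) * 86_400.0
print(f"tower clock:    {rate_stationary(r):.12f} s per coordinate second")
print(f"orbiting clock: {rate_circular_orbit(r):.12f} s per coordinate second")
print(f"orbiting clock falls behind by {lag_per_day * 1e6:.1f} microseconds per day")
```

Substituting Ω² = GM/r³ shows the pattern directly: dτ/dt is √(1 − 2GM/rc²) on the tower but √(1 − 3GM/rc²) in orbit, so at this radius the orbiting clock falls behind by roughly 28 microseconds per day.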

The deep point, of course, remains true: the time measured by clocks in general relativity depends on their path through spacetime, and the way to maximize that time is to take a freely-falling path. Just not that one.


Secret Blogging Seminar: What happened to Clay Liftoff?


Clay has announced that in 2010 there will be no Liftoff Fellows; they say the program is suspended. The title question was asked on MathOverflow a while back, and while it was rightly shut down there, I’m still curious to know the answer. Did Clay decide Liftoff was not a good program for some reason? Did they not want to spend the money? Obviously, having been a Fellow myself, I’m appreciative of the Liftoff program, but it’s very unclear to me that it results in more math getting done, as opposed to having a few mathematicians pay off student loans faster, which I think was its main effect on me.

Chad Orzel: Non-Dorky Poll: Time to Rise and Shine

No substantive blogging for you today, as my alarm clock decided not to go off, causing me to oversleep by the hour that I usually spend on bloggy things. So that you're not left without blog-related entertainment, though, here's an appropriate poll topic:

Poll: How early do you have to set your alarm to get to work/class on time?

Of course, despite oversleeping by a full hour, I was still here twenty minutes before this morning's lab. And probably a good half-hour before the majority of my students. Their late arrival will do wonders for my mood.

Chad Orzel: Links for 2010-02-09

Quantum Diaries: The neural network invasion

If I say "neural network" to you, you certainly think of the brain, or, if you have taken biology classes, of synapses, dendrites and so on... But that is not where I want to take you. For now.
Have you ever wondered how the postal code on envelopes is read, or how the spam filter of your favorite email service manages to stop unwanted messages? All of this requires the ability to make a decision based on a statistical process. Indeed, no two people will ever write the same digit in the same way, and no two spams will contain exactly the same words. We are faced with a potentially infinite set of items, all different from one another, which can nevertheless be grouped into a small number of classes sharing the same characteristic (this character is a 3, or this email is spam...).

It is for this kind of sorting that so-called learning algorithms, which include artificial neural networks, are used. They are able to learn to identify a given characteristic in a sample submitted to them.

Architecture of a neural network

Neural networks are based on a simplified model of the biological neuron. They generally consist of input neurons, then a so-called hidden layer, and finally an output layer (see diagram), all connected by synapses. The inputs are the various criteria useful for sorting (for example, the occurrence of certain words when identifying spam); the output is the network's answer (is it more likely spam or not).
Mathematically, the principle rests on the fact that any continuous function can be approximated, to arbitrary accuracy, by a linear combination of activation functions (sigmoid, hyperbolic tangent or Heaviside step function). Each neuron is therefore equipped with such a function, and each link between neurons (synapse) is weighted according to the problem to be solved.
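The structure just described can be sketched in a few lines of Python. This is a minimal illustration, not the network used in the thesis: one hidden layer, sigmoid activations everywhere, and weights that are made-up numbers chosen only to show the arithmetic of a forward pass.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer and a single output neuron.

    Each hidden neuron takes a weighted sum of the inputs (the synapse
    weights), adds a bias and applies the sigmoid; the output neuron does
    the same with the hidden activations.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical example: 2 inputs (say, counts of two suspicious words in an email)
# feeding 3 hidden neurons. All weights are invented for illustration.
score = forward(
    [3.0, 0.0],
    hidden_weights=[[0.8, -0.5], [0.3, 0.9], [-0.6, 0.2]],
    hidden_biases=[0.1, -0.2, 0.0],
    output_weights=[1.2, -0.7, 0.4],
    output_bias=-0.3,
)
print(f"network output: {score:.3f}")
```

An output near 1 would be read as "probably spam" and near 0 as "probably not"; with untrained weights like these, the number does not mean anything yet.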

At the outset such a network is perfectly stupid: it can do nothing beyond processing the information at random. Just as when you want to learn to do something, it will have to practice!
During this stage we submit to our algorithm a sample whose characteristics are already known. We can then compare the network's answer to the correct answer and, knowing this, improve the result by modifying the synaptic weights. After several iterations, the neural network's output will be close to the expected one, and it will then be ready to apply its abilities to an arbitrary sample.
The analogy with human learning is very strong: imagine I have to teach someone to recognize a computer mouse. I will show them several objects, telling them each time whether it is a mouse. If I show them a large number of varied mice, they will eventually manage to spot the relevant characteristics and extrapolate a "mouse concept". After the learning phase, they will compare against this general concept every time they need to recognize a mouse:
"Ah, I see... A mouse is more or less oval, has two buttons and sometimes a button in the middle, and it is often attached by a cable, etc. So if I see all these characteristics on an object, I have a good chance of being right in assuming it is a computer mouse."
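The training step described above (compare the output with the known answer, then nudge the synaptic weights) can be sketched as follows. To keep it short, this swaps in the simplest possible case: a single sigmoid neuron trained by gradient descent on the cross-entropy loss (that is, logistic regression) rather than a network with a hidden layer. The toy "spam" data are invented for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, learning_rate=0.1, epochs=1000):
    """Train one sigmoid neuron by batch gradient descent.

    For each sample, the error (output minus the known answer) tells us how
    to shift each weight; the weights start at zero, i.e. the neuron is
    initially "stupid" and outputs 0.5 for everything.
    """
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


def predict(weights, bias, x) -> int:
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5 else 0


# Hypothetical data: counts of two words per email; label 1 = spam, 0 = not spam.
samples = [[3, 0], [4, 1], [5, 0], [2, 0], [0, 2], [0, 3], [1, 4], [0, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = train_neuron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # → [1, 1, 1, 1, 0, 0, 0, 0]
```

A real network adds hidden layers and propagates the same error signal backwards through them, but the principle of correcting the weights from labeled examples is the same.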

Very well, but I have strayed rather far from particle physics here, haven't I? So let's get back to it.
In particle physics, the key requirement is to be able to pick out one very particular phenomenon (the signal) among the millions of collisions that lead to phenomena we are not interested in (the background). In other words, finding the needle in the haystack... Since the physical theory underlying the phenomena observed in particle colliders is quantum mechanics, we can never know with certainty the outcome of any particular collision. We can only give probabilities.
The first way to improve our chances is to apply "cuts": I only look at what has an energy above a certain threshold, or I only take what was detected in a certain part of the detector, and so on, because I know that these are the cases where I am most likely to find what I am looking for.
This is exactly what a neural network does, but in an optimized way: through its training, it learns to select only the events whose characteristics make them most likely to be signal, and to reject everything that is most likely background.
The subject of my thesis is precisely to bring to light a particular process involving the Higgs boson, and thereby to discover (or exclude) its existence. This process has an extremely low probability of occurring, so being able to sort these events is crucial. That is why I work with neural networks tailored to recognizing this process.
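As a minimal illustration of the "cuts" described above, here is a sketch with a hypothetical `Event` record holding an energy and a pseudorapidity η (a common stand-in for "which part of the detector"). The field names, events and threshold values are invented for illustration. In the terms used above, a trained network would replace these hand-chosen thresholds with a learned selection.

```python
from dataclasses import dataclass


@dataclass
class Event:
    energy_gev: float  # hypothetical reconstructed energy, in GeV
    eta: float         # pseudorapidity: roughly, where in the detector it was seen


def apply_cuts(events, min_energy_gev, max_abs_eta):
    """Keep only events above an energy threshold and inside a detector region."""
    return [e for e in events
            if e.energy_gev > min_energy_gev and abs(e.eta) < max_abs_eta]


events = [Event(120.0, 0.5), Event(15.0, 0.2), Event(80.0, 3.1), Event(95.0, -1.8)]
selected = apply_cuts(events, min_energy_gev=50.0, max_abs_eta=2.5)
print(selected)
```

Each cut is a hard yes/no boundary on one variable at a time; the appeal of a trained classifier is that it can combine many variables into a single, better-optimized decision.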

Akinator, a web application that can guess what you are thinking of thanks to a learning algorithm.

Interest in neural networks, and in learning algorithms in general, has kept growing over the last 20 years, and they are now routinely used in fields as varied as finance (predicting market fluctuations), banking (detecting credit card fraud), aeronautics (autopilots), artificial intelligence and so on. Some web applications claiming to read your mind, such as 20q or Akinator, have even appeared and use these algorithms.

We can see that these new analysis techniques have a bright future ahead of them. Beyond their ever more numerous applications, they improve day by day thanks to researchers' work, becoming more powerful, faster and more accurate. But as we have seen, despite the word "neuron", we are still very far from a human brain. So before imagining an invasion of killer robots, bear in mind that for now Terminator can barely read, and that is already not bad!

Backreaction: Why, oh why, is the Psi called Psi?

I'm currently reading Sean Carroll's book "From Eternity to Here" and stumbled over this remark
In Newtonian mechanics, the space of states is called "phase space" for reasons that are pretty mysterious.

A mystery that hadn't occurred to me before, probably because the German word "Zustandsraum" literally means "state space," so there is no mystery there. Stefan and I guessed that Gibbs, who introduced the term, might have generalized it from the harmonic oscillator, where the location in phase space does indeed tell you the phase of the oscillation. (You can find a nice applet depicting the phase-space diagram of the damped and undamped oscillator here.)
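The oscillator guess is easy to check numerically. This minimal sketch assumes an undamped oscillator with x = A cos(ωt + φ₀), so that p = −mωA sin(ωt + φ₀). Rescaling the momentum by 1/(mω) turns the elliptical phase-space orbit into a circle, and the polar angle of the rescaled point (x, −p/(mω)) is exactly the phase ωt + φ₀.

```python
import math


def oscillator_state(t: float, amplitude: float, omega: float, mass: float,
                     phase0: float = 0.0) -> tuple[float, float]:
    """Position and momentum of an undamped harmonic oscillator at time t."""
    angle = omega * t + phase0
    x = amplitude * math.cos(angle)
    p = -mass * omega * amplitude * math.sin(angle)
    return x, p


def phase_from_state(x: float, p: float, omega: float, mass: float) -> float:
    """Recover the oscillation phase, in [0, 2*pi), from a point (x, p) in phase space."""
    return math.atan2(-p / (mass * omega), x) % (2.0 * math.pi)


x, p = oscillator_state(t=0.3, amplitude=2.0, omega=5.0, mass=1.5)
print(round(phase_from_state(x, p, omega=5.0, mass=1.5), 6))  # → 1.5
```

So a single point in the (x, p) plane really does tell you where in its cycle the oscillator is, which is at least consistent with the guess about where the name came from.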

In any case, this made me ponder what other words with funny origins physicists like to use. (Both funny ha-ha and funny peculiar.) Why, for example, is recombination in the early universe called recombination if there was no prior combination? Not that I was the first to ask that question. Sean offered the explanation that the word is borrowed from nuclear physics. But then why don't nuclear physicists call fragmentation refragmentation?

There are more interesting nomenclature oddities, though, than the presence or absence of prefixes.

A particularly well-known oddity is the name "quark," introduced by Gell-Mann, who couldn't decide how to spell the sound ducks make:
In 1963, when I assigned the name "quark" to the fundamental constituents of the nucleon, I had the sound first, without the spelling, which could have been "kwork". Then, in one of my occasional perusals of Finnegans Wake, by James Joyce, I came across the word "quark" in the phrase "Three quarks for Muster Mark". Since "quark" (meaning, for one thing, the cry of the gull) was clearly intended to rhyme with "Mark", as well as "bark" and other such words, I had to find an excuse to pronounce it as "kwork".
~M. Gell-Mann, The Quark and the Jaguar, via Wikipedia

Had Gell-Mann read a German dictionary instead of Joyce, he'd have noticed that "Quark" is the German word for a milk product (often mistakenly translated as "cottage cheese," which is something entirely different). Besides this, "Quark" is a frequently used colloquial expression for nonsense.

But at least we know how that word came about. It remained a mystery to me why the English adaptation of the German word "Eigenvektor" came out as "eigenvector." The German word "eigen" simply means "innate," and could easily have been translated.

A better example of imaginative nomenclature is the Psi particle (now known as J/Psi), whose cloud-chamber pictures frequently have the shape of a Psi (see picture above).

Then there is the "penguin diagram," which owes its name to a lost bet and some illegal substances, and the "tadpole diagram," which once ran the risk of turning into a "spermion." Probably a good thing the tadpoles kept their name; just imagine what issues the anti-abortionists would have had with spermion cancellation.

In General Relativity, we have the "cosmic censorship" conjecture to prevent us from seeing "naked singularities," and "wormholes" are already a classic. Cosmologists have further blessed us with MACHOs and WIMPs, acronyms for MAssive Compact Halo Objects and Weakly Interacting Massive Particles respectively. Loop Quantum Gravity features a LOST theorem, named after the initials of its authors' last names. The large gap between the energy scale of currently known physics and the scale where grand unification is thought to occur is also known as the "desert." We have a seesaw mechanism, play with Mexican hat potentials, have ghosts and talk about stop particles. There's a Swiss cheese universe, and neutron stars have pasta-antipasta layers with a spaghetti phase. The most stupid name I have come up with so far is a "pullover." Yes, I know, not terribly original, but then I didn't expect a Nobel Prize for it ;-)

Did I miss something? Leave it in the comments!

February 08, 2010

David Hogg: double redshifts

Tsalmantza and I spoke about our multiple-redshift search in the SDSS spectroscopy. The new technology we bring is a data-driven model of the spectra; the goal is to increase the number of known lenses. We discussed tests of the model, and the hope that increasing the precision of the model will increase the sensitivity of the system to second (and third) redshifts.

Clifford Johnson: Cassandra Wilson

Time for a little music with my nostalgia. I remember my days in Princeton (where I was a postdoc at the Institute for Advanced Study and, later, at the University) particularly well when it comes to certain special things, and one of them was the music I was discovering, and venturing up to New York or down to Philadelphia to see live. The wonderful Cassandra Wilson had just firmly settled into her astonishingly good Blue Note phase at that time, and the (then) newly released album "Blue Light 'Til Dawn" was pure magic to me (and remains so), and was considerably inspiring to me during that time of intense work and during a key period of career and personal development. I went to see her sing at the Theatre of the Living Arts in Philly one wonderful evening. Here she is, (from around that time, I think, or at least it has the right feel), singing the opening song from the album in a slightly shaky live recording. It is a bit [...]

Jordan Ellenberg: Irrational likred


Deane Yang asks in comments:  “What athletes do you especially like?”  That’s actually what I was going to post about today anyway.  A short list, excluding people who play for teams I follow:  Rickey Henderson.  Manny Ramirez.  Barry Bonds.  Jim Thome.  Nomar Garciaparra.  Edgar Martinez.  Randall Cunningham.  Ricky Williams.  Jake Plummer.  Gus Frerotte.  Surya Bonaly.  Arantxa Sanchez.

Doug Natelson: This week in cond-mat, SQUID edition

Superconducting quantum interference devices, or SQUIDs, are fascinating gadgets. Take a superconducting loop with two weak links (e.g., tunnel junctions, or constrictions with a lower critical current). Now thread magnetic flux through the loop. The superconducting wavefunction, which includes a phase factor that involves the vector potential, must be single-valued around the loop. That means that the phase must return to its starting value modulo 2 pi going around the loop. The phase factor is proportional to the line integral of the vector potential around the loop, which equals the magnetic flux through the loop. Therefore, the total magnetic flux through the loop must be quantized. If the external magnetic field doesn't give an integer number of flux quanta, then the superconductor must generate screening currents around the loop that produce flux and make up the difference. If you had connected the loop to an external current source and run that external current (which splits itself between the two branches of the loop) up to the edge of the critical current, you would find that the screening currents would drive the loop normal and lead to a detectable voltage drop that is periodic in the magnetic flux through the loop. This periodicity allows SQUIDs to be phenomenally good magnetic field detectors. One can integrate a tiny SQUID onto a movable probe to make a scanning SQUID microscope, and do amazing things like determine the pairing symmetry of high-Tc superconductors.
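The flux periodicity can be made concrete with the textbook idealization of a dc SQUID: two identical junctions, each with critical current I₀, and negligible loop inductance, for which the critical current of the whole device is I_c(Φ) = 2I₀|cos(πΦ/Φ₀)|, with flux quantum Φ₀ = h/2e. This is a sketch under those assumptions, not a model of the scanning devices in the papers discussed below; real SQUIDs with finite loop inductance show shallower modulation.

```python
import math

PLANCK_H = 6.62607015e-34            # J s (exact in the SI since 2019)
ELEMENTARY_CHARGE = 1.602176634e-19  # C (exact in the SI since 2019)
FLUX_QUANTUM = PLANCK_H / (2.0 * ELEMENTARY_CHARGE)  # Wb, about 2.07e-15


def squid_critical_current(flux: float, junction_ic: float) -> float:
    """Critical current of an idealized symmetric dc SQUID.

    Assumes two identical junctions with critical current junction_ic and
    negligible loop inductance, so I_c = 2 I_0 |cos(pi * flux / flux_quantum)|.
    """
    return 2.0 * junction_ic * abs(math.cos(math.pi * flux / FLUX_QUANTUM))


def flux_quanta(b_field_tesla: float, loop_area_m2: float) -> float:
    """Number of flux quanta threading a loop of given area in a uniform field."""
    return b_field_tesla * loop_area_m2 / FLUX_QUANTUM


i0 = 1e-6  # hypothetical junction critical current, 1 microampere
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    ic = squid_critical_current(fraction * FLUX_QUANTUM, i0)
    print(f"flux = {fraction:.2f} flux quanta -> I_c = {ic * 1e6:.3f} uA")
```

The printed values run from the maximum 2I₀ at zero flux down to essentially zero at half a flux quantum and back up at one flux quantum: the periodicity that makes the device such a sensitive magnetometer.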
This week a paper appeared on the arxiv relevant to scanning SQUID microscopy:


arXiv:1002.1529 - Koshnick et al., Design concepts for an improved integrated scanning SQUID
Here, Koshnick, together with scanning SQUID experts Kirtley and Moler, lay out ideas that they have in the works for refining the technology of these gadgets.  Neat stuff.

Almost simultaneously, a new paper appeared in Nano Letters on an implementation of an aluminum scanning SQUID microscope.  The basic concept, involving the use of a drawn optical fiber tip as a template for deposition of an aluminum ring and leads, hearkens back to the scanning single-electron transistor charge detector worked on previously by one of the coauthors.

Tim Gowers: EDP7 — emergency post


I don’t feel particularly ready for a post at this point, but the previous one has got to 100 comments, so this is one of the quick summaries again — but this one is even shorter than usual.

We are still doing an experimental investigation of multiplicative functions, trying to understand how special they have to be if they have low discrepancy. Ian Martin has produced some beautiful plots of the graphs of partial sums of multiplicative functions generated by various greedy algorithms. See this comment and the ensuing discussion.

Terence Tao has some thoughts about how one might try to reduce to the character-like case.

I came up with a proof strategy that I thought looked promising until I realized that it made predictions that are false for character-like functions such as \lambda_3 and \mu_3. Even if the idea doesn’t solve the problem, I think it may be good for something, so I have written a wiki page about it. Gil has had thoughts of a somewhat similar, but not identical, kind. Here is a related comment of Gil’s, and here are some more amazing plots of Ian’s. (I think we should set up a page on the wiki devoted to these plots and the ideas that led to them.) Regardless of what happens with EDP itself, I think we have some fascinating problems to think about, which can be summed up as, “What is going on with these plots?”
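For readers who want to play with the character-like functions mentioned above, here is a minimal sketch. It assumes the usual definition of λ₃ from the discussion: the completely multiplicative function that sends n to +1 or −1 according to whether the last nonzero digit of n in base 3 is 1 or 2. (μ₃ is not implemented here.) The brute-force discrepancy search is only meant for small n.

```python
def lambda3(n: int) -> int:
    """+1 if the last nonzero base-3 digit of n is 1, -1 if it is 2."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1


def partial_sums(f, n_max: int) -> list[int]:
    """f(1), f(1) + f(2), ..., f(1) + ... + f(n_max)."""
    sums, total = [], 0
    for n in range(1, n_max + 1):
        total += f(n)
        sums.append(total)
    return sums


def ternary_ones(n: int) -> int:
    """Number of digits equal to 1 in the base-3 expansion of n."""
    count = 0
    while n:
        n, digit = divmod(n, 3)
        if digit == 1:
            count += 1
    return count


def hap_discrepancy(f, n_max: int) -> int:
    """Max |f(d) + f(2d) + ... + f(kd)| over all d, k with kd <= n_max (brute force)."""
    best = 0
    for d in range(1, n_max + 1):
        total = 0
        for multiple in range(d, n_max + 1, d):
            total += f(multiple)
            best = max(best, abs(total))
    return best


print(partial_sums(lambda3, 12))  # → [1, 0, 1, 2, 1, 0, 1, 0, 1, 2, 1, 2]
print(hap_discrepancy(lambda3, 364))  # → 6
```

Because λ₃ is completely multiplicative, the sum along d, 2d, ..., kd equals λ₃(d) times the k-th partial sum, so the discrepancy is just the largest partial sum; and the partial sum up to n turns out to equal the number of 1s in the ternary expansion of n, which is why it grows only logarithmically.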

Terence Tao: An epsilon of room: pages from year three of a mathematical blog


I have just finished the first draft of my blog book for 2009, under the title of “An epsilon of room: pages from year three of a mathematical blog”. It largely follows the format of my previous two blog books, “Structure and Randomness” and “Poincaré’s legacies”.

There is still some amount of work to be done on the texts; for instance, I need to create an index (which I had neglected to do in the previous two books in the series), and will probably end up splitting the book into two volumes (as was done for “Poincaré’s legacies”).

As always, any feedback or comments are very welcome.

Lieven le Bruyn: Where’s Bourbaki’s Escorial?

As explained in the bumpy-road post, Andre Weil and Evelyne Gillet became involved sometime in 1935. In early 1936, they made a pre-honeymoon trip to Spain and visited El Escorial. Weil was so taken by the place that he planned for the next Bourbaki conference to be held in a nearby college.

However, the Bourbakis never made it to Spain that summer, as the Spanish Civil War broke out on July 17th, a few weeks before the intended conference. Still, the second Bourbaki meeting is often referred to as the ‘Escorial conference’. Can we geotag the exact location of Bourbaki’s “Escorial”?

Claude Chevalley came up with a Plan B and suggested they use his parents’ place in Chançay as their venue. Chevalley’s father was a French diplomat, and his house certainly possessed a matching ‘grandeur’, as can be seen from the famous picture below, taken at the (second) Chançay meeting in 1937 (Weil on the left, Chevalley on the right and Weil’s sister Simone standing).

Thanks to the Bourbaki archives we know that the meeting took place from September 16th to 28th, that each participant had to pay 16 francs for full board, and that they had to bring along their own sheets and towels.

But where exactly is this beautiful house? Jacques Borowczyk has written a nice paper Bourbaki et la touraine in which he describes the Bourbaki congresses of 1936 and 1937 at the Chevalley-house in Chançay and further those held in 1956, 1957 and 1959 in ‘hôtel de la Brèche’ in Amboise.

Borowczyk places the Chevalley house in a little hamlet of Chançay called “La Massoterie”. The village files assert that in 1931 three people were living at La Massoterie: the father, Abel Chevalley, who took up residence there after his retirement in 1931, his wife Marguerite and their son Claude. But at the time of the Bourbaki congress in 1936, Marguerite was the only permanent inhabitant. Sadly, Abel Chevalley, who together with Marguerite compiled The Concise Oxford French Dictionary, died in 1934.

Usually, when you know the name of the hamlet and the village, and add ‘France’ just to be certain, Google Maps takes you there to within metres. So this was going to be a quick post, for a change… Well, much to my surprise, typing ‘La Massoterie, Chançay, France’ only produced the answer “We could not understand the location La Massoterie, Chançay, France”.

Did I spell it wrong? Or did the name change over time? No: Googling it, the first hit gives you the map of a 10 km walk around Chançay passing through La Massoterie!

Now what? Fortunately, Borowczyk included in his paper an old map from Napoleonic times, showing the exact location of La Massoterie (just above the flash sign), facing the castle of Volmer. If you compare it with the picture below of present-day Chançay (via Google Earth), it is surprising how many of the landmarks have survived the changes over two centuries.

It is now easy to pinpoint the exact location and zoom into the Chevalley house, and you’re in for a small surprise: the place is called La Massotterie, with two t’s…

Google’s database is probably more reliable than the information provided by the village of Chançay or the paper by Borowczyk, as it uses the same spelling as the old Napoleonic map. Anyway, feel free to have a peek at Bourbaki’s Escorial yourself!

Quantum Diaries: Strange goings on in Brazil

Around a week ago, I submitted the first paper to have me as the sole author. For someone working in such a large collaboration this is a pretty exciting moment, even if it is just proceedings :-)

Last September, I was given the incredible opportunity to attend one of the most prestigious conferences in the world of quark-related research. The Strangeness in Quark Matter conference, held every few years, gathers physicists from around the world to an exotic location to discuss our current understanding of the strange quark, and the unusual behavior of the particles it creates. In September last year it was held in Buzios, a tiny fishing village on the coast north of Rio de Janeiro.  I was invited to give a talk at the conference, and I was lucky enough to get funding for the trip as I was also giving a talk on diffraction the week before in Rio (See Strong couplings: Tales from Brazil).

This was truly the most beautiful place I have ever seen (even compared to the stunning French snowy mountains I was falling down just a few weeks ago). It was also one of the strangest experiences of my life, and I am not attempting a pun. International conferences are a world unto themselves – indulgent in every sense. You feast frequently on a variety of delicious foods. You mingle with minds that are expertly extreme, taking various representations and interpretations of experimental analysis, sampling ideas and concepts from theorists from around the globe and across the field. Having never been to South America (or anywhere near as far as that) before in my life, the setting, for me, was entrancing and alien. Everywhere you looked there was a mango tree or a parasitic orchid hanging from a palm. Our buffets and breakfasts were adorned with papaya and guava. We were even treated to an exciting boat trip to a nearby island (nicknamed “ugly island”), and got to dive into the salty waters and snorkel!

Outside scheduled talk time we were constantly supplied with caipirinhas: cocktails made with ice, sugar, lime and cachaça (a spirit made from sugar cane). In fact, after one long day, during a lively late-night discussion that brought the attendees together around outstanding questions, drinks were brought round to encourage us to stay!

The topics under discussion (and, to some extent, debate) were just as unusual. At the start of my PhD, I had known only my own limitations in understanding data, theoretical concepts or predictions. Before the conference, discussions with many theorists aimed at helping me understand the expectations for the LHC served only to confuse and excite me more. However, as well as answering a lot of questions for me, this conference demonstrated the true nature of being at the very front end of science: right now, we know very little for certain. Ask any scientist what the LHC and RHIC heavy ion experiments are all about, and they will very quickly start telling you about exciting things such as the “Quark Gluon Plasma” and evidence for its properties, like “strangeness enhancement”. Try saying either of these phrases too loudly at a conference like this, however, and expect some funny looks. The fact is, there isn’t much you can say right now without a little skepticism (or careful rewording).

One thing I know for sure is that my analysis area is not lacking in interest. Strange particle production in heavy ion collisions at RHIC, compared to pp collisions, can be explained quite powerfully by theory, but the phi resonance, which carries no net strangeness (it is made up of an s quark and an anti-s quark), is somewhat more confusing. Asking what might happen to phi production in Pb-Pb collisions at the LHC is a tough enough question. However, begin to postulate what might occur in pp collisions at such high energy densities that they become (in some ways) comparable to heavy ion collisions, and you start to get some of those funny looks I mentioned. This was exactly what I did, and it sparked an argument between theorists holding two extreme viewpoints, who were eventually asked to leave the room while the poor speaker continued. Of course, I followed them out to take notes, along with another (very brilliant) ALICE physicist, Federico Antinori, who was keen to understand this issue. :-)

The conference was full of moments like this, and I am sure many conferences are: unusual data presented by experimentalists struggling to interpret it, and theorists arguing passionately about the consequences. I’d like to make a rather controversial statement: there is probably an equivalent of the “Phlogiston” phenomenon at work in much of front-line science. (If you don’t know what I am talking about, don’t just look it up on Wikipedia; you should also watch “Chemistry: A Volatile History”, presented by Prof. Jim Al-Khalili on BBC4 Catch up TV, and hurry, as you only have a few days left!) What I mean is, wherever we are dealing with the unknown, there are many contradictory ideas, and some of them have to be nonsense. Unfortunately, what seems like nonsense can be exactly what we are looking for. You only have to look at the history and evolution of science to see how long these red herrings can take to expose, and how what looks like a ridiculous mistake (parity violation, for example!) could turn out to be a curiously perfect answer.

Chad OrzelAmazing Laser Application 1: Light Show!

What's the application? The use of lasers to provide an entertaining light show for humans, dogs, or cats.

What problem(s) is it the solution to? 1) "How will I entertain my dog or cat?"

2) "How can we distract people from the fact that Roger Daltrey has no voice left?"

Chad OrzelLaser Smackdown: The Finalists

A couple of weeks ago, I announced a contest to determine the Most Amazing Laser Application. After a follow-up post listing the likely candidates, we have a final list of candidate applications, an even dozen of them (after consolidating some related topics):

  • Cat toy/ dog toy/ laser light show
  • Laser cooling/ BEC
  • Laser ranging/position measurement
  • Optical tweezers
  • Optical storage media (CD/DVD/Blu-Ray)
  • LIGO
  • Telecommunications
  • Holography
  • Laser ignited fusion
  • Laser eye surgery
  • Laser frequency comb/ spectroscopy
  • Laser guide stars/ adaptive optics

Here's how this will work: over the next week or so, I will write up a series of blog posts explaining these applications, and the pros and cons of each. At the end of that time, I'll put up a poll, and we'll decide the winner based on that most scientific of methods: random people on the Internet clicking radio buttons.

Watch this space-- the first application post will appear this afternoon.

Edinburgh Mathematical Physics GroupLast EMPJ before Easter

Just a quick post about our last EMPJ pre-seminar, to sum up our activities so far. First of all I’d like to thank Elena for giving us a great pre-seminar for Amihay Hanany’s talk. She gave us a very clear and interesting introduction to quiver diagrams and brane tilings, providing us with the essential information [...]

Chad OrzelMy Boskone Schedule

The usual "This is the stuff that looks interesting to me" post, based on the preliminary online program. Subject to change if they move things around, or if I discover something I overlooked that sounds more interesting, or if I decide I'm hungry, and opt to blow off panels in favor of food.

This year's program is lighter on panels, but includes both a signing and a reading. Which will be a very different experience than years past...

Chad OrzelWay Cuter Than the Puppy Bowl

There was some discombobulation yesterday afternoon that kept me from posting these-- I had meant them to be a Super Bowl alternative for the non-football-inclined. They'll work just as well as a Monday brightener, though. So here's a clip of SteelyKid a couple of weeks ago, laughing at the "got your food wrapper" game:

And here's one of her talking on the phone with her grandmother:

Chad OrzelLinks for 2010-02-08

  • "At least 60 percent of the people in Rockland who have gotten mumps during the current outbreak had not been fully immunized, Facelle said. Mumps were common before the vaccine became available. In 2008, there were only two reported cases in Rockland, according to the Department of Health's year-end communicable disease report."
  • "Playing Grandin in the HBO biopic Temple Grandin, Claire Danes captures the brilliance of the woman: how she sees things that others don't, and makes connections others can't. Danes gets Grandin's braying monotone, stooped posture and default defensive stance to other people--and more importantly she conveys it all unselfconsciously, as Grandin would, with no awareness of how she must look to others. (That is, until they start laughing or whispering behind her back.) The performance is more than just a collection of skillfully strung together tics. Danes also captures Grandin's sense of humor and her perception of everyday life: how she finds things funny that aren't necessarily jokes, and how unexpected sounds, lights and motion can put her in a mild state of panic."
  • "Periodisation in human history is an artifice. We the historians impose periods onto history in order to try to tame it and make it easier to handle and in doing so we run the very real risk of falsifying it. There are no sign posts rammed into the real roadmap of time saying you are now leaving the Early Middle Ages please conduct your self in future in a manner suitable for the High Middle Ages. In fact as the peasant farmer in Middle Europe turned over the page of his calendar from the 25th to the 26th of March in 1199 and thus entered the thirteenth century nothing changed in his life at all. Time is a constantly flowing river and change is incremental and on the ground mostly imperceptible as societies, cultures and ways of live evolve within the general flow. It is only with hindsight and selective interpretation of the facts that we can perceive the major changes that we then use to identify the periods that we stamp out of the riverbed."
  • "Most of us can't tell our secant from our cotangent. But the forms are everywhere, and Nikki Graziano wants to help us see them. Graziano, a math and photography student at Rochester Institute of Technology, overlays graphs and their corresponding equations onto her carefully composed photos. "I wanted to create something that could communicate how awesome math is, to everyone," she says. Graziano doesn't go out looking for a specific function but lets one find her instead. Once she's got an image she likes, Graziano whips up the numbers and tweaks the function until the graph it describes aligns perfectly with the photograph."
  • Star Wars vs. Titanic.
  • "As the middle linebacker, [Jonathan] Vilma is the quarterback of the defense. Watch him, and not Peyton Manning, for at least one drive during the Super Bowl and check out what kinds of furiously intense and split-second head games the two men are playing with each other. Maybe it looks uncomplicated, but you'd rather take a staple gun to your chode than replace either of these men for one play. They say there's only 11 minutes of actual "game" during a football game, but they're wrong. This tete-a-tete between quarterback and middle linebacker is the equivalent of watching a player's eyes during a chess match, if the pieces tried to kill each other, and their actions resulted in wanton crying and unnecessary financial ruin for some of the spectators. Enjoy."

Steinn SigurðssonPresidential Question Time


There was a truly weird advert or contributed op-ed on the radio a couple of days ago.

Some conservative anti-tax guy and a left-wing editor had joined in calling for a US "Question Time", à la "Question Time" in the UK Parliament, i.e. the President formally taking questions from Congress.

It was inspired in part, I suspect, by Obama's performance against the congressional Republicans in a question and answer session, though the conservative dude brought out the old teleprompter canard against Obama in explaining why he thought it was a good idea.

February 07, 2010

Steinn Sigurðssonacts of gods

Why do the gods keep interfering in football championships?
And which gods?

I'd think Loki, by default.
But Manning's throw away really smells of Óðinn - he always had a macabre sense of humour about bringing down established fighters through mischance or misjudgement.

'cause, y'know, when football players invoke the gods in their victory, you really have to think first about "which god?"

Steinn SigurðssonY rant on Mann hunt


The Yorkshire Ranter analyses the Climate Research Unit coverage in your friendly neighbourhood grauniad.

Robert P. points us to an analysis of the crack itself - kinda blah. Basically, a script kiddie trawling through a misconfigured server.

Chad OrzelThe Super Bowl Index of Economic and Cultural Indicators

It occurs to me that if you take the Super Bowl as a comment on the current state of the US of A-- which, you might as well, because it's as good as anything else-- we are totally screwed.

I mean, consider the fact that two-thirds of the ads were for Bud Light. OK, that may be a slight exaggeration, but I think every commercial break in the first half had at least one Bud Light ad in it. That basically tells you that the only company with the money to spend on Super Bowl advertising is one that makes its money from helping people drown their sorrows. That's an encouraging statement.

Worse yet, the general crop of ads continued the deplorable trend of glorifying idiots. This has been going on for years, but has really reached a peak lately with things like the Sonic ad campaign with two idiots in a car, those Coors Light commercials with the football coaches, and pretty much any commercial Taco Bell has made in the last, say, ten years. Maybe longer.

And worst of all, the Simpsons totally sold out. I mean, really, is nothing sacred?

What a bunch of crap. Space aliens looking at this year's sorry crop of ads would probably decide to save time and just nuke us from orbit. The orbit of Jupiter.

On the bright side, it was at least a decent game. Congratulations to the Saints, the feel-good story of the century so far.

Jordan EllenbergIrrational hatred and the Super Bowl


I had never seen Peyton Manning play football until the last five minutes of tonight’s Super Bowl.  But I always rooted against him.  Just didn’t like the guy, while not knowing anything about him.  I have the same sour feeling about some other athletes — Tiger Woods, Derek Jeter, Jim McMahon, Nancy Kerrigan, Michael Phelps — but these are all people I’ve seen play.

I found the last five minutes of the Super Bowl extremely satisfying, justifiably or not.

US/LHC BlogHow much data, how soon?

First off, we should mention here that CMS’s first paper from collision data has now been accepted for publication by the Journal of High Energy Physics. It’s a measurement of the angular distribution and momentum spectrum of charged particles produced in proton collisions at 0.9 and 2.36 TeV, using about 50,000 collision events recorded in December. It is really wonderful that this result could be turned around so quickly! The first of many papers to come, we hope.

Meanwhile, as already mentioned here, we now have the news of the run plan for the LHC. CERN is preparing for the longest continuous accelerator run of its history, 18 to 24 months. The inverse femtobarn of data to be recorded in that time is a lot, and will give us an opportunity to make many interesting measurements. Whether any of them will be evidence of new physics, I for one am not going to speculate! But if nothing else, this plan sets out what our LHC life for the next ~three years is going to look like.
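For readers wondering why an amount of data is quoted in inverse femtobarns: integrated luminosity L, multiplied by a process's cross section σ, gives the expected number of produced events, N = σ × L. The short Python sketch below is illustrative only; the function name and the 1 pb cross section in the example are hypothetical, not numbers from this post, and it ignores detector efficiency and acceptance.

```python
# Conversion factors from common barn-based units to femtobarns (1 fb = 1e-15 b).
FB_PER_UNIT = {"b": 1e15, "mb": 1e12, "ub": 1e9, "nb": 1e6, "pb": 1e3, "fb": 1.0}


def expected_events(cross_section: float, unit: str, lumi_inv_fb: float) -> float:
    """Expected number of produced events, N = sigma * L.

    cross_section: production cross section, in the given barn-based unit
    lumi_inv_fb:   integrated luminosity in inverse femtobarns (fb^-1)
    """
    sigma_fb = cross_section * FB_PER_UNIT[unit]  # convert sigma to fb
    return sigma_fb * lumi_inv_fb                 # fb * fb^-1 is a pure number


# A hypothetical process with a 1 pb cross section, in 1 fb^-1 of data:
print(expected_events(1.0, "pb", 1.0))  # 1000.0
```

So even a fairly rare process, at the picobarn level, would be produced about a thousand times in one inverse femtobarn.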

But a shorter-term question comes to mind: 1 fb⁻¹ over 18 to 24 months is one thing. But what about just the next few months? There is a major international conference coming up in July. What sort of LHC results might be ready by then? That will depend in part on how many collisions are delivered. I’ve seen various estimates for that, but they vary by an order of magnitude depending on the level of optimism, so I’d rather not guess. It will also depend on the experiments’ performance. How efficiently can we record those collisions? How quickly can we process them? How soon will we understand various parts of the detectors well enough to make quality measurements? How smart and clever can we be throughout the entire process? How much sleep is everyone going to get?

Ask me again in July. Meanwhile, game on.

Mark Chu-CarrollThe End of Defining Chaos: Mixing it all together

The last major property of a chaotic system is topological mixing. You can think of mixing as being, in some sense, the opposite of the dense periodic orbits property. Intuitively, the dense orbits tell you that things that are arbitrarily close together for arbitrarily long periods of time can have vastly different behaviors. Mixing means that things that are arbitrarily far apart will eventually wind up looking nearly the same - if only for a little while.

Let's start with a formal definition.

As you can guess from the name, topological mixing is a property defined using topology. In topology, we generally define things in terms of open sets and neighborhoods. I don't want to go too deep into detail - but an open set captures the notion of a collection of points with a well-defined boundary that is not part of the set. So, for example, in a simple 2-dimensional Euclidean space, the interior of a circle is one kind of open set; the boundary is the circle itself.

Now, imagine that you've got a dynamical system whose phase space is defined as a topological space. The system is defined by a recurrence relation: $s_{n+1} = f(s_n)$. Now, suppose that in this dynamical system, we can expand the state function so that it works as a continuous map over sets. So if we have an open set of points A, then we can talk about the set of points that that open set will be mapped to by f. Speaking informally, we can say that if B = f(A), B is the space of points that could be mapped to by points in A.

The phase space is topologically mixing if, for any two nonempty open sets A and B, there is some integer N such that $f^n(A) \cap B \neq \emptyset$ for every $n \geq N$. That is, no matter where you start, no matter how far away you are from some other point, eventually you'll wind up arbitrarily close to that other point, and from then on every later iterate of your starting region keeps coming that close. (Note: I originally left out the quantification of N.)
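To make the definition concrete, here is a small Python sketch using the doubling map f(x) = 2x mod 1 on the circle [0, 1), a standard textbook example of a mixing system. The choice of map, the two test arcs, and every function name are assumptions for illustration, not anything from the post; half-open arcs stand in for open sets to keep the bookkeeping simple, and exact fractions avoid roundoff. Note that the standard definition of topological mixing asks for f^n(A) ∩ B to be nonempty for every n ≥ N, not just for a single n, and this run shows why the distinction matters.

```python
from fractions import Fraction


def image_arc(start, length, n):
    """Image of the arc [start, start + length) on the circle [0, 1) under n
    steps of the doubling map f(x) = 2x mod 1. Each step doubles the arc's
    length, so the result is (new_start, new_length); a length >= 1 means
    the image covers the whole circle."""
    return (start * 2**n) % 1, length * 2**n


def on_arc(start, length, x):
    """True if the point x lies on the half-open arc [start, start + length) mod 1."""
    return (x - start) % 1 < length


def arcs_meet(s1, l1, s2, l2):
    """Do two half-open arcs on the circle share at least one point?"""
    if l1 >= 1 or l2 >= 1:
        return True
    # Two arcs overlap exactly when one of them starts on the other.
    return on_arc(s1, l1, s2) or on_arc(s2, l2, s1)


def hits(a_start, a_len, b_start, b_len, steps):
    """For n = 0, 1, ..., steps, record whether f^n(A) meets B."""
    return [arcs_meet(*image_arc(a_start, a_len, n), b_start, b_len)
            for n in range(steps + 1)]


# Two tiny, far-apart arcs, each of length 1/1024 (hypothetical test sets).
A = (Fraction(1, 3), Fraction(1, 1024))
B = (Fraction(1, 2), Fraction(1, 1024))
print(hits(*A, *B, steps=12))
```

This prints [False, False, False, False, False, False, False, False, True, False, True, True, True]: the image of A first touches B at n = 8, misses it again at n = 9, and from n = 10 on it has stretched around the whole circle, so it meets B at every later step.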

Now, let's put that together with the other basic properties of a chaotic system. In informal terms, what it means is:

  1. Exactly where you start has a huge impact on where you'll end up.
  2. No matter how close together two points are, no matter how long their trajectories are close together, at any time, they can suddenly go in completely different directions.
  3. No matter how far apart two points are, no matter how long their trajectories stay far apart, eventually, they'll wind up in almost the same place.

All of this is a fancy and complicated way of saying that in a chaotic system, you never know what the heck is going to happen. No matter how long the system's behavior appears to be perfectly stable and predictable, there's absolutely no guarantee that the behavior is actually in a periodic orbit. It could, at any time, diverge into something totally unpredictable.

Anyway - I've spent more than enough time on the definition; I think I've pretty well driven this into the ground. But I hope that in doing so, I've gotten across the degree of unpredictability of a chaotic system. There's a reason that chaotic systems are considered a nightmare for numerical analysis of dynamical systems: in a chaotic system, the most minuscule errors in any aspect of anything will produce drastic divergence.

So when you build a model of a chaotic system, you know that it's going to break down. No matter how careful you are, even if you had impossibly perfect measurements, just the nature of numerical computation - the limited precision and roundoff errors of numerical representations - means that your model is going to break.
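A tiny Python experiment illustrates the roundoff point. It is an assumed example, not from the post: the doubling map x ↦ 2x mod 1, started at 1/10 once with exact rational arithmetic and once with an ordinary double-precision float.

```python
from fractions import Fraction


def doubling(x):
    """One step of the doubling map x -> 2x mod 1 (works for float or Fraction)."""
    return (2 * x) % 1


def orbit(x0, steps):
    """Return [x0, f(x0), f(f(x0)), ...] with `steps` applications of f."""
    xs = [x0]
    for _ in range(steps):
        xs.append(doubling(xs[-1]))
    return xs


exact = orbit(Fraction(1, 10), 60)  # exact rational arithmetic
approx = orbit(0.1, 60)             # IEEE 754 double precision

# The exact orbit of 1/10 cycles 1/5 -> 2/5 -> 4/5 -> 3/5 -> 1/5 forever.
# The float 0.1 is really a binary fraction with denominator 2**55, and each
# doubling shifts one binary digit out, so the float orbit collapses to 0.
first_zero = next(n for n, x in enumerate(approx) if x == 0.0)
print(first_zero, exact[first_zero])  # 55 4/5
```

Every floating-point doubling here is itself exact; the tiny error made when 0.1 was first stored is enough, after 55 steps, to turn a periodic orbit into one that lands on 0 and stays there.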

From here, I'm going to move from defining things to analyzing things. Chaotic systems are a nightmare for modeling. But there are ways of recognizing when a system's behavior is going to become chaotic. What I'm going to do next is look at how we can describe and analyze systems in order to recognize and predict when they'll become chaotic.

Quantum DiariesHeat to kill the pain

A sliver of sunlight on the next mountain, amidst clouds and snowfall.

I’ve been a bit slow with blogging lately… And the reason is not a lack of things going on, far from it. Things got even busier because of a long-planned week of skiing and all the things I had to finish before then. Now teaching is over for this semester, and since yesterday around noon we have been in a small mountain village in southwestern Austria.

Over the last few days there has been quite a bit of fresh snow, good for the slopes but bad for visibility, especially since the clouds this morning were right at the altitude of the ski resort. After lunch we saw a first sliver of sunlight, and the day ended with sunshine. The snow was great to ski on, but of course, in the middle of a cloud, I went into a little depression a bit too fast without seeing it and jolted my back. But the sauna in our hotel hopefully helped to loosen the muscles again… Nothing like baking for a while to kill the pain of a long day of skiing.

I hope that over the next few days I’ll also have the time to write a bit about things that have been going on lately: A submitted paper, meeting in Paris, maybe more…. But no promises, skiing comes first!

The tool to soothe sore muscles: The sauna in our hotel in the ski resort.

Chad OrzelTalking to My Dog About Science: Why Public Communication of Science Matters, and How Weblogs Can Help

My talk at Maryland last Thursday went pretty well-- the impending Snowpocalypse kept the audience down, as people tried to fit in enough work to compensate for the Friday shutdown, but the people who were there seemed to like it, and asked good questions. If you weren't there, but want to know what I talked about, here are the slides on SlideShare:

This flattens out some of the more animation-dependent jokes, but gets you the basic idea. It is, of course, much more entertaining live, in case you're running an organization that might like a talk about this sort of thing...

Clifford JohnsonBad, but ever so Good

The other day I had a moment of nostalgia and made some of what we called bakes when I was a child, growing up (for some years) in the Caribbean. Bakes are known as Johnny cakes in the US, as far as I understand, and used in much the same ways that we used them. This is certainly not something you should have every day, since they involve fat (vegetable shortening, or lard as we called it, although elsewhere the term is used for a kind of pig fat), flour, salt, and a pan half full of oil to deep fry it all in. Definitely sinful. I have very happy memories of having bakes with tasty oily fishy goodness of some sort. Salt fish (salt dried cod) would be a typical thing (bacalao as the Portuguese and [...]

Tommaso DorigoAnd CMS, In The Meantime...

Earlier today I reported on the publication of a paper by a non-professional physicist, Carl Brannen. Now I have to do the same for a paper (the first one in a long and groundbreaking series, you can bet) from the CMS collaboration, one of the two main experiments at the CERN Large Hadron Collider.


Chad OrzelHow to Teach Physics to Your Dog: Obsessive Update

Miscellaneous stories and links about How to Teach Physics to Your Dog:

  • Kathy Ceceri, who wrote the story about the book that ran in the Times Union, has posted the full article on the Home Physics blog. The link to the paper itself may very well disappear behind a paywall, but this post should remain accessible.
  • There's an article in the Chronicle of Higher Education that I can't read because I'm not a subscriber, and I don't remember the password needed to access it via the library subscription. If anybody has access and would like to tell me what it says, that would be cool. (UPDATE: I've got it now, thanks very much.)
  • How to Teach Physics to Your Dog is used as an example in a German presentation about problem solving. Google translate is good enough to get the idea of the way it's being used, but is no help at all with the embedded presentation slide. I think it's a translation of part of the Introduction, but my German is nonexistent.

That's the best of this week's vanity searching. Again, I will be on KSOO radio Tuesday evening, 6:30 pm ET, if you'd like to hear what I sound like live. I'll also be at Boskone next weekend, reading book-related stuff on Sunday morning.

Chad OrzelSports Science Poll: Super Bowl

We're mere hours away from the start of the Super Bowl, the biggest football game of the year. Obviously, the question of who will win has been the subject of much debate over the last couple of weeks on sports media and in offices around the country. What these discussions have lacked, though, is Science!!! (with any number of exclamation points).

So, let's employ science to determine the winner in advance, with a totally accurate Internet poll:

Poll: Who will win the Super Bowl? (http://answers.polldaddy.com/poll/2662709/)

The game kicks off around 6:30pm ET, so make sure you vote before then, if you want your vote to have predictive power.

Chad OrzelLinks for 2010-02-07

  • "Just as expert physicists vary in their personal stances on interpretation in quantum mechanics, instructors vary on whether and how to teach interpretations of quantum phenomena in introductory modern physics courses. In this paper, we document variations in instructional approaches with respect to interpretation in two similar modern physics courses recently taught at the University of Colorado, and examine associated impacts on student perspectives regarding quantum physics. We find students are more likely to prefer realist interpretations of quantum-mechanical systems when instructors are less explicit in addressing student ontologies. We also observe contextual variations in student beliefs about quantum systems, indicating that instructors who choose to address questions of ontology in quantum mechanics should do so explicitly across a range of topics."
  • This ought to be supported by the Bertrand Russell Foundation, which funds all foundations that don't fund themselves (h/t Michael Nielsen).
  • "Searching for signatures of cosmic-scale archaeological artifacts such as Dyson spheres or Kardashev civilizations is an interesting alternative to conventional SETI. Uncovering such an artifact does not require the intentional transmission of a signal on the part of the original civilization. This type of search is called interstellar archaeology or sometimes cosmic archaeology. The detection of intelligence elsewhere in the Universe with interstellar archaeology or SETI would have broad implications for science. For example, the constraints of the anthropic principle would have to be loosened if a different type of intelligence was discovered elsewhere. A variety of interstellar archaeology signatures are discussed including non-natural planetary atmospheric constituents, stellar doping with isotopes of nuclear wastes, Dyson spheres, as well as signatures of stellar and galactic-scale engineering."
  • "Eureka is an affectionate paean to the small town, with a twist: it's population is made up of brilliant scientists (and their families), all of whom work at a vast, sooper sekrit lab called Global Dynamics that gets a large part of its funding from the Department of Defense, yet is dedicated to curiosity-driven research -- at least in principle. The show is a dramedy that combines elements of Northern Exposure and The X-Files, according to Jaime -- and I'd throw in a dash of Scrubs and Gilmore Girls to boot. In fact, it reminds me a little of Buffy the Vampire Slayer and Angel without the mystical trappings, both of which combined drama with humor and featured terrific characters and smart, sassy dialogue. (Needless to say, I'm a Eureka fan.) "It's small town trappings with endless possibility," he says, and admits the show's premise is at least partially inspired by places like Los Alamos, Berkeley Lab, Livermore, Bell Labs, even Area 51."

BackreactionBlack Holes and Information Loss

Here is - finally! - the continuation of my previous posts on Causal Diagrams and The Causal Diagram of the Black Hole. Due to popular demand, this time we will discuss the black hole information loss paradox. I previously wrote about this topic here, where I also listed the most common solution attempts. I am not going to repeat this list of solution attempts, so please refer to the older post for that. I want to focus here instead on the causal diagram.

Preliminaries

Last time, we finally arrived at the diagram of the evaporating black hole:

More precisely, it's a non-rotating uncharged black hole.

The most important features of this spacetime are that it has a (spacelike) singularity and an event horizon. The blue line indicates the surface of some collapsing matter configuration [1]. Let me remind you that since we've chosen radial coordinates, curves that pass through r=0 (where it is non-singular) look like they are reflected back. These segments of curves are also referred to as in- and outgoing in an obvious terminology.

Shown in the figure is $v_0$, the last ray of light that passes through the collapsing matter and still manages to escape [2]. In the background depicted in the diagram, particle creation takes place at the horizon, which causes the black hole to lose mass. It then shrinks until it has finally completely evaporated, leaving behind nothing but thermal Hawking radiation [3].

Another important fact is that this spacetime is "asymptotically flat" or "asymptotically Minkowski," which means that at an infinite distance from the black hole, spacetime is flat (flat as in "the curvature tensor vanishes"). This doesn't necessarily have to be the case (e.g. it could be asymptotically AdS instead), but it will make our discussion leaner. The reason for this asymptotic flatness is simply that in the beginning as well as in the end the matter is arbitrarily thinly dispersed.

To wrap up the summary, note that this diagram depicts a highly idealized situation. It's an evaporating black hole in an otherwise entirely empty spacetime. Realistic black holes are surrounded by matter and accrete mass, and occasionally Bob sends one of his Alices behind the horizon. But, as so often in physics, the uncluttered idealized version will help us understand the situation better without spoiling the conclusions.

Evolution

To understand the black hole information loss problem you need one further ingredient: what physicists mean by time evolution. Intuitively, it means that one specifies a system at one moment in time, known as "initial conditions," and from this determines the state of that system at any other time with the help of a differential equation [4]. The most basic example is throwing a ball. The initial conditions needed are the location and velocity at one moment. The equation you use is Newton's law (or something equivalent).
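As a concrete version of the ball example, here is a minimal Python sketch that takes the initial conditions (height and velocity at one moment) and uses the exact solution of Newton's law under constant gravity to get the state at any other time, earlier or later. The constant G = 9.81 m/s², the neglect of air resistance, and the names State and evolve are assumptions for illustration.

```python
from dataclasses import dataclass

G = 9.81  # m/s^2; assumed constant gravity, air resistance ignored


@dataclass(frozen=True)
class State:
    height: float    # m
    velocity: float  # m/s, positive means upward


def evolve(s: State, t: float) -> State:
    """Exact solution of Newton's law a = -G, run for a time t (t may be negative)."""
    return State(
        height=s.height + s.velocity * t - 0.5 * G * t**2,
        velocity=s.velocity - G * t,
    )


# Initial conditions: thrown upward at 10 m/s from a height of 1.5 m.
start = State(height=1.5, velocity=10.0)
later = evolve(start, 1.0)   # the state one second later
back = evolve(later, -1.0)   # run the same law backwards in time
print(later)
print(back)
```

Evolving forward by one second and then backward by one second returns the starting state (up to floating-point rounding): the initial conditions fix the whole history, in both time directions.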

In General Relativity the situation is more complicated but conceptually similar. You specify the initial conditions of your matter configuration at one moment in time and use Einstein's field equations to determine what space-time and matter are doing at any other time [5]. The attentive reader might remark that already in Special Relativity "one moment in time" is ambiguous. Indeed, and this is also the case in General Relativity. The point is, you can use any "moment in time" for your initial conditions, as long as it's at one moment but everywhere in space (this is not the only option, but the most commonly used one). We call that a "complete spacelike hypersurface." Complete basically means it has no holes and no boundaries beyond which it could be extended.

Almost there now. In the picture below I've added two complete spacelike hypersurfaces, denoted $\Sigma_1$ and $\Sigma_2$:


Information Loss

The evolution of a quantum mechanical state is unitary. That means in particular it is time-reversible [6]. You can evolve the state of your system back and forth as you like. There are many ways to think about information, and when talking about the black hole evolution some people like to get hung up on the exact meaning of information. That's a very interesting topic, but we'll cut this discussion short because it's irrelevant to understanding the problem. Suppose you have an initial state and you evolve it into a final state. If your final state does not uniquely specify the initial state, we'll consider this a loss of information. It means you can't tell what happened.

Black hole evaporation causes a loss of information because the outgoing radiation depends only on the total mass. Once the black hole has evaporated, all states with the same initial mass are converted into the same end state. There are many ways a system can be composed if you only know the total mass [7]. There's only one way it will look after evaporation. This process is thus not reversible: it is not possible to reconstruct the initial state from the final state. But if it's not reversible, it can't be unitary. And for beginners that's the problem: the formation and complete evaporation of the black hole seems to be incompatible with quantum mechanics. On the advanced level it's more complicated, since we know the computation leading to Hawking radiation breaks down when quantum gravity becomes important. In this case the problem is that the quantum gravitational contribution doesn't help you get enough information out.
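The counting argument can be made explicit with a deliberately crude toy model, which is not Hawking's calculation: below, a hypothetical `evaporate` function maps any composition of an object to an end state that records only the total mass (in arbitrary units). Grouping initial states by their end state shows several preimages for one end state, which is exactly what rules out an inverse.

```python
from collections import defaultdict


def evaporate(constituents):
    """Toy stand-in for evaporation: the end state remembers only the total mass."""
    return ("thermal radiation", sum(constituents))


# Different ways to compose an object of total mass 4 (arbitrary units).
initial_states = [(4,), (1, 3), (2, 2), (1, 1, 2), (1, 1, 1, 1)]

preimages = defaultdict(list)
for state in initial_states:
    preimages[evaporate(state)].append(state)

for end_state, starts in preimages.items():
    print(end_state, "<-", starts)

# The evolution is reversible only if every end state has exactly one preimage.
reversible = all(len(starts) == 1 for starts in preimages.values())
print("reversible:", reversible)  # reversible: False
```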

There are several points about the problem that people tend to misunderstand even at the beginner's level, so let me mention some pitfalls. First, note that the problem is not that the information is inaccessible behind a horizon. There is no horizon in the end state; look at the diagram. It's flat Minkowski space with infinitely thinly dispersed thermal radiation. Think of the black hole as a black box. You start with flat Minkowski space, something happens in between, and you end with flat Minkowski space. Yet this evolution cannot be described by quantum mechanics as we know it. Second, to lay out the problem I didn't have to refer to measurement at all. It's a fundamental incompatibility in the evolution; you don't resolve that incompatibility by waving your hands and yelling "measurement problem." Third, we are talking about the microscopic laws. Yes, on macroscopic scales we do have an arrow of time and entropy tends to increase anyway, but the problem is to reconcile the black hole evolution with the fundamentals of quantum mechanics prior to coarse graining. Fourth, yes, it is possible to cover the Schwarzschild geometry with what are known as "nice slices," hypersurfaces that avoid the singularity for any finite time. (You can find some very good graphics for that here, on slide 10.) That doesn't solve the problem either, because no matter how you turn it, your black hole evaporates away and you'll eventually have to face the fact that all you have left at scri plus is thermal radiation.

If you want to argue that the problem is a thought-experiment and unobservable, please read my earlier post on Thoughts and Experiments. We have to pay attention to inconsistencies even if they are not observable, since they document a gap in our knowledge. While troublesome, they also offer us opportunities to improve our understanding of Nature, which is why physicists turn problems like this upside-down and inside-out.

The value of the causal diagram once again is that it captures a lot of physics in one simple picture. If you look at it one more time you can see the problem. At the singularity matter gets crushed to infinite density and, absent non-local effects, everything that crossed the horizon has to fall into the singularity. Recall that lines at 45° angles depict the paths light travels on. You'd have to be faster than light to avoid the singularity once you've passed the horizon. All information about the initial state that evolves into the singularity is thus not available on the final slice. And that's exactly what happens in the calculation. You finally have to let go of the part of the initial wave-function that vanished behind the horizon, because it cannot avoid the singularity.

Now what

This then opens the playground for solutions to the problem. You either have to get the information out before it hits the singularity or prevent it from crossing the horizon at all. Lee and I argued in our paper last year (see previous post for details) that the easiest way to avoid hitting the singularity is if there is no singularity. This by itself doesn't mean information behind the horizon becomes accessible again to the observer outside the horizon. But if you recall, this wasn't the problem to begin with. The problem was to achieve compatibility with unitary evolution, and this doesn't require information to be accessible to everybody, as long as it exists.

In any case, since black hole evaporation is and will likely remain elusive to experiment, everybody has their favorite solution. String theorists like the idea that information never gets lost because the evolution of the black hole is equivalently described by a dual, unitary theory formulated on the boundary of the space-time, which has been shown to encode regions of the bulk both inside and outside the horizon. People working on other approaches to quantum gravity seem to favor the idea that the singularity is avoided and the information somehow makes it out of the horizon, though at least to me it has remained unclear how. (I sometimes suspect they'll eventually reinvent and adopt the string theory solution.) Scenarios with stable or quasi-stable remnants that keep information or slowly release it also recur occasionally, and then there are parallel and baby universes and a long list of miscellaneous others. The idea that black holes can't be formed to begin with lies in a shadowy fringe area and is not considered plausible by the vast majority of researchers in the field.

I personally am somewhat agnostic on how the information is released, but am certain it can ultimately only be achieved if the singularity is avoided (in the sense explained in the paper mentioned above).

So. *wiping sweat off forehead* If you still haven't had enough, let me know.



[1] Modulo the question where it hits the singularity, see comments to previous post, but that's not relevant for our purposes.
[2] To be more precise, since we have assumed spherical symmetry to be able to draw a 4-dimensional manifold in a 2-dimensional figure, a point in the figure is actually a sphere, but this distinction isn't so relevant. One can decompose the solutions to the wave equation in spherical harmonics as usual. We are then talking here only about the s-wave state. States with higher angular momentum have more complicated behavior.
[3] In the upheaval around the alleged risk of black holes at the LHC, some people ridiculed the fact that Hawking's calculation does not "automatically" decrease the mass of the black hole, but that energy conservation is "put in by hand." That is in fact true. But the fact that in this calculation the radiation does not "automatically" carry away the mass of the black hole is an artifact of doing the analysis in a fixed background, which "by hand" prohibits the mass from changing. There is absolutely nothing wrong with arguing that, once the energy loss through radiation is taken into account, the mass is in fact not constant. This in turn does not render the calculation false; it merely limits its accuracy, and Hawking's calculation can be shown to be an excellent approximation as long as the rate of mass loss is small. It is only in the end stage of evaporation, when quantum gravity is important, that the mass loss becomes relevant for the properties of the emitted radiation. This phase is thus still a matter of discussion.
[4] Note that it is entirely irrelevant whether the "initial" conditions are indeed at the beginning of the evolution. You could equally well specify the state of your system in the future and evolve it into the past.
[5] Note that this means once you've specified an equation of state for the matter, General Relativity does not allow you to specify what you want the matter to do over the course of time.
[6] The reverse is not true. A reversible evolution is in general not also unitary.
[7] Even if it's spherically symmetric. You lose all information in the radial direction.


Tommaso DorigoWhen Amateurs Get Published

This just in: Carl Brannen (here is his blog) got a paper on gravitation published in a scientific magazine. Carl, the typical amateur whom many "established scientists" in the blogosphere have labeled a crackpot in the last few years, does not actually fit the bill very well: he is a deep thinker who knows the literature of what he studies, and the fact that he is not salaried by a research institute means only this: he does it for Science, and not for pay.



John BaezThis Week's Finds in Mathematical Physics (Week 293)

John Baez

This week I want to list a bunch of recent papers and books on n-categories. Then I'll tell you about a conference on the math of environmental sustainability and green technology. And then I'll continue my story about electrical circuits. But first...

This column started with some vague dreams about n-categories and physics. Thanks to a lot of smart youngsters - and a few smart oldsters - these dreams are now well on their way to becoming reality. They don't need my help anymore! I need to find some new dreams. So, "week300" will be the last issue of This Week's Finds in Mathematical Physics.

I still like learning things by explaining them. When I start work at the Centre for Quantum Technologies this summer, I'll want to tell you about that. And I've realized that our little planet needs my help a lot more than the abstract structure of the universe does! The deep secrets of math and physics are endlessly engrossing - but they can wait, and other things can't. So, I'm trying to learn more about ecology, economics, and technology. And I'd like to talk more about those.

So, I plan to start a new column. Not completely new, just a bit different from this. I'll call it This Week's Finds, and drop the "in Mathematical Physics". That should be sufficiently vague that I can talk about whatever I want.

I'll make some changes in format, too. For example, I won't keep writing each issue in ASCII and putting it on the usenet newsgroups. Sorry, but that's too much work.

I also want to start a new blog, since the n-Category Cafe is not the optimal place for talking about things like the melting of Arctic ice. But I don't know what to call this new blog - or where it should reside. Any suggestions?

I may still talk about fancy math and physics now and then. Or even a lot. We'll see. But if you want to learn about n-categories, you don't need me. There's a lot to read these days. I mentioned Carlos Simpson's book in "week291" - that's one good place to start. Here's another introduction:

1) John Baez and Peter May, Towards Higher Categories, Springer, 2009. Also available at http://ncatlab.org/johnbaez/show/Towards+Higher+Categories

This has a bunch of papers in it, namely:

  • John Baez and Michael Shulman, Lectures on n-categories and cohomology.

  • Julia Bergner, A survey of (∞,1)-categories.

  • Simona Paoli, Internal categorical structures in homotopical algebra.

  • Stephen Lack, A 2-categories companion.

  • Lawrence Breen, Notes on 1- and 2-gerbes.

  • Ross Street, An Australian conspectus of higher categories.

After browsing these, you should probably start studying (∞,1)-categories, which are ∞-categories where all the n-morphisms for n > 1 are invertible. There are a few different approaches, but luckily they're nicely connected by some results described in Julia Bergner's paper. Two of the most important approaches are "Segal spaces" and "quasicategories". For the latter, start here:

2) Andre Joyal, The Theory of Quasicategories and Its Applications, http://www.crm.cat/HigherCategories/hc2.pdf

and then go here:

3) Jacob Lurie, Higher Topos Theory, Princeton U. Press, 2009. Also available at http://www.math.harvard.edu/~lurie/papers/highertopoi.pdf

This book is 925 pages long! Luckily, Lurie writes well. After setting up the machinery, he went on to use (∞,1)-categories to revolutionize algebraic geometry:

4) Jacob Lurie, Derived algebraic geometry I: stable infinity-categories, available as arXiv:math/0608228.
Derived algebraic geometry II: noncommutative algebra, available as arXiv:math/0702299.
Derived algebraic geometry III: commutative algebra, available as arXiv:math/0703204.
Derived algebraic geometry IV: deformation theory, available as arXiv:0709.3091.
Derived algebraic geometry V: structured spaces, available as arXiv:0905.0459.
Derived algebraic geometry VI: E_k algebras, available as arXiv:0911.0018.

For related work, try these:

5) David Ben-Zvi, John Francis and David Nadler, Integral transforms and Drinfeld centers in derived algebraic geometry, available as arXiv:0805.0157.

6) David Ben-Zvi and David Nadler, The character theory of a complex group, available as arXiv:0904.1247.

Lurie is now using (∞,n)-categories to study topological quantum field theory. He's making precise and proving some old conjectures that James Dolan and I made:

7) Jacob Lurie, On the classification of topological field theories, available as arXiv:0905.0465.

Jonathan Woolf is doing it in a somewhat different way, which I hope will be unified with Lurie's work eventually:

8) Jonathan Woolf, Transversal homotopy theory, available as arXiv:0910.3322.

All this stuff is starting to transform math in amazing ways. And I hope physics, too - though so far, it's mainly helping us understand the physics we already have.

Meanwhile, I've been trying to figure out something else to do. Like a lot of academics who think about beautiful abstractions and soar happily from one conference to another, I'm always feeling a bit guilty, wondering what I could do to help "save the planet". Yes, we recycle and turn off the lights when we're not in the room. If we all do just a little bit... a little will get done. But surely mathematicians have the skills to do more!

But what?

I'm sure lots of you have had such thoughts. That's probably why Rachel Levy ran this conference last weekend:

9) Conference on the Mathematics of Environmental Sustainability and Green Technology, Harvey Mudd College, Claremont, California, Friday-Saturday, January 29-30, 2010. Organized by Rachel Levy.

Here's a quick brain dump of what I learned.

First, Harry Atwater of Caltech gave a talk on photovoltaic solar power:

10) Atwater Research Group, http://daedalus.caltech.edu/

The efficiency of silicon crystal solar cells peaked out at 24% in 2000. Fancy "multijunctions" get up to 40% and are still improving. But they use fancy materials like gallium arsenide, gallium indium phosphide, and so on. The world currently uses 13 terawatts of power. The US uses 3. But building just 1 terawatt of these fancy photovoltaics would use up more rare substances than we can get our hands on:

11) Gordon B. Haxel, James B. Hedrick, and Greta J. Orris, Rare earth elements - critical resources for high technology, US Geological Survey Fact Sheet 087-02, available at http://pubs.usgs.gov/fs/2002/fs087-02/

So, if we want solar power, we need to keep thinking about silicon and use as many tricks as possible to boost its efficiency.

There are some limits. In 1961, Shockley and Queisser wrote a paper on the limiting efficiency of a solar cell. It's limited for thermodynamic reasons! Since anything that can absorb energy can also emit it, any solar cell also acts as a light-emitting diode, turning electric power back into light:

12) W. Shockley and H. J. Queisser, Detailed balance limit of efficiency of p-n junction solar cells, J. Appl. Phys. 32 (1961) 510-519.

13) Wikipedia, Shockley-Queisser limit, http://en.wikipedia.org/wiki/Shockley%E2%80%93Queisser_limit
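The first ingredient of Shockley and Queisser's analysis, the so-called "ultimate efficiency", can be sketched in a few lines of Python. This is only that first step, under its own assumptions: the Sun is a blackbody, and every photon above the band gap delivers exactly the gap energy while photons below it are lost. It leaves out the radiative recombination described above (the solar cell acting as a light-emitting diode) and other losses, which push the full detailed-balance limit lower. The reduced band gap xg = Eg / (k T_sun), the integration cutoff, and the scan grid are our own choices.

```python
import math


def photon_spectrum(x):
    """x**2 / (e**x - 1): blackbody photon-number spectrum in reduced units x = E / (k T)."""
    return x * x / math.expm1(x)


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Integral of x**3 / (e**x - 1) over (0, infinity): total blackbody power, reduced units.
TOTAL_POWER = math.pi ** 4 / 15


def ultimate_efficiency(xg, upper=40.0):
    """Fraction of blackbody power delivered if each photon above the gap yields exactly xg.

    The photon integral is cut off at `upper`, where the spectrum is negligible.
    """
    photons_above_gap = simpson(photon_spectrum, xg, upper)
    return xg * photons_above_gap / TOTAL_POWER


candidates = [0.5 + 0.01 * k for k in range(400)]  # xg from 0.5 to 4.49
best_xg = max(candidates, key=ultimate_efficiency)
print(f"optimal xg ~ {best_xg:.2f}, ultimate efficiency ~ {ultimate_efficiency(best_xg):.3f}")
```

The trade-off is visible in the formula: a small gap catches many photons but extracts little energy from each, a large gap extracts more per photon but catches few.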

What are the tricks used to approach this theoretical efficiency? Multijunctions use layers of different materials to catch photons of different frequencies. The materials are expensive, so people use a lens to focus more sunlight on the photovoltaic cell. The same is true even for silicon - see the Umuwa Solar Power Station in Australia. But then the cells get hot and need to be cooled.

Roughening the surface of a solar cell promotes light trapping, by large factors! Light bounces around ergodically and has more chances to get absorbed and turned into useful power. There are theoretical limits on how well this trick works. But those limits were derived using ray optics, where we assume light moves in straight lines. So, we can beat those limits by leaving the regime where the ray-optics approximation holds good. In other words, make the surface complicated at length scales comparable to the wavelength of light.

For example: we can grow silicon wires from vapor! They can form densely packed structures that absorb more light:

14) B. M. Kayes, H. A. Atwater, and N. S. Lewis, Comparison of the device physics principles of planar and radial p-n junction nanorod solar cells, J. Appl. Phys. 97 (2005), 114302.

James R. Maiolo III, Brendan M. Kayes, Michael A. Filler, Morgan C. Putnam, Michael D. Kelzenberg, Harry A. Atwater and Nathan S. Lewis, High aspect ratio silicon wire array photoelectrochemical cells, J. Am. Chem. Soc. 129 (2007), 12346-12347.

Also, with such structures the charge carriers don't need to travel so far to get from the n-type material to the p-type material. This also boosts efficiency.

There are other tricks, still just under development. Using quasiparticles called "surface plasmons" we can adjust the dispersion relations to create materials with really low group velocity. Slow light has more time to get absorbed! We can also create "meta-materials" whose refractive index is really wacky - like n = -5!

I should explain this a bit, in case you don't understand. Remember, the refractive index of a substance is the inverse of the speed of light in that substance - in units where the speed of light in vacuum equals 1. When light passes from material 1 to material 2, it takes the path of least time - at least in the ray-optics approximation. Using this you can show Snell's law:

sin(θ1)/sin(θ2) = n2/n1

where n_i is the index of refraction of the ith material and θ_i is the angle between the light's path and the line normal to the interface between the materials.

Air has an index of refraction close to 1. Glass has an index of refraction greater than 1. So, when light passes from air to glass, it "straightens out": its path becomes closer to perpendicular to the air-glass interface. When light passes from glass to air, the reverse happens: the light bends away from the perpendicular. But the sine of an angle can never exceed 1 - so sometimes Snell's law has no solution. Then the light gets stuck! More precisely, it's forced to bounce back into the glass. This is called "total internal reflection", and the easiest way to see it is not with glass, but water. Dive into a swimming pool and look up from below. You'll only see the sky in a limited disk. Outside that, you'll see total internal reflection.

Okay, that's stuff everyone learns in optics. But negative indices of refraction are much weirder! The light entering such a material will bend backwards.
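Snell's law, total internal reflection and negative refraction can all be explored with one small function. In this minimal Python sketch angles are measured from the normal, the indices for air and water are approximate values chosen for illustration, and n = -5 is the "wacky" meta-material index mentioned above. The critical angle for water is the half-angle of the disk of sky a swimmer sees from below.

```python
import math


def refraction_angle(theta1_deg, n1, n2):
    """Refracted angle from Snell's law n1 sin(theta1) = n2 sin(theta2), in degrees.

    Returns None when no refracted ray exists (total internal reflection).
    A negative n2 gives a negative angle: the ray bends "backwards",
    ending up on the same side of the normal it came from.
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))


def critical_angle(n_inside, n_outside):
    """Smallest incidence angle giving total internal reflection (needs n_inside > n_outside)."""
    return math.degrees(math.asin(n_outside / n_inside))


N_AIR, N_WATER = 1.0, 1.33  # approximate indices of refraction

print(refraction_angle(30, N_AIR, N_WATER))  # roughly 22.1: bends toward the normal
print(refraction_angle(60, N_WATER, N_AIR))  # None: total internal reflection
print(critical_angle(N_WATER, N_AIR))        # roughly 48.8: edge of the disk of sky
print(refraction_angle(30, N_AIR, -5.0))     # roughly -5.7: negative refraction
```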

Materials with a negative index of refraction also exhibit a reversed version of the ordinary Goos-Hänchen effect. In the ordinary version, light "slips" a little before reflecting during total internal reflection. The "slip" is actually a slight displacement of the light's wave crests from their expected location - a "phase slip". But for a material of negative refractive index, the light slips backwards. This allows for resonant states where light gets trapped in thin films. Maybe this can be used to make better solar cells.

Next, Kenneth Golden gave a talk on sea ice, which covers 7-10% of the ocean's surface and is a great detector of global warming. He's a mathematician at the University of Utah who also does measurements in the Arctic and Antarctic. If you want to go to math grad school without becoming a nerd - if you want to brave 70-foot swells, dig trenches in the snow and see emperor penguins - you want Golden as your advisor:

15) Ken Golden's website, http://www.math.utah.edu/~golden/

Salt gets incorporated into sea ice via millimeter-scale brine inclusions between ice platelets, forming a "dendritic platelet structure". Melting sea ice forms fresh water in melt ponds atop the ice, while the brine sinks down to form "bottom water" driving the global thermohaline conveyor belt. You've heard of the Gulf Stream, right? Well, that's just part of this story.

When it gets hotter, the Earth's poles get less white, so they absorb more light, making it hotter: this is "ice albedo feedback". Ice albedo feedback is largely controlled by melt ponds. So if you're interested in climate change, questions like the following become important: when do melt ponds get larger, and when do they drain out?

Sea ice is diminishing rapidly in the Arctic - much faster than all the existing climate models had predicted. In the Arctic, winter sea ice diminished in area by about 10% from 1978 to 2008. But summer sea ice diminished by about 40%! It took a huge plunge in 2007, leading to a big increase in solar heat input due to the ice albedo effect.


Time series of the percent difference in ice extent in March (the month of ice extent maximum) and September (the month of ice extent minimum) relative to the mean values for the period 1979-2000. Based on a least squares linear regression for the period 1979-2009, the rate of decrease for the March and September ice extents is -2.5% and -8.9% per decade, respectively. Figure from Perovich et al.

16) Donald K. Perovich, Jacqueline A. Richter-Menge, Kathleen F. Jones, and Bonnie Light, Sunlight, water, and ice: Extreme Arctic sea ice melt during the summer of 2007, Geophysical Research Letters, 35 (2008), L11501. Also available at http://www.crrel.usace.army.mil/sid/personnel/perovichweb/index1.htm
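The trends quoted in the caption come from a least-squares linear fit. Here is a minimal Python sketch of turning such a fit into a "% per decade" figure. The input is a synthetic straight line constructed to have the caption's September slope; it is not the Perovich et al. data, so the output only confirms that the arithmetic recovers the slope.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, scaled to per decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10 * sxy / sxx


# Synthetic illustration only, NOT observational data: a straight line losing
# 0.89 percentage points per year, i.e. -8.9% per decade as in the caption.
years = list(range(1979, 2010))
anomaly = [5.0 - 0.89 * (year - 1979) for year in years]

print(round(trend_per_decade(years, anomaly), 2))  # -8.9
```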

There's a lot less sea ice in the Antarctic than in the Arctic. Most of it is in the Weddell Sea, and there it seems to be growing, maybe due to increased precipitation.

There's a lot of interesting math involved in understanding the dynamics of sea ice. The ice thickness distribution equation was worked out by Thorndike et al. in 1975. The heat equation for ice and snow was worked out by Maykut and Untersteiner in 1971. Sea ice dynamics was studied by Hibler.

Ice floes have two fractal regimes, one from 1 to 20 meters, another from 100 to 1500 meters. Brine channels have a fractal character well modeled by "diffusion limited aggregation". Brine starts flowing when there's about 5% of brine in the ice - a kind of percolation problem familiar in statistical mechanics. Here's what it looks like when there's 5.7% brine and the temperature is -8 °C:

17) Kenneth Golden, Brine inclusions in a crystal of lab-grown sea ice, http://www.math.utah.edu/~golden/7.html

Nobody knows why polycrystalline metals have a log-normal distribution of crystal sizes. Similar behavior, also unexplained, is seen in sea ice.

A "polynya" is an area of open water surrounded by sea ice. Polynyas occupy just 0.001% of the overall area of Antarctic sea ice, but create 1% of the ice. Icy cold katabatic winds blow off the mainland, pushing away ice and creating patches of open water which then refreeze.

There was anomalous export of sea ice through Fram Strait in the 1990s, which may have been one of the preconditions for high ice albedo feedback.

20-40% of sea ice is formed by surface flooding followed by refreezing. This was not included in the sea ice models that gave such inaccurate predictions.

The food chain is founded on diatoms. These form "extracellular polymeric substances": goopy mucus-like stuff made of polysaccharides that protects them and serves as antifreeze. There's a lot of this stuff; the ice gets visibly stained by it.

For more, see:

18) Kenneth M. Golden, Climate change and the mathematics of transport in sea ice, AMS Notices, May 2009. Also available at http://www.ams.org/notices/200905/

19) Mathematics Awareness Month, April 2009: Mathematics and Climate, http://www.mathaware.org/mam/09/

Next, Julie Lundquist, who just moved from Lawrence Livermore Labs to the University of Colorado, spoke about wind power:

20) Julie Lundquist, Department of Atmospheric and Oceanic Sciences, University of Colorado, http://paos.colorado.edu/people/lundquist.php

With increased reliance on wind, the power grid will need to be redesigned to handle fluctuating power sources. In the US, currently, companies aren't paid for power they generate in excess of the amount they promised to make. So, accurate prediction is a hugely important game. Being off by 1% can cost millions of dollars! Europe has different laws, which encourage firms to maximize the amount of wind power they generate.

If you had your choice about where to build a wind turbine, you'd build it on the ocean or a very flat plain, where the air flows rather smoothly. Hilly terrain leads to annoying turbulence - but sometimes that's your only choice. Then you need to find the best spots, where the turbulence is least bad. Complete simulation of the Navier-Stokes equations is too computationally intensive, so people use fancier tricks. There's a lot of math and physics here.

For weather reports people use "mesoscale simulation" which cleverly treats smaller-scale features in an averaged way - but we need more fine-grained simulations to see how much wind a turbine will get. This is where "large eddy simulation" comes in. Eddy diffusivity is modeled by Monin-Obukhov similarity theory:

21) American Meteorological Society Glossary, Monin-Obukhov similarity theory, http://amsglossary.allenpress.com/glossary/search?id=monin-obukhov-similarity-theory1

A famous Brookhaven study suggested that the power spectrum of wind has peaks at 4 days, 1/2 day, and 1 minute. This perhaps justifies an approach where different time scales, and thus length scales, are treated separately and the results then combined somehow. The study is actually a bit controversial. But anyway, this is the approach people are taking, and it seems to work.

Night air is stable - but day air is often not, since the ground is hot, and hot air rises. So when a parcel of air moving along hits a hill, it can just shoot upwards, and not come back down! This means lots of turbulence.

The wind turbines at Altamont Pass in California kill more raptors than all other wind farms in the world combined! Old-fashioned wind turbines look like nice places to perch, spelling death to birds. Cracks in concrete attract rodents, which attract raptors, who get killed. The new ones are far better.

For more:

22) National Renewable Energy Laboratory, Research needs for wind resource characterization, available as http://www.nrel.gov/docs/fy08osti/43521.pdf

Finally, there was a talk by Ron Lloyd of Fat Spaniel Technologies. This is a company that makes software for solar plants and other sustainable energy companies:

23) Fat Spaniel Technologies, http://www.fatspaniel.com/products/

His talk was less technical so I didn't take detailed notes. One big point I took away was this: we need better tools for modelling! This is especially true with the coming of the "smart grid". In its simplest form, this is a power grid that uses lots of data - for example, data about power generation and consumption - to regulate itself and increase efficiency. Surely there will be a lot of math here. Maybe even the topic I've been talking about lately: bond graphs!

But now I want to talk about some very simple aspects of electrical circuits. Last week I listed various kinds of circuits. Now let's go into a bit more detail - starting with the simplest kind: circuits made of just wires and linear resistors, where the currents and voltages are independent of time.

Mathematically, such a circuit is a graph equipped with some extra data. First, each edge has a number associated to it - the "resistance". For example:

     o----1----o----3----o
     |         |         |
     |         |         |
     2         3         2
     |         |         |
     |         |         |
     o----3----o----1----o
Second, we have current flowing through this circuit. To describe this, we first arbitrarily pick an orientation on each edge:
     o---->----o---->----o
     |         |         |
     |         |         |
     V         V         V
     |         |         |
     |         |         |
     o----<----o---->----o
Then we label each edge with a number saying how much "current" is flowing through that edge, in the direction of the arrow:
          2         3
     o---->----o---->----o
     |         |         |
     |         |         |
   3 V         V 1       V 3
     |         |         |
     |         |         |
     o----<----o---->----o
          2        -3
Electrical engineers call the current I. Mathematically it's good to think of I as a "1-chain": a linear combination of oriented edges of our graph, with the coefficients of the linear combination being the numbers shown above.

If we know the current, we can work out a number for each vertex of our graph, saying how much current is flowing out of that vertex, minus how much is flowing in:

               2
   5 o---->----o---->----o 0
     |         |         |
     |         |         |
     V         V         V
     |         |         |
     |         |         |
  -5 o----<----o---->----o 0
              -2
Mathematically we can think of this as a "0-chain": a formal linear combination of the vertices of our graph, with the numbers shown above as coefficients. We call this 0-chain the "boundary" of the 1-chain we started with. Since our current was called I, we call its boundary δI.
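Computing a boundary by hand is error-prone, so here is a minimal Python sketch that does it mechanically. A 1-chain is stored as a list of (tail, head, coefficient) triples; the vertex names TL, TM, TR (top row) and BL, BM, BR (bottom row) are our own hypothetical labels, the currents are the ones in the diagram, and the sign convention is the one used above: out minus in. It reproduces the 0-chain δI in the diagram.

```python
from collections import defaultdict

# Each oriented edge is (tail, head, current), following the arrows and numbers above.
current = [
    ("TL", "TM", 2), ("TM", "TR", 3),                   # top edges, pointing right
    ("TL", "BL", 3), ("TM", "BM", 1), ("TR", "BR", 3),  # vertical edges, pointing down
    ("BM", "BL", 2), ("BM", "BR", -3),                  # bottom edges
]


def boundary(chain):
    """Boundary of a 1-chain: current flowing out of each vertex minus current flowing in."""
    result = defaultdict(int)
    for tail, head, amount in chain:
        result[tail] += amount  # leaves the tail
        result[head] -= amount  # enters the head
    return dict(result)


print(boundary(current))
# Kirchhoff's current law would require every value to be zero.
print("closed:", all(v == 0 for v in boundary(current).values()))
```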

Kirchhoff's current law says that

δI = 0

When this holds, let's say our circuit is "closed". Physically this follows from the law of conservation of electrical charge, together with a reasonable assumption. Current is the flow of charge. If the total current flowing into a vertex wasn't equal to the amount flowing out, charge - positive or negative - would be building up there. But for a closed circuit, we assume it's not.

If a circuit is not closed, let's call it "open". These are interesting too. For example, we might have a circuit like this:

     x
     |
     |
     V
     |
     |
     o---->----o
     |         |
     |         |
     V         V
     |         |
     |         |
     x         x
where we have current flowing in the wire on top and flowing out the two wires at bottom. We allow δI to be nonzero at the ends of these wires - the 3 vertices labelled x. This circuit is an "open system" in the sense of "week290", because it has these wires dangling out of it. It's not self-contained; we can use it as part of some bigger circuit. We should really formalize this more, but I won't now. Derek Wise did it more generally here:

24) Derek Wise, Lattice p-form electromagnetism and chain field theory, available as gr-qc/0510033.

The idea here was to get a category where chain complexes are morphisms. In our situation, composing morphisms amounts to gluing the output wires of one circuit into the input wires of another. This is an example of the general philosophy I'm trying to pursue, where open systems are treated as morphisms.

We've talked about 1-chains and 0-chains... but we can also back up and talk about 2-chains! Let's suppose our graph is connected - it is in our example - and let's fill it in with enough 2-dimensional "faces" to get something contractible. We can do this in a god-given way if our graph is drawn on the plane: just fill in all the holes!

     o---------o---------o
     |/////////|/////////|
     |/////////|/////////|
     |//FACE///|///FACE//|
     |/////////|/////////|
     |/////////|/////////|
     o---------o---------o
In electrical engineering these faces are often called "meshes".

This gives us a chain complex

           δ            δ
     C_0 <-------- C_1 <-------- C_2
and a cochain complex:
           d            d
     C^0 --------> C^1 --------> C^2
As I've already said, it's good to think of the current I as a 1-chain, since then

δI = 0

is Kirchhoff's current law. Since our little space is contractible, the above equation implies that

I = δJ

for some 2-chain J called the "mesh current". This assigns to each face or "mesh" the current flowing around that face.
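The step from δI = 0 to I = δJ can be checked on the same grid. In this minimal Python sketch the edge orientations follow the arrows in the diagrams, each mesh current is taken to circulate clockwise (our convention), and the vertex names TL, TM, TR, BL, BM, BR are hypothetical labels. Taking δ of the resulting 1-chain gives zero at every vertex, as Kirchhoff's current law requires.

```python
from collections import defaultdict

# Oriented edges: top-left, top-right, left/middle/right verticals, bottom-left,
# bottom-right, with the arrow directions from the diagram. Vertex names are ours.
EDGES = [("TL", "TM"), ("TM", "TR"), ("TL", "BL"), ("TM", "BM"),
         ("TR", "BR"), ("BM", "BL"), ("BM", "BR")]
# Each face lists (edge index, sign): +1 if going clockwise around the face
# follows the edge's arrow, -1 if it runs against it.
FACES = {
    "left":  [(0, +1), (3, +1), (5, +1), (2, -1)],
    "right": [(1, +1), (4, +1), (6, -1), (3, -1)],
}


def boundary_2(mesh_current):
    """I = delta J: add each face's current to the edges around that face."""
    edge_current = [0.0] * len(EDGES)
    for face, amount in mesh_current.items():
        for e, sign in FACES[face]:
            edge_current[e] += sign * amount
    return edge_current


def boundary_1(chain):
    """delta I: current out of each vertex minus current into it."""
    result = defaultdict(float)
    for (tail, head), amount in zip(EDGES, chain):
        result[tail] += amount
        result[head] -= amount
    return dict(result)


J = {"left": 2.0, "right": -0.5}  # arbitrary mesh currents
I = boundary_2(J)
print(I)              # the shared middle edge carries the difference of the two
print(boundary_1(I))  # every vertex gets 0: Kirchhoff's current law holds
```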

An electrical circuit also comes with a third piece of data, which I haven't mentioned yet. Each oriented edge should be labelled by a number called the "voltage" across that edge. Electrical engineers call the voltage V. It's good to think of V as a 1-cochain, which assigns to each edge the voltage across that edge.

Why a 1-cochain instead of a 1-chain? Because then

dV = 0

is the other basic law of electrical circuits - Kirchhoff's voltage law! This law says that the sum of these voltages around a mesh is zero. Since our little space is contractible, the above equation implies that

V = dφ

for some 0-cochain φ called the "electrostatic potential". In electrostatics, this potential is a function on space. Here it assigns a number to each vertex of our graph.
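On the cochain side, here is a minimal NumPy sketch using a hypothetical labelling of the picture, rebuilt so that the block stands alone. With the standard bases, the coboundary d is just the transpose of the boundary δ. The sign convention V(edge) = φ(target) - φ(source) and the particular potential φ are assumptions made for illustration.

```python
import numpy as np

# Hypothetical labelling of the picture: vertices 0, 1, 2 on top and
# 3, 4, 5 on the bottom; each edge is an oriented pair (source, target).
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
faces = [{4: +1, 2: +1, 5: -1, 0: -1}, {5: +1, 3: +1, 6: -1, 1: -1}]

delta1 = np.zeros((6, len(edges)))  # boundary C_1 -> C_0
for e, (source, target) in enumerate(edges):
    delta1[target, e] += 1
    delta1[source, e] -= 1
delta2 = np.zeros((len(edges), len(faces)))  # boundary C_2 -> C_1
for f, signed_edges in enumerate(faces):
    for e, sign in signed_edges.items():
        delta2[e, f] = sign

# In the standard bases the coboundary d is the transpose of the boundary delta.
d0 = delta1.T  # C^0 -> C^1: (d phi)(edge) = phi(target) - phi(source)
d1 = delta2.T  # C^1 -> C^2: (d V)(face) = signed sum of V around the face

phi = np.array([0.0, 1.5, 3.0, -1.0, 0.5, 2.0])  # hypothetical potential
V = d0 @ phi   # voltage V = d phi across each edge
print(V)
print(d1 @ V)  # Kirchhoff's voltage law dV = 0: zero around each mesh
```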

Since the space of 1-cochains is the dual of the space of 1-chains, we can take the voltage V and the current I, glom them together, and get a number:

V(I)

This is the "power": that is, the rate at which our network soaks up energy and dissipates it into heat. Note that this is just a fancy version of the formula for power that I explained in "week290" - power is effort times flow.

I've given you three basic pieces of data labelling our circuit: the resistance R, the current I, and the voltage V. But these aren't independent! Ohm's law says that the voltage across any edge is the current through that edge times the resistance of that edge. But remember: voltage is a 1-cochain while current is a 1-chain. So "resistance" can be thought of as a map from 1-chains to 1-cochains:

R: C_1 → C^1

This lets us write Ohm's law like this:

V = RI

This, in turn, means the power of our circuit is

V(I) = (RI)(I)

For physical reasons, this power is always nonnegative. In fact, let's assume it's positive unless I = 0. This is just another way of saying that the resistance labelling each edge is positive. It can be very interesting to think about circuits with perfectly conducting wires. These would give edges whose resistance is zero. But that's a bit of an idealization, and right now I'd rather allow only positive resistances.

Why? Because then we can think of the above formula as the inner product of I with itself! In other words, then there's a unique inner product on 1-chains with

(RI)(I) = <I,I>

In this situation

R: C_1 → C^1

is the usual isomorphism that we get between a finite-dimensional inner product space and its dual. (For this statement to be true, we'd better assume our graph has finitely many vertices and edges.)
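Here is a minimal NumPy sketch of Ohm's law and the power for the same picture. The resistances on its 7 edges and the current are hypothetical values. In the edge basis, Ohm's law applied edge by edge makes R a diagonal matrix. The power V(I) = (RI)(I) is then the sum, over the edges, of resistance times current squared.

```python
import numpy as np

# Hypothetical positive resistances, one for each of the 7 edges in the picture.
resistance = np.array([1.0, 2.0, 1.0, 0.5, 3.0, 1.0, 2.0])
R = np.diag(resistance)  # R : C_1 -> C^1 in the standard edge bases

def inner(I, J):
    """<I, J> := (R I)(J): pair the 1-cochain R I with the 1-chain J."""
    return (R @ I) @ J

I = np.array([-2.0, 0.5, 2.0, -0.5, 2.0, -2.5, 0.5])  # a hypothetical current
V = R @ I        # Ohm's law V = R I, edge by edge
power = V @ I    # V(I), the power dissipated as heat
print(power, inner(I, I))  # the same number

# Positive resistances make <., .> symmetric and positive definite.
print(np.allclose(R, R.T), np.all(np.linalg.eigvalsh(R) > 0))
```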

Now, if you've studied de Rham cohomology, all this should start reminding you of Hodge theory. And indeed, it's a baby version of that! So, we're getting a little bit of Hodge theory, but in a setting where our chain complexes are really morphisms in a category. Or more generally, n-morphisms in an n-category!

There's a lot more to say, but that's enough for now. Here are some references on "electrical circuits as chain complexes":

25) Paul Bamberg and Shlomo Sternberg, A Course of Mathematics for Students of Physics, Cambridge University Press, Cambridge, 1982.

Bamberg and Sternberg is a great book overall for folks wanting to get started on mathematical physics. The stuff about circuits starts in chapter 12.

26) P. W. Gross and P. Robert Kotiuga, Electromagnetic Theory and Computation: A Topological Approach, Cambridge University Press, 2004.

This book says just a little about electrical circuits of the sort we're discussing, but it says a lot about chain complexes and electromagnetism. It's a great place to start if you know some electromagnetism but have never seen a chain complex.


Addenda: I thank Colin Backhurst, David Corfield, and Tim Silverman for corrections. I thank Garett Leskowitz for pointing out the material in Bamberg and Sternberg's book.

For more discussion, visit the n-Category Café.


So many young people are forced to specialize in one line or another that a young person can't afford to try and cover this waterfront - only an old fogy who can afford to make a fool of himself. If I don't, who will? - John Wheeler


© 2010 John Baez
baez@math.removethis.ucr.andthis.edu



Terence Tao
254A, Notes 4: The semi-circular law


We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble {M_n = (\xi_{ij})_{1 \leq i,j \leq n}} of Hermitian matrices (thus {\xi_{ij} = \overline{\xi_{ji}}}; this includes real symmetric matrices as an important special case), in which the upper-triangular entries {\xi_{ij}}, {i<j} are iid complex random variables with mean zero and unit variance, and the diagonal entries {\xi_{ii}} are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).

In previous notes we saw that the operator norm of {M_n} was typically of size {O(\sqrt{n})}, so it is natural to work with the normalised matrix {\frac{1}{\sqrt{n}} M_n}. Accordingly, given any {n \times n} Hermitian matrix {M_n}, we can form the (normalised) empirical spectral distribution (or ESD for short)

\displaystyle  \mu_{\frac{1}{\sqrt{n}} M_n} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M_n) / \sqrt{n}},

of {M_n}, where {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} are the (necessarily real) eigenvalues of {M_n}, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of {M_n}.
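To make the definition concrete, here is a minimal NumPy sketch that samples one symmetric random sign matrix and evaluates its normalised ESD on half-lines {(-\infty, x)}. The helper names, the size {n = 500} and the seed are illustrative choices. Setting the diagonal to zero is an assumption; it is an allowed (constant) choice of diagonal distribution.

```python
import numpy as np

def symmetric_sign_matrix(n, rng):
    """One sample of the symmetric Bernoulli ensemble: iid +-1 entries above
    the diagonal, mirrored below. The diagonal is set to zero here, which is
    an allowed (constant) choice of diagonal distribution."""
    upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    return upper + upper.T

def esd_cdf(M, x):
    """mu_{M/sqrt(n)}(-infinity, x): fraction of normalised eigenvalues below x."""
    n = M.shape[0]
    eigs = np.linalg.eigvalsh(M) / np.sqrt(n)  # real, in increasing order
    return np.mean(eigs < x)

rng = np.random.default_rng(0)
M = symmetric_sign_matrix(500, rng)
print(esd_cdf(M, 0.0), esd_cdf(M, 10.0))  # roughly 0.5; the ESD has total mass 1
```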

When {M_n} is a random matrix ensemble, then the ESD {\mu_{\frac{1}{\sqrt{n}} M_n}} is now a random measure – i.e. a random variable taking values in the space {\hbox{Pr}({\mathbb R})} of probability measures on the real line. (Thus, the distribution of {\mu_{\frac{1}{\sqrt{n}} M_n}} is a probability measure on probability measures!)

Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles {M_n} as {n \rightarrow \infty}. Recall from Notes 0 that for any sequence of random variables in a {\sigma}-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on {{\mathbb R}}, and to deterministic limits, we see that a sequence of random ESDs {\mu_{\frac{1}{\sqrt{n}} M_n}} converges in probability (resp. converges almost surely) to a deterministic limit {\mu \in \hbox{Pr}({\mathbb R})} (which, confusingly enough, is a deterministic probability measure!) if, for every test function {\varphi \in C_c({\mathbb R})}, the quantities {\int_{\mathbb R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n}} converge in probability (resp. converge almost surely) to {\int_{\mathbb R} \varphi\ d\mu}.

Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD {\mu_{\frac{1}{\sqrt{n}} M_n}}, one can form its expectation {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} \in \hbox{Pr}({\mathbb R})}, defined via duality (the Riesz representation theorem) as

\displaystyle  \int_{\mathbb R} \varphi\ d{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} := {\bf E} \int_{\mathbb R} \varphi\ d  \mu_{\frac{1}{\sqrt{n}} M_n};

this probability measure can be viewed as the law of a random eigenvalue {\frac{1}{\sqrt{n}}\lambda_i(M_n)} of a random matrix {M_n} drawn from the ensemble. We then say that the ESDs converge in expectation to a limit {\mu \in \hbox{Pr}({\mathbb R})} if {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}} converges in the vague topology to {\mu}, thus

\displaystyle  {\bf E} \int_{\mathbb R} \varphi\ d  \mu_{\frac{1}{\sqrt{n}} M_n} \rightarrow \int_{\mathbb R} \varphi\ d\mu

for all {\varphi \in C_c({\mathbb R})}.

In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.

Exercise 1 Let {M_n} be a sequence of {n \times n} Hermitian matrix ensembles, and let {\mu} be a continuous probability measure on {{\mathbb R}}.

  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely to {\mu} if and only if {\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges almost surely to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.
  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges in probability to {\mu} if and only if {\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges in probability to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.
  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges in expectation to {\mu} if and only if {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.

We can now state the Wigner semi-circular law.

Theorem 1 (Semicircular law) Let {M_n} be the top left {n \times n} minors of an infinite Wigner matrix {(\xi_{ij})_{i,j \geq 1}}. Then the ESDs {\mu_{\frac{1}{\sqrt{n}} M_n}} converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution

\displaystyle  \mu_{sc} := \frac{1}{2\pi} (4-|x|^2)^{1/2}_+\ dx. \ \ \ \ \ (1)

A numerical example of this theorem in action can be seen at the MathWorld entry for this law.

The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least) the matrices {\frac{1}{\sqrt{n}} M_n} almost surely have operator norm at most {2+o(1)}. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval {[-2,2]} with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of {\frac{1}{\sqrt{n}} M_n} is almost surely at least {2-o(1)}, and thus (in the finite fourth moment case) the norm is in fact equal to {2+o(1)}. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).
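Here is a quick numerical illustration of Theorem 1 and of the norm bound, in the spirit of the MathWorld example. It is a sketch assuming NumPy. It samples one symmetric random sign matrix, compares its ESD with the distribution function of (1) on a grid, and reports the operator norm of {\frac{1}{\sqrt{n}} M_n}. The closed form used for the distribution function comes from integrating (1) directly. The size, seed and grid are arbitrary choices.

```python
import numpy as np

def semicircle_cdf(x):
    """mu_sc(-infinity, x), obtained by integrating the density (1) exactly."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4 - x**2) / (4 * np.pi) + np.arcsin(x / 2) / np.pi

rng = np.random.default_rng(1)
n = 1000
upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
M = upper + upper.T                         # symmetric random sign matrix
eigs = np.linalg.eigvalsh(M) / np.sqrt(n)   # eigenvalues of M / sqrt(n)

grid = np.linspace(-2.5, 2.5, 101)
empirical = np.array([np.mean(eigs < x) for x in grid])
max_gap = np.max(np.abs(empirical - semicircle_cdf(grid)))
op_norm = np.max(np.abs(eigs))              # operator norm of M / sqrt(n)
print(max_gap, op_norm)  # a small gap, and a norm close to 2
```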

As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.

There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially restricted to highly symmetric ensembles, such as GUE).

— 1. Preliminary reductions —

Before we begin any of the proofs of the semi-circular law, we make some simple observations which will reduce the difficulty of the arguments in the sequel.

The first observation is that the Cauchy interlacing law (Exercise 14 from Notes 3a) shows that the ESD of {\frac{1}{\sqrt{n}} M_n} is very stable in {n}. Indeed, we see from the interlacing law that

\displaystyle  \frac{n}{m} \mu_{\frac{1}{\sqrt{n}} M_n}( -\infty, \lambda / \sqrt{n}) - \frac{n-m}{m} \leq \mu_{\frac{1}{\sqrt{m}} M_m}( -\infty, \lambda / \sqrt{m})

\displaystyle \leq \frac{n}{m} \mu_{\frac{1}{\sqrt{n}} M_n}( -\infty, \lambda / \sqrt{n})

for any threshold {\lambda} and any {n > m > 0}.
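The interlacing law behind this bound is easiest to test in unnormalised form. Write {N_k(\lambda)} for the number of eigenvalues of the top left {k \times k} minor that are less than {\lambda}. Iterating the interlacing law {n-m} times gives {N_n(\lambda) - (n-m) \leq N_m(\lambda) \leq N_n(\lambda)}, and dividing by {m} turns this into a two-sided bound on the ESDs. The NumPy sketch below checks the counting form on a hypothetical GOE-like test matrix. The sizes, seed and threshold grid are arbitrary choices.

```python
import numpy as np

def count_below(A, lam):
    """Number of eigenvalues of the Hermitian matrix A that are < lam."""
    return int(np.sum(np.linalg.eigvalsh(A) < lam))

rng = np.random.default_rng(2)
n, m = 60, 45
G = rng.standard_normal((n, n))
M = (G + G.T) / np.sqrt(2)  # hypothetical real symmetric test matrix

ok = True
for lam in np.linspace(-15, 15, 61):
    N_n = count_below(M, lam)
    N_m = count_below(M[:m, :m], lam)  # top left m x m minor
    # Interlacing, applied n - m times, moves each count by at most n - m.
    ok = ok and (N_n - (n - m) <= N_m <= N_n)
print(ok)
```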

Exercise 2 Using this observation, show that to establish the semi-circular law (in any of the three senses of convergence), it suffices to do so for a lacunary sequence {n_1, n_2, \ldots} of {n} (thus {n_{j+1}/n_j \geq c} for some {c>1} and all {j}).

The above lacunary reduction does not help one establish convergence in probability or expectation, but will be useful when establishing almost sure convergence, as it significantly reduces the inefficiency of the union bound. (Note that a similar lacunary reduction was also used to prove the strong law of large numbers in Notes 1.)

Next, we exploit the stability of the ESD with respect to perturbations, by taking advantage of the Wielandt-Hoffman inequality

\displaystyle  \sum_{j=1}^n |\lambda_j(A+B)-\lambda_j(A)|^2 \leq \|B\|_F^2 \ \ \ \ \ (2)

for Hermitian matrices {A, B}, where {\|B\|_F := (\hbox{tr} B^2)^{1/2}} is the Frobenius norm of {B}. (This inequality was established in Exercise 6 or Exercise 11 of Notes 3a.) We convert this inequality into an inequality about ESDs:

Lemma 2 For any {n \times n} Hermitian matrices {A, B}, any {\lambda}, and any {\epsilon > 0}, we have

\displaystyle  \mu_{\frac{1}{\sqrt{n}}(A+B)}(-\infty, \lambda) \leq \mu_{\frac{1}{\sqrt{n}}(A)}(-\infty, \lambda+\epsilon) + \frac{1}{\epsilon^2 n^2} \|B\|_F^2

and similarly

\displaystyle  \mu_{\frac{1}{\sqrt{n}}(A+B)}(-\infty, \lambda) \geq \mu_{\frac{1}{\sqrt{n}}(A)}(-\infty, \lambda-\epsilon) - \frac{1}{\epsilon^2 n^2} \|B\|_F^2.

Proof: We just prove the first inequality, as the second is similar (and also follows from the first, by reversing the sign of {A, B}).

Let {\lambda_i(A+B)} be the largest eigenvalue of {A+B} less than {\lambda \sqrt{n}}, and let {\lambda_j(A)} be the largest eigenvalue of {A} less than {(\lambda+\epsilon) \sqrt{n}}, so that {\mu_{\frac{1}{\sqrt{n}}(A+B)}(-\infty, \lambda) = i/n} and {\mu_{\frac{1}{\sqrt{n}}(A)}(-\infty, \lambda+\epsilon) = j/n}. Our task is to show that

\displaystyle  i \leq j + \frac{1}{\epsilon^2 n} \|B\|_F^2.

If {i \leq j} then we are clearly done, so suppose that {i>j}. Then we have {|\lambda_l(A+B)-\lambda_l(A)| \geq \epsilon \sqrt{n}} for all {j < l \leq i}, and hence

\displaystyle  \sum_{l=1}^n |\lambda_l(A+B)-\lambda_l(A)|^2 \geq \epsilon^2 (i-j) n.

Comparing this with (2) gives {\epsilon^2 (i-j) n \leq \|B\|_F^2}, which is the claim.

\Box
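Both the Wielandt-Hoffman inequality (2) and the two inequalities of Lemma 2 can be checked numerically. The NumPy sketch below uses hypothetical Gaussian test matrices {A} and {B} (with {B} deliberately small), a grid of thresholds {\lambda}, and a few values of {\epsilon}. All sizes, seeds and scalings are arbitrary choices. Note that NumPy returns eigenvalues in increasing order, matching the ordering used in (2).

```python
import numpy as np

def random_symmetric(n, rng):
    """A real symmetric test matrix with Gaussian entries."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2)

def esd_cdf(H, lam):
    """mu_{H/sqrt(n)}(-infinity, lam) for an n x n Hermitian matrix H."""
    n = H.shape[0]
    return np.mean(np.linalg.eigvalsh(H) / np.sqrt(n) < lam)

rng = np.random.default_rng(4)
n = 200
A = random_symmetric(n, rng)
B = 0.05 * random_symmetric(n, rng)
frob2 = np.linalg.norm(B, "fro") ** 2  # ||B||_F^2 = tr(B^2) for symmetric B

# Wielandt-Hoffman (2): eigenvalues of A + B and A matched in increasing order.
wh_lhs = np.sum((np.linalg.eigvalsh(A + B) - np.linalg.eigvalsh(A)) ** 2)
print(wh_lhs <= frob2)

# Lemma 2, both inequalities, over a grid of thresholds lam and several eps.
lemma_ok = True
for eps in (0.25, 0.5, 1.0):
    slack = frob2 / (eps**2 * n**2)
    for lam in np.linspace(-3, 3, 25):
        lower_bound = esd_cdf(A, lam - eps) - slack
        upper_bound = esd_cdf(A, lam + eps) + slack
        lemma_ok = lemma_ok and (lower_bound <= esd_cdf(A + B, lam) <= upper_bound)
print(lemma_ok)
```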

This has the following corollary:

Exercise 3 (Stability of ESD laws wrt small perturbations) Let {M_n} be a sequence of random Hermitian matrix ensembles such that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely to a limit {\mu}. Let {N_n} be another sequence of random matrix ensembles such that {\frac{1}{n^2} \|N_n\|_F^2} converges almost surely to zero. Show that {\mu_{\frac{1}{\sqrt{n}}(M_n+N_n)}} converges almost surely to {\mu}.

Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.

Informally, this exercise allows us to discard any portion of the matrix whose squared Frobenius norm is {o(n^2)}. For instance, the diagonal entries of {M_n} have a squared Frobenius norm of {O(n)} almost surely, by the strong law of large numbers. Hence, without loss of generality, we may set the diagonal equal to zero for the purposes of the semi-circular law.

One can also remove any component of {M_n} that is of rank {o(n)}:

Exercise 4 (Stability of ESD laws wrt small rank perturbations) Let {M_n} be a sequence of random Hermitian matrix ensembles such that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely to a limit {\mu}. Let {N_n} be another sequence of random matrix ensembles such that {\frac{1}{n} \hbox{rank}(N_n)} converges almost surely to zero. Show that {\mu_{\frac{1}{\sqrt{n}}(M_n+N_n)}} converges almost surely to {\mu}. (Hint: use the Weyl inequalities instead of the Wielandt-Hoffman law.)

Show that the same claim holds if “almost surely” is replaced by “in probability” or “in expectation” throughout.

In a similar vein, we may apply the truncation argument (much as was done for the central limit theorem in Notes 2) to reduce the semi-circular law to the bounded case:

Exercise 5 Show that in order to prove the semi-circular law (in the almost sure sense), it suffices to do so under the additional hypothesis that the random variables are bounded. Similarly for the convergence in probability or in expectation senses.

Remark 2 These facts ultimately rely on the stability of eigenvalues with respect to perturbations. This stability is automatic in the Hermitian case, but for non-symmetric matrices, serious instabilities can occur due to the presence of pseudospectrum. We will discuss this phenomenon more in later lectures (but see also this earlier blog post).

— 2. The moment method —

We now prove the semi-circular law via the method of moments, which we have already used several times in the previous notes. In order to use this method, it is convenient to use the preceding reductions to assume that the coefficients are bounded, the diagonal vanishes, and that {n} ranges over a lacunary sequence. We will implicitly assume these hypotheses throughout the rest of the section.

As we have already discussed the moment method extensively, much of the argument here will be delegated to exercises. A full treatment of these computations can be found in the book of Bai and Silverstein.

The basic starting point is the observation that the moments of the ESD {\mu_{\frac{1}{\sqrt{n}} M_n}} can be written as normalised traces of powers of {M_n}:

\displaystyle  \int_{\mathbb R} x^k\ d\mu_{\frac{1}{\sqrt{n}} M_n}(x) = \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k. \ \ \ \ \ (3)

In particular, on taking expectations, we have

\displaystyle  \int_{\mathbb R} x^k\ d{\bf E}\mu_{\frac{1}{\sqrt{n}} M_n}(x) = {\bf E} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k.

From concentration of measure (and the Bai-Yin theorem) for the operator norm of a random matrix (Proposition 7 of Notes 3), we see that the {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}} are uniformly subgaussian; indeed, we have

\displaystyle  {\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}\{ |x| \geq \lambda \} \leq C e^{-c \lambda^2 n}

for {\lambda > C}, where {C, c} are absolute constants (so the decay in fact improves quite rapidly with {n}). From this and the moment continuity theorem (Theorem 4 of Notes 2), we can now establish the semi-circular law through computing the mean and variance of moments:

Exercise 6

Ordinarily, computing second-moment quantities such as the left-hand side of (5) is harder than computing first-moment quantities such as (4). But one can obtain the required variance bounds from concentration of measure:

Exercise 7

  • When {k} is a positive even integer, use Talagrand’s inequality and the convexity of the Schatten norm {\|A\|_{S^k} = (\hbox{tr}(A^k))^{1/k}} to establish (6) (and hence (5)).
  • For {k} odd, the formula {\|A\|_{S^k} = (\hbox{tr}(A^k))^{1/k}} still applies as long as {A} is positive definite. Applying this observation, the Bai-Yin theorem, and Talagrand’s inequality to the {S^k} norms of {\frac{1}{\sqrt{n}} M_n + c I_n} for a constant {c>2}, establish (6) (and hence (5)) for odd {k} as well.

Remark 3 More generally, concentration of measure results (such as Talagrand’s inequality) can often be used to automatically upgrade convergence in expectation to convergence in probability or almost sure convergence. We will not attempt to formalise this principle here.

It is not difficult to establish (6), (5) through the moment method as well. Indeed, recall from Theorem 10 of Notes 3 that we have the expected moment

\displaystyle  {\bf E} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k = C_{k/2} + o_k(1) \ \ \ \ \ (7)

for all {k=1,2,\ldots}, where the Catalan number {C_{k/2}} is zero when {k} is odd, and is equal to

\displaystyle  C_{k/2} := \frac{k!}{(k/2+1)! (k/2)!} \ \ \ \ \ (8)

for {k} even.

Exercise 8 By modifying the proof of that theorem, show that

\displaystyle  {\bf E} |\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k|^2 = C_{k/2}^2 + o_k(1) \ \ \ \ \ (9)

and deduce (5). By refining the error analysis (e.g. using Theorem 12 of Notes 3), also establish (6).
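As a sanity check on (3), (7) and (8), the NumPy sketch below computes both sides of (3) for one symmetric random sign matrix and compares them with the Catalan numbers in (8). The size and seed are arbitrary choices. A single sample only illustrates the expected-moment asymptotic (7); it does not verify it.

```python
import numpy as np
from math import factorial

def catalan_half(k):
    """C_{k/2} as in (8): zero for odd k, k! / ((k/2 + 1)! (k/2)!) for even k."""
    if k % 2:
        return 0
    h = k // 2
    return factorial(k) // (factorial(h + 1) * factorial(h))

rng = np.random.default_rng(5)
n = 1000
upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
X = (upper + upper.T) / np.sqrt(n)  # (1/sqrt(n)) M_n for a symmetric sign matrix
eigs = np.linalg.eigvalsh(X)

moments = {}
for k in range(1, 7):
    trace_side = np.trace(np.linalg.matrix_power(X, k)) / n  # right side of (3)
    esd_side = np.mean(eigs**k)                              # left side of (3)
    moments[k] = (trace_side, esd_side)
    print(k, catalan_half(k), round(float(esd_side), 3))
```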

In view of the above computations, the establishment of the semi-circular law now reduces to computing the moments of the semi-circular distribution:

Exercise 9 Show that for any {k=1,2,3,\ldots}, one has

\displaystyle  \int_{\mathbb R} x^k\ d\mu_{sc}(x) = C_{k/2}.

(Hint: use a trigonometric substitution {x = 2 \cos \theta}, and then express the integrand in terms of Fourier phases {e^{in\theta}}.)
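The hint can be tested numerically before doing the algebra. After the substitution {x = 2\cos\theta}, the {k^{th}} moment becomes {\frac{2}{\pi} \int_0^\pi (2\cos\theta)^k \sin^2\theta\ d\theta}. This integrand is a trigonometric polynomial, so an equispaced rule over a full period integrates it exactly up to rounding. The quadrature scheme in the NumPy sketch below is our own choice, not part of the exercise.

```python
import numpy as np
from math import comb

def semicircle_moment(k, num_points=256):
    """int x^k d mu_sc(x), computed via the substitution x = 2 cos(t):
    the integral becomes (2/pi) * int_0^pi (2 cos t)^k sin(t)^2 dt.
    The integrand is a trigonometric polynomial of degree k + 2 that is
    symmetric under t -> 2 pi - t, so an equispaced rule over the full
    period [0, 2 pi) is exact up to rounding once num_points > k + 2."""
    t = 2 * np.pi * np.arange(num_points) / num_points
    g = (2 * np.cos(t)) ** k * np.sin(t) ** 2
    return 2 * np.mean(g)  # (2/pi) * (1/2) * (2 pi) * mean(g)

def catalan_half(k):
    """C_{k/2}: zero for odd k, the Catalan number C_{k/2} for even k."""
    return 0 if k % 2 else comb(k, k // 2) // (k // 2 + 1)

for k in range(1, 11):
    print(k, round(float(semicircle_moment(k)), 8), catalan_half(k))
```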

This concludes the proof of the semi-circular law (for any of the three modes of convergence).

Remark 4 In the spirit of the Lindeberg exchange method, observe that Exercise 9 is unnecessary if one already knows that the semi-circular law holds for at least one ensemble of Wigner matrices (e.g. the GUE ensemble). Indeed, Exercise 9 can be deduced from such a piece of knowledge. In such a situation, it is not necessary to actually compute the main term {C_{k/2}} on the right of (4); it would be sufficient to know that that limit is universal, in that it does not depend on the underlying distribution. In fact, it would even suffice to establish the slightly weaker statement

\displaystyle  {\bf E} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^k = {\bf E} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M'_n)^k + o_k(1)

whenever {M_n, M'_n} are two ensembles of Wigner matrices arising from different underlying distributions (but still normalised to have mean zero, unit variance, and to be bounded (or at worst subgaussian)). We will take advantage of this perspective later in these notes.

— 3. The Stieltjes transform method —

The moment method was computationally intensive, but straightforward. As noted in Remark 4, even without doing much of the algebraic computation, it is clear that the moment method will show that some universal limit for Wigner matrices exists (or, at least, that the differences between the distributions of two different Wigner matrices converge to zero). But it is not easy to see from this method why the limit should be given by the semi-circular law, as opposed to some other distribution (although one could eventually work this out from an inverse moment computation).

When studying the central limit theorem, we were able to use the Fourier method to control the distribution of sums of random variables in a cleaner way than in the moment method. Analogues of this method exist, but require non-trivial formulae from noncommutative Fourier analysis, such as the Harish-Chandra integration formula (and also only work for highly symmetric ensembles, such as GUE or GOE), and will not be discussed in this course. (Our later notes on determinantal processes, however, will contain some algebraic identities related in some ways to the noncommutative Fourier-analytic approach.)

We now turn to another method, the Stieltjes transform method, which uses complex-analytic methods rather than Fourier-analytic methods, and has turned out to be one of the most powerful and accurate tools in dealing with the ESD of random Hermitian matrices. Whereas the moment method started from the identity (3), the Stieltjes transform method proceeds from the identity

\displaystyle  \int_{\mathbb R} \frac{1}{x-z}\ d\mu_{\frac{1}{\sqrt{n}} M_n}(x) = \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n-zI)^{-1}

for any complex {z} not in the support of {\mu_{\frac{1}{\sqrt{n}} M_n}}. We refer to the expression on the left-hand side as the Stieltjes transform of {M_n} or of {\mu_{\frac{1}{\sqrt{n}} M_n}}, and denote it by {s_{\mu_{\frac{1}{\sqrt{n}} M_n}}} or as {s_n} for short. The expression {(\frac{1}{\sqrt{n}} M_n-zI)^{-1}} is the normalised resolvent of {M_n}, and plays an important role in the spectral theory of that matrix. Indeed, in contrast to general-purpose methods such as the moment method, the Stieltjes transform method draws heavily on the specific linear-algebraic structure of this problem, and in particular on the rich structure of resolvents.

On the other hand, the Stieltjes transform can be viewed as a generating function of the moments via the Taylor series expansion

\displaystyle  s_n(z) = -\frac{1}{z} - \frac{1}{z^2} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n) - \frac{1}{z^3} \frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M_n)^2 - \ldots,

valid for {|z|} larger than the operator norm of {\frac{1}{\sqrt{n}} M_n}. This is somewhat (though not exactly) analogous to how the characteristic function {{\bf E} e^{itX}} of a scalar random variable can be viewed as a generating function of the moments {{\bf E} X^k}.
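The NumPy sketch below computes {s_n(z)} in three ways for one symmetric random sign matrix: as an integral against the ESD (via eigenvalues), as the normalised trace of the resolvent, and by truncating the expansion in powers of {1/z}. The truncation depth and the point {z = 5 + i} are arbitrary choices. The series is only used where {|z|} exceeds the operator norm, which is about 2 here.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
X = (upper + upper.T) / np.sqrt(n)  # (1/sqrt(n)) M_n
eigs = np.linalg.eigvalsh(X)

def s_eig(z):
    """int 1/(x - z) d mu(x) for the ESD of X."""
    return np.mean(1.0 / (eigs - z))

def s_resolvent(z):
    """(1/n) tr (X - z I)^{-1}, the normalised trace of the resolvent."""
    return np.trace(np.linalg.inv(X - z * np.eye(n))) / n

def s_series(z, terms=60):
    """Truncated expansion -sum_k z^{-(k+1)} (1/n) tr X^k, for |z| > ||X||."""
    return -sum(np.mean(eigs**k) / z ** (k + 1) for k in range(terms))

z = 5.0 + 1.0j
print(s_eig(z), s_resolvent(z), s_series(z))  # the three values agree
```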

Now let us study the Stieltjes transform more systematically. Given any probability measure {\mu} on the real line, we can form its Stieltjes transform

\displaystyle  s_\mu(z) := \int_{\mathbb R} \frac{1}{x-z}\ d\mu(x)

for any {z} outside of the support of {\mu}; in particular, the Stieltjes transform is well-defined on the upper and lower half-planes in the complex plane. Even without any further hypotheses on {\mu} other than it is a probability measure, we can say a remarkable amount about how this transform behaves in {z}. Applying conjugations we obtain the symmetry

\displaystyle  \overline{s_\mu(z)} = s_\mu(\overline{z}) \ \ \ \ \ (10)

so we may as well restrict attention to {z} in the upper half-plane (say). Next, from the trivial bound

\displaystyle  |\frac{1}{x-z}| \leq \frac{1}{|\hbox{Im}(z)|}

one has the pointwise bound

\displaystyle  |s_\mu(z)| \leq \frac{1}{|\hbox{Im}(z)|}. \ \ \ \ \ (11)

In a similar spirit, an easy application of dominated convergence gives the asymptotic

\displaystyle  s_\mu(z) = \frac{-1+o_\mu(1)}{z} \ \ \ \ \ (12)

where {o_\mu(1)} is an expression that, for any fixed {\mu}, goes to zero as {z} goes to infinity non-tangentially in the sense that {|\hbox{Re}(z)|/|\hbox{Im}(z)|} is kept bounded, where the rate of convergence is allowed to depend on {\mu}. From differentiation under the integral sign (or an application of Morera’s theorem and Fubini’s theorem) we see that {s_\mu(z)} is complex analytic on the upper and lower half-planes; in particular, it is smooth away from the real axis. From the Cauchy integral formula (or differentiation under the integral sign) we in fact get some bounds for higher derivatives of the Stieltjes transform away from this axis:

\displaystyle  |\frac{d^j}{dz^j} s_\mu(z)| = O_j( \frac{1}{|\hbox{Im}(z)|^{j+1}} ). \ \ \ \ \ (13)

Informally, {s_\mu} “behaves like a constant” at scales significantly less than the distance {|\hbox{Im}(z)|} to the real axis; all the really interesting action here is going on near that axis.

The imaginary part of the Stieltjes transform is particularly interesting. Writing {z = a+ib} with {b>0}, we observe that

\displaystyle  \hbox{Im}\frac{1}{x-z} = \frac{b}{(x-a)^2 + b^2} > 0

and so we see that

\displaystyle  \hbox{Im}( s_\mu(z) ) > 0

for {z} in the upper half-plane; thus {s_\mu} is a complex-analytic map from the upper half-plane to itself, a type of function known as a Herglotz function. (In fact, all complex-analytic maps from the upper half-plane to itself that obey the asymptotic (12) are of this form; this is a special case of the Herglotz representation theorem, which also gives a slightly more general description in the case when the asymptotic (12) is not assumed. A good reference for this material and its consequences is this book of Garnett.)
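The basic properties above (the symmetry (10), the bound (11), the Herglotz property {\hbox{Im}(s_\mu(z)) > 0}, and the leading behaviour {-1/z} at infinity) can be checked numerically. The NumPy sketch below uses a hypothetical discrete probability measure with 50 equal atoms at Gaussian locations. The test points {z} are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
atoms = rng.standard_normal(50)  # hypothetical measure: 50 atoms ...
weights = np.full(50, 1 / 50)    # ... of equal mass, so mu is a probability measure

def stieltjes(z):
    """s_mu(z) = int 1/(x - z) d mu(x) for the discrete measure above."""
    return np.sum(weights / (atoms - z))

all_ok = True
for z in [0.3 + 0.01j, -1.0 + 0.5j, 2.0 + 3.0j, 10.0 + 0.2j]:
    s = stieltjes(z)
    all_ok = all_ok and bool(
        abs(s) <= 1 / abs(z.imag)                          # pointwise bound (11)
        and np.isclose(np.conj(s), stieltjes(np.conj(z)))  # symmetry (10)
        and s.imag > 0                                     # Herglotz property
    )
print(all_ok)
print(1e6j * stieltjes(1e6j))  # close to -1, matching the leading term -1/z
```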

One can also express the imaginary part of the Stieltjes transform as a convolution

\displaystyle  \hbox{Im}( s_\mu(a+ib) ) = \pi \mu * P_b(a) \ \ \ \ \ (14)

where {P_b} is the Poisson kernel

\displaystyle  P_b(x) := \frac{1}{\pi} \frac{b}{x^2+b^2} = \frac{1}{b} P_1(\frac{x}{b}).

As is well known, these kernels form a family of approximations to the identity, and thus {\mu * P_b} converges in the vague topology to {\mu} (see e.g. my notes on distributions). Thus we see that

\displaystyle  \hbox{Im} s_\mu(\cdot+ib) \rightharpoonup \pi \mu

as {b \rightarrow 0^+} in the vague topology, or equivalently (by (10)) that

\displaystyle  \frac{s_\mu(\cdot+ib) - s_\mu(\cdot-ib)}{2\pi i} \rightharpoonup \mu \ \ \ \ \ (15)

as {b \rightarrow 0^+} (this is closely related to the Plemelj formula in potential theory). Thus we see that a probability measure {\mu} can be recovered in terms of the limiting behaviour of the Stieltjes transform on the real axis.
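The inversion formula (15) can be tried on {\mu_{sc}} itself without knowing its Stieltjes transform in closed form. The NumPy sketch below evaluates {s_{\mu_{sc}}} by quadrature after the substitution {x = 2\cos\theta}. It then compares {\frac{s(a+ib) - s(a-ib)}{2\pi i}} at {b = 0.01} with the density in (1) at a few interior points {a}. The quadrature scheme and the value of {b} are our own choices.

```python
import numpy as np

def s_semicircle(z, num_points=20000):
    """Stieltjes transform of mu_sc, by quadrature after x = 2 cos(t):
    s(z) = (2/pi) int_0^pi sin(t)^2 / (2 cos t - z) dt.
    For z off the real axis the integrand is smooth, 2 pi-periodic and
    symmetric under t -> 2 pi - t, so an equispaced rule on [0, 2 pi)
    converges very quickly."""
    t = 2 * np.pi * np.arange(num_points) / num_points
    return 2 * np.mean(np.sin(t) ** 2 / (2 * np.cos(t) - z))

def semicircle_density(x):
    """The density in (1)."""
    return np.sqrt(np.maximum(4 - x**2, 0.0)) / (2 * np.pi)

def recovered_density(a, b):
    """Left side of (15) at the point a, for a small b > 0."""
    return ((s_semicircle(a + 1j * b) - s_semicircle(a - 1j * b)) / (2j * np.pi)).real

for a in [-1.5, 0.0, 0.7]:
    print(a, recovered_density(a, 0.01), semicircle_density(a))
```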

A variant of the above machinery gives us a criterion for convergence:

Exercise 10 (Stieltjes continuity theorem) Let {\mu_n} be a sequence of random probability measures on the real line, and let {\mu} be a deterministic probability measure.

  • {\mu_n} converges almost surely to {\mu} in the vague topology if and only if {s_{\mu_n}(z)} converges almost surely to {s_\mu(z)} for every {z} in the upper half-plane.
  • {\mu_n} converges in probability to {\mu} in the vague topology if and only if {s_{\mu_n}(z)} converges in probability to {s_\mu(z)} for every {z} in the upper half-plane.
  • {\mu_n} converges in expectation to {\mu} in the vague topology if and only if {{\bf E} s_{\mu_n}(z)} converges to {s_\mu(z)} for every {z} in the upper half-plane.

(Hint: The “only if” parts are fairly easy. For the “if” parts, take a test function {\phi \in C_c({\mathbb R})} and approximate {\int_{\mathbb R} \phi\ d\mu} by {\int_{\mathbb R} \phi*P_b\ d\mu = \frac{1}{\pi} \int_{\mathbb R} \hbox{Im}(s_\mu(a+ib)) \phi(a)\ da}. Then approximate this latter integral in turn by a Riemann sum, using (13).)

Thus, to prove the semi-circular law, it suffices to show that for each {z} in the upper half-plane, the Stieltjes transform

\displaystyle s_n(z) = s_{\mu_{\frac{1}{\sqrt{n}} M_n}}(z) = \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - zI )^{-1}

converges almost surely (and thus in probability and in expectation) to the Stieltjes transform {s_{\mu_{sc}}(z)} of the semi-circular law.

It is not difficult to compute the Stieltjes transform {s_{\mu_{sc}}} of the semi-circular law, but let us hold off on that task for now, because we want to illustrate how the Stieltjes transform method can be used to find the semi-circular law, even if one did not know this law in advance, by directly controlling {s_n(z)}. We will fix {z=a+ib} to be a complex number not on the real line, and allow all implied constants in the discussion below to depend on {a} and {b} (we will focus here only on the behaviour as {n \rightarrow \infty}).

The main idea here is predecessor comparison: to compare the transform {s_n(z)} of the {n \times n} matrix {M_n} with the transform {s_{n-1}(z)} of the top left {n-1 \times n-1} minor {M_{n-1}}, or of other minors. For instance, we have the Cauchy interlacing law (Exercise 14 from Notes 3a), which asserts that the eigenvalues {\lambda_1(M_{n-1}),\ldots,\lambda_{n-1}(M_{n-1})} of {M_{n-1}} intersperse the eigenvalues {\lambda_1(M_n),\ldots,\lambda_n(M_n)} of {M_n}. This implies that for a complex number {a+ib} with {b>0}, the difference

\displaystyle  \sum_{j=1}^{n-1} \frac{b}{(\lambda_j(M_{n-1})/\sqrt{n}-a)^2 + b^2} - \sum_{j=1}^{n} \frac{b}{(\lambda_j(M_{n})/\sqrt{n}-a)^2 + b^2}

is an alternating sum of evaluations of the function {x \mapsto \frac{b}{(x-a)^2+b^2}}. The total variation of this function is {O( 1 )} (recall that we are suppressing dependence of constants on {a,b}), and so the alternating sum above is {O(1)}. Writing this in terms of the Stieltjes transform, we conclude that

\displaystyle  \sqrt{n(n-1)} s_{n-1}( \frac{\sqrt{n}}{\sqrt{n-1}}(a+ib) ) - n s_n( a+ib ) = O(1).

Applying (13) to approximate {s_{n-1}( \frac{\sqrt{n}}{\sqrt{n-1}}(a+ib) )} by {s_{n-1}(a+ib)}, we conclude that

\displaystyle  s_n(a+ib) = s_{n-1}(a+ib) + O( \frac{1}{n} ). \ \ \ \ \ (16)

So for fixed {z=a+ib} away from the real axis, the Stieltjes transform {s_n(z)} is quite stable in {n}.
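The stability estimate (16) can be observed numerically. The NumPy sketch below samples one symmetric random sign matrix and compares the Stieltjes transforms of its nested top left minors (each normalised by the square root of its own size) at the arbitrarily chosen point {z = 0.5 + i}. If (16) holds, {n |s_n(z) - s_{n-1}(z)|} should stay bounded as {n} grows.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 400
upper = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), k=1)
M = upper + upper.T  # one sample; M[:n, :n] are its top left minors

def s(n, z):
    """s_n(z) = (1/n) tr (M_n / sqrt(n) - z I)^{-1} for the top left n x n minor."""
    eigs = np.linalg.eigvalsh(M[:n, :n]) / np.sqrt(n)
    return np.mean(1.0 / (eigs - z))

z = 0.5 + 1.0j
for n in [50, 100, 200, 400]:
    gap = abs(s(n, z) - s(n - 1, z))
    print(n, gap, n * gap)  # n * gap should stay bounded, as (16) predicts
```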

This stability has the following important consequence. Observe that while the left-hand side of (16) depends on the {n \times n} matrix {M_n}, the right-hand side depends only on the top left minor {M_{n-1}} of that matrix. In particular, it is independent of the {n^{th}} row and column of {M_n}. This implies that this entire row and column has only a limited amount of influence on the Stieltjes transform {s_n(a+ib)}: no matter what value one assigns to this row and column (including possibly unbounded values, as long as one keeps the matrix Hermitian of course), the transform {s_n(a+ib)} can only move by {O( \frac{|a|+|b|}{b^2 n} )}.

By permuting the rows and columns, we obtain that in fact any row or column of {M_n} can influence {s_n(a+ib)} by at most {O( \frac{1}{n} )}. (This is closely related to the observation in Exercise 4 that low rank perturbations do not significantly affect the ESD.) On the other hand, the rows of (the upper triangular portion of) {M_n} are jointly independent. When {M_n} is a Wigner random matrix, we can then apply a standard concentration of measure result, such as McDiarmid’s inequality (Theorem 7 from Notes 1), to conclude concentration of {s_n} around its mean:

\displaystyle  {\bf P}( |s_n(a+ib) - {\bf E} s_n(a+ib)| \geq \lambda/\sqrt{n} ) \leq C e^{-c\lambda^2} \ \ \ \ \ (17)

for all {\lambda > 0} and some absolute constants {C, c > 0}. (This is not necessarily the strongest concentration result one can establish for the Stieltjes transform, but it will certainly suffice for our discussion here.) In particular, applying (17) with, say, {\lambda = n^{1/4}} (so that the resulting probabilities are summable in {n}), we see from the Borel-Cantelli lemma (Exercise 24 of Notes 0a) that for any fixed {z} away from the real line, {s_n(z) - {\bf E} s_n(z)} converges almost surely (and thus also in probability) to zero. As a consequence, convergence of {s_n(z)} in expectation automatically implies convergence in probability or almost sure convergence.

However, while concentration of measure tells us that {s_n(z)} is close to its mean, it does not shed much light on what this mean is. For this, we have to go beyond the Cauchy interlacing inequality and deal with the resolvent {(\frac{1}{\sqrt{n}} M_n - z I_n)^{-1}} more directly. Firstly, we observe from the linearity of trace that

\displaystyle  {\bf E} s_n(z) = \frac{1}{n} \sum_{j=1}^n {\bf E} [ (\frac{1}{\sqrt{n}} M_n - z I_n)^{-1} ]_{jj}

where {[A]_{jj}} denotes the {jj} component of a matrix {A}. Because {M_n} is a Wigner matrix, it is easy to see on permuting the rows and columns that all of the random variables {[ (\frac{1}{\sqrt{n}} M_n - z I_n)^{-1} ]_{jj}} have the same distribution. Thus we may simplify the above formula as

\displaystyle  {\bf E} s_n(z) = {\bf E} [ (\frac{1}{\sqrt{n}} M_n - z I_n)^{-1} ]_{nn}. \ \ \ \ \ (18)

So now we have to compute the last entry of an inverse of a matrix. There are of course a number of formulae for this, such as Cramer’s rule. But it will be more convenient here to use a formula based instead on the Schur complement:

Exercise 11 Let {A_n} be an {n \times n} matrix, let {A_{n-1}} be the top left {(n-1) \times (n-1)} minor, let {a_{nn}} be the bottom right entry of {A_n}, let {X \in {\mathbb C}^{n-1}} be the right column of {A_n} with the bottom right entry removed, and let {(X')^* \in ({\mathbb C}^{n-1})^*} be the bottom row with the bottom right entry removed. In other words,

\displaystyle  A_n = \begin{pmatrix} A_{n-1} & X \\ (X')^* & a_{nn} \end{pmatrix}.

Assume that {A_n} and {A_{n-1}} are both invertible. Show that

\displaystyle  [A_n^{-1}]_{nn} = \frac{1}{a_{nn} - (X')^* A_{n-1}^{-1} X}.

(Hint: Solve the equation {A_n v = e_n}, where {e_n} is the {n^{th}} basis vector, using the method of Schur complements (or from first principles).)
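The identity in Exercise 11 is easy to test numerically. The sketch below (helper names are hypothetical) compares the corner entry of a directly computed inverse with the Schur complement formula for a random complex matrix, and then with the specialisation used in (19): a Hermitian {M_n} with zero diagonal, where {a_{nn} = -z} and {(X')^* = X^*/\sqrt{n}}. The random test matrices are our choice and are invertible with probability one.

```python
import numpy as np


def corner_of_inverse_via_schur(a):
    """[A^{-1}]_{nn} from the block form A = [[A_{n-1}, X], [(X')^*, a_nn]] (Exercise 11)."""
    a_minor = a[:-1, :-1]
    x = a[:-1, -1]              # right column with the bottom entry removed
    x_prime_star = a[-1, :-1]   # bottom row with the bottom entry removed, i.e. (X')^*
    a_nn = a[-1, -1]
    return 1.0 / (a_nn - x_prime_star @ np.linalg.solve(a_minor, x))


def resolvent_corner_via_19(m, z):
    """The quantity inside the expectation in (19), for Hermitian m with zero diagonal:
    -1 / (z + X^* R X / n), with R the resolvent of the normalised minor."""
    n = m.shape[0]
    r = np.linalg.inv(m[:-1, :-1] / np.sqrt(n) - z * np.eye(n - 1))
    x = m[:-1, -1]
    return -1.0 / (z + (x.conj() @ r @ x) / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    print(np.linalg.inv(a)[-1, -1], corner_of_inverse_via_schur(a))

    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    m = (g + g.conj().T) / 2
    np.fill_diagonal(m, 0.0)
    z = 0.3 + 1.0j
    direct = np.linalg.inv(m / np.sqrt(n) - z * np.eye(n))[-1, -1]
    print(direct, resolvent_corner_via_19(m, z))
```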

The point of this identity is that it describes (part of) the inverse of {A_n} in terms of the inverse of {A_{n-1}}, which will eventually provide a non-trivial recursive relationship between {s_n(z)} and {s_{n-1}(z)}, which can then be played off against (16) to solve for {s_n(z)} in the asymptotic limit {n \rightarrow \infty}.

In our situation, the matrix {\frac{1}{\sqrt{n}} M_n - z I_n} and its minor {\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1}} are automatically invertible, since they are Hermitian matrices shifted by the non-real number {z}. Inserting the above formula into (18) (and recalling that we normalised the diagonal of {M_n} to vanish), we conclude that

\displaystyle  {\bf E} s_n(z) = - {\bf E} \frac{1}{z + \frac{1}{n} X^* (\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1} X }, \ \ \ \ \ (19)

where {X \in {\mathbb C}^{n-1}} is the top right column of {M_n} with the bottom entry {\xi_{nn}} removed.

One may be concerned that the denominator here could vanish. However, observe that {z} has imaginary part {b} if {z=a+ib}. Furthermore, taking {b > 0} for definiteness (the case {b < 0} is symmetric), the spectral theorem shows that the imaginary part of {(\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1}} is positive definite, and so {X^* (\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1} X} has non-negative imaginary part. As a consequence the magnitude of the denominator here is bounded below by {|b|}, and so its reciprocal is {O(1)} (compare with (11)). So the reciprocal here is not going to cause any discontinuity, as we are taking {b} to be fixed and non-zero.

Now we need to understand the expression {X^* (\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1} X}. We write this as {X^* R X}, where {R} is the resolvent matrix {R := (\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1}}. The distribution of the random matrix {R} could conceivably be quite complicated. However, the key point is that the vector {X} only involves the entries of {M_n} that do not lie in {M_{n-1}}, and so the random matrix {R} and the vector {X} are independent. Because of this, we can use the randomness of {X} to do most of the work in understanding the expression {X^* R X}, without having to know much about {R} at all.

To understand this, let us first condition {R} to be a deterministic matrix {R = (r_{ij})_{1 \leq i,j \leq n-1}}, and see what we can do with the expression {X^* R X}.

Firstly, observe that {R} will not be arbitrary; indeed, from the spectral theorem we see that {R} will have operator norm at most {O(1)}. Meanwhile, from the Chernoff (or Hoeffding) inequality (Theorem 2 or Exercise 4 of Notes 1) we know that {X} has magnitude {O( \sqrt{n} )} with overwhelming probability. So we know that {X^* R X} has magnitude {O( n )} with overwhelming probability.

Furthermore, we can use concentration of measure as follows. Given any positive semi-definite matrix {A} of operator norm {O(1)}, the expression {(X^* A X)^{1/2} = \| A^{1/2} X \|} is a convex Lipschitz function of {X} with Lipschitz constant {O(1)}. Applying Talagrand’s inequality (Theorem 9 of Notes 1) we see that this expression concentrates around its median:

\displaystyle  {\bf P}( |(X^* A X)^{1/2} - {\bf M} (X^* A X)^{1/2}| \geq \lambda ) \leq C e^{-c\lambda^2}

for any {\lambda > 0}. On the other hand, {\|A^{1/2} X\| = O( \|X\| )} has magnitude {O(\sqrt{n})} with overwhelming probability, so the median {{\bf M} (X^* A X)^{1/2}} must be {O(\sqrt{n})}. Squaring, we conclude that

\displaystyle  {\bf P}( |X^* A X - {\bf M} X^* A X| \geq \lambda \sqrt{n} ) \leq C e^{-c\lambda^2}

(possibly after adjusting the absolute constants {C, c}). As usual, we may replace the median with the expectation:

\displaystyle  {\bf P}( |X^* A X - {\bf E} X^* A X| \geq \lambda \sqrt{n} ) \leq C e^{-c\lambda^2}

This was for positive semi-definite matrices, but one can easily use the triangle inequality to generalise to self-adjoint matrices, and then to arbitrary matrices, of operator norm {O(1)}, and conclude that

\displaystyle  {\bf P}( |X^* R X - {\bf E} X^* R X| \geq \lambda \sqrt{n} ) \leq C e^{-c\lambda^2} \ \ \ \ \ (20)

for any deterministic matrix {R} of operator norm {O(1)}.

But what is the expectation {{\bf E} X^* R X}? This can be expressed in components as

\displaystyle {\bf E} X^* R X = \sum_{i=1}^{n-1} \sum_{j=1}^{n-1} {\bf E} \overline{\xi_{in}} r_{ij} \xi_{jn}

where {\xi_{in}} are the entries of {X}, and {r_{ij}} are the entries of {R}. But the {\xi_{in}} are iid with mean zero and variance one, so the standard second moment computation shows that this expectation is nothing more than the trace

\displaystyle  \hbox{tr}(R) = \sum_{i=1}^{n-1} r_{ii}

of {R}. We have thus shown the concentration of measure result

\displaystyle  {\bf P}( |X^* R X - \hbox{tr}(R)| \geq \lambda \sqrt{n} ) \leq C e^{-c\lambda^2} \ \ \ \ \ (21)

for any deterministic matrix {R} of operator norm {O(1)}, and any {\lambda > 0}. Informally, {X^* R X} is typically {\hbox{tr}(R) +O(\sqrt{n})}.
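A small check of the key identity {{\bf E} X^* R X = \hbox{tr}(R)}, assuming for illustration that the entries of {X} are independent {\pm 1} signs (a special case of the mean-zero, variance-one entries in the text). In low dimension the expectation can be computed exactly by averaging over every sign pattern; one large random draw then shows that the deviation in (21) is of order {\sqrt{n}} rather than {n}. The matrix {R} (a resolvent of a random symmetric matrix) and the helper names are our own choices.

```python
import itertools

import numpy as np


def resolvent(h, z):
    """(H / sqrt(m) - z I)^{-1} for a symmetric m x m matrix H; its norm is at most 1 / Im z."""
    m = h.shape[0]
    return np.linalg.inv(h / np.sqrt(m) - z * np.eye(m))


def mean_over_all_signs(r):
    """Exact average of x^T R x over all 2^m sign vectors x in {-1, +1}^m.

    Since E[x_i x_j] = delta_ij for independent signs, this equals tr(R).
    """
    m = r.shape[0]
    total = 0.0 + 0.0j
    for signs in itertools.product((-1.0, 1.0), repeat=m):
        x = np.array(signs)
        total += x @ r @ x
    return total / 2**m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = 0.3 + 1.0j

    # Exact check in dimension 10: the average equals the trace.
    h_small = rng.standard_normal((10, 10))
    r_small = resolvent((h_small + h_small.T) / 2, z)
    print(mean_over_all_signs(r_small), np.trace(r_small))

    # One draw in dimension 2000: |X^T R X - tr R| / sqrt(m) stays of order one.
    m = 2000
    h = rng.standard_normal((m, m))
    r = resolvent((h + h.T) / 2, z)
    x = rng.choice([-1.0, 1.0], size=m)
    print(abs(x @ r @ x - np.trace(r)) / np.sqrt(m))
```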

The bound (21) was proven for deterministic matrices, but by using conditional expectation it also applies for any random matrix {R}, so long as that matrix is independent of {X}. In particular, we may apply it to our specific matrix of interest

\displaystyle R := (\frac{1}{\sqrt{n}} M_{n-1} - z I_{n-1})^{-1}.

The trace of this matrix is essentially just {n} times the Stieltjes transform {s_{n-1}(z)}. More precisely, because the normalisation factor {\frac{1}{\sqrt{n}}} differs slightly from the factor {\frac{1}{\sqrt{n-1}}} used to define {s_{n-1}}, we have

\displaystyle  \hbox{tr}(R) = \sqrt{n(n-1)}\, s_{n-1}( \frac{\sqrt{n}}{\sqrt{n-1}} z ),

but by using the smoothness (13) of the Stieltjes transform, together with the stability property (16) we can simplify this as

\displaystyle  \hbox{tr}(R) = n ( s_n(z) + o(1) ).

In particular, from (21) and (17), we see that

\displaystyle  X^* R X = n ( {\bf E} s_n(z) + o(1) )

with overwhelming probability. Putting this back into (19), and recalling that the denominator is bounded away from zero, we have the remarkable equation

\displaystyle  {\bf E} s_n(z) = - \frac{1}{z + {\bf E} s_n(z)} + o(1). \ \ \ \ \ (22)

Note how this equation came about by playing off two ways in which the spectral properties of a matrix {M_n} interact with those of its minor {M_{n-1}}: firstly via the Cauchy interlacing inequality, and secondly via the Schur complement formula.
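The normalisation of {\hbox{tr}(R)} used above can be checked directly. Since {R} has eigenvalues {1/(\lambda_j(M_{n-1})/\sqrt{n} - z)}, rescaling by {\sqrt{n}/\sqrt{n-1}} gives the exact identity {\hbox{tr}(R) = \sqrt{n(n-1)}\, s_{n-1}( \frac{\sqrt{n}}{\sqrt{n-1}} z )}, the same factor that appears in the interlacing bound for {s_{n-1}} and {s_n}. The sketch below compares both sides, and also {n s_n(z)}, for one random matrix; the Gaussian entries with zero diagonal and the helper name `stieltjes` are assumptions made only for illustration.

```python
import numpy as np


def stieltjes(eigs, scale, z):
    """(1/k) * sum_j 1 / (eigs_j / scale - z), with k = len(eigs)."""
    return np.mean(1.0 / (eigs / scale - z))


rng = np.random.default_rng(0)
n = 400
g = rng.standard_normal((n, n))
m = np.triu(g, k=1)
m = m + m.T                      # real symmetric, zero diagonal
z = 0.3 + 1.0j

minor = m[:-1, :-1]
r = np.linalg.inv(minor / np.sqrt(n) - z * np.eye(n - 1))
trace_r = np.trace(r)

# s_{n-1} is normalised by sqrt(n - 1), since the minor has size n - 1.
w = np.sqrt(n) / np.sqrt(n - 1) * z
rhs = np.sqrt(n * (n - 1)) * stieltjes(np.linalg.eigvalsh(minor), np.sqrt(n - 1), w)
s_n = stieltjes(np.linalg.eigvalsh(m), np.sqrt(n), z)

print(trace_r, rhs)              # equal up to rounding error
print(abs(trace_r / n - s_n))    # small: the o(1) in tr(R) = n (s_n(z) + o(1))
```

The second printed quantity is controlled by interlacing, so it is {O(1/n)} for fixed {z}.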

This equation already describes the behaviour of {{\bf E} s_n(z)} quite well, but we will content ourselves with understanding the limiting behaviour as {n \rightarrow \infty}. From (13) and Fubini’s theorem we know that the functions {{\bf E} s_n} are locally uniformly equicontinuous and locally uniformly bounded away from the real line. Applying the Arzelà-Ascoli theorem, we thus conclude that on a subsequence at least, {{\bf E} s_n} converges locally uniformly to a limit {s}. This will be a Herglotz function (i.e. an analytic function mapping the upper half-plane to the upper half-plane), and taking limits in (22) (observing that the imaginary part of the denominator here is bounded away from zero) we end up with the exact equation

\displaystyle  s(z) = -\frac{1}{z+s(z)}. \ \ \ \ \ (23)

We can of course solve this by the quadratic formula, obtaining

\displaystyle  s(z) = - \frac{z \pm \sqrt{z^2-4}}{2} = - \frac{2}{z \mp \sqrt{z^2-4}}.

To figure out what branch of the square root one has to use here, we use (12), which easily implies that

\displaystyle  s(z) = -\frac{1+o(1)}{z}

as {z} goes to infinity non-tangentially away from the real line. (To justify this, one has to make the error term in (12) uniform in {n}, but this can be accomplished without difficulty using the Bai-Yin theorem (for instance).) Also, we know that {s} has to be complex analytic (and in particular, continuous) away from the real line. From this and basic complex analysis, we conclude that

\displaystyle  s(z) = \frac{-z + \sqrt{z^2-4}}{2} \ \ \ \ \ (24)

where {\sqrt{z^2-4}} is the branch of the square root with a branch cut at {[-2,2]} that is asymptotic to {z} at infinity.
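The branch choice can be made concrete. One convenient realisation (our choice, not necessarily the author's) of the square root with cut {[-2,2]} that is asymptotic to {z} at infinity is {\sqrt{z-2}\sqrt{z+2}} with principal square roots. The sketch below checks that the resulting (24) lies in the upper half-plane, solves (23), behaves like {-1/z} at infinity, and agrees with the limit of the fixed-point iteration {s \mapsto -1/(z+s)} started at {i}. That iteration converges for {\hbox{Im}(z) > 0} by the Denjoy-Wolff theorem, since the map sends the upper half-plane into a proper subset of itself and has a fixed point there; the helper names are hypothetical.

```python
import cmath


def s_semicircle(z):
    """Equation (24): s(z) = (-z + sqrt(z^2 - 4)) / 2, cut on [-2, 2].

    With principal square roots, sqrt(z - 2) * sqrt(z + 2) is continuous off [-2, 2]
    and is asymptotic to z at infinity, which selects the correct branch.
    """
    root = cmath.sqrt(z - 2) * cmath.sqrt(z + 2)
    return (-z + root) / 2


def solve_by_iteration(z, steps=500):
    """Solve s = -1 / (z + s), equation (23), by fixed-point iteration (needs Im z > 0)."""
    s = 1j
    for _ in range(steps):
        s = -1 / (z + s)
    return s


if __name__ == "__main__":
    for z in (1j, 0.5 + 1j, -1.5 + 0.2j, 3 + 0.01j):
        s = s_semicircle(z)
        print(z, s, abs(s + 1 / (z + s)), abs(solve_by_iteration(z) - s))
    big = 1000j
    print(big * s_semicircle(big))  # close to -1, i.e. s(z) ~ -1/z
```

The other root of the quadratic is {1/s(z)}, which lies in the lower half-plane, so the Herglotz property alone already rules it out.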

As there is only one possible subsequence limit of the {{\bf E} s_n}, we conclude that {{\bf E} s_n} converges locally uniformly (and thus pointwise) to the function (24), and thus (by the concentration of measure of {s_n(z)}) we see that for each {z}, {s_n(z)} converges almost surely (and in probability) to {s(z)}.
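A Monte Carlo illustration of this convergence, assuming (as an example only) Wigner matrices with independent ±1 entries above the diagonal and zero diagonal. It compares the empirical {s_n(z)} at a fixed {z} with (24), realising the branch with principal square roots; the helper names are hypothetical.

```python
import cmath

import numpy as np


def s_semicircle(z):
    """Equation (24), with the cut on [-2, 2] realised via principal square roots."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def wigner_matrix(n, rng):
    """Real symmetric matrix with +/-1 entries above the diagonal and zero diagonal."""
    upper = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), k=1)
    return upper + upper.T


def empirical_stieltjes(m, z):
    """s_n(z) = (1/n) * sum_j 1 / (lambda_j(M_n) / sqrt(n) - z)."""
    n = m.shape[0]
    eigs = np.linalg.eigvalsh(m)
    return complex(np.mean(1.0 / (eigs / np.sqrt(n) - z)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = 0.5 + 1.0j
    for n in (100, 400, 1000):
        error = abs(empirical_stieltjes(wigner_matrix(n, rng), z) - s_semicircle(z))
        print(n, error)  # typically shrinks as n grows
```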

Exercise 12 Find a direct proof (starting from (22), (12), and the smoothness of {{\bf E} s_n(z)}) that {{\bf E} s_n(z) = s(z) + o(1)} for any fixed {z}, that avoids using the Arzelà-Ascoli theorem. (The basic point here is that one has to solve the approximate equation (22), using some robust version of the quadratic formula. The fact that {{\bf E} s_n} is a Herglotz function will help eliminate various unwanted possibilities, such as one coming from the wrong branch of the square root.)

To finish computing the limiting ESD of Wigner matrices, we have to figure out what probability measure {s} comes from. But this is easily read off from (24) and (15):

\displaystyle  \frac{s(\cdot+ib) - s(\cdot-ib)}{2\pi i} \rightharpoonup \frac{1}{2\pi} (4-x^2)^{1/2}_+\ dx = \mu_{sc} \ \ \ \ \ (25)

as {b \rightarrow 0}. Thus the semi-circular law is the only possible measure which has Stieltjes transform {s}, and a simple application of the Cauchy integral formula and (25) shows that {s} is indeed the Stieltjes transform of {\mu_{sc}}.
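The inversion step (25) can also be checked numerically. For real {x}, {(s(x+ib) - s(x-ib))/(2\pi i)} equals {\hbox{Im}\, s(x+ib)/\pi} (because {s(\overline{z}) = \overline{s(z)}}), so for small {b} it should approach the semicircle density {\frac{1}{2\pi}(4-x^2)_+^{1/2}}. The value {b = 10^{-7}}, the sample points, and the helper names below are arbitrary choices.

```python
import cmath
import math


def s_semicircle(z):
    """Equation (24); principal roots put the cut on [-2, 2] and give s(conj z) = conj s(z)."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def semicircle_density(x):
    """Density of mu_sc: (1 / (2 pi)) * sqrt(4 - x^2) on [-2, 2], zero outside."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2 * math.pi)


def smoothed_density(x, b):
    """Left side of (25) at the point x: (s(x + ib) - s(x - ib)) / (2 pi i)."""
    value = (s_semicircle(x + 1j * b) - s_semicircle(x - 1j * b)) / (2j * math.pi)
    return value.real  # the imaginary part vanishes up to rounding


if __name__ == "__main__":
    for x in (-2.5, -1.2, 0.0, 1.0, 1.9):
        print(x, smoothed_density(x, 1e-7), semicircle_density(x))
```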

Putting all this together, we have completed the Stieltjes transform proof of the semi-circular law.

Remark 5 In order to simplify the above exposition, we opted for a qualitative analysis of the semi-circular law here, ignoring such questions as the rate of convergence to this law. However, an inspection of the above arguments reveals that it is easy to make all of the above analysis quite quantitative, with quite reasonable control on all terms. (One has to use Exercise 12 instead of the Arzelà-Ascoli theorem if one wants everything to be quantitative.) In particular, it is not hard to use the above analysis to show that for {|\hbox{Im}(z)| \geq n^{-c}} for some small absolute constant {c>0}, one has {s_n(z) = s(z) + O(n^{-c})} with overwhelming probability. Combining this with a suitably quantitative version of the Stieltjes continuity theorem, this in turn gives a polynomial rate of convergence of the ESDs {\mu_{\frac{1}{\sqrt{n}} M_n}} to the semi-circular law {\mu_{sc}}, in that one has

\displaystyle  \mu_{\frac{1}{\sqrt{n}} M_n}( -\infty, \lambda ) = \mu_{sc}(-\infty,\lambda) + O(n^{-c})

with overwhelming probability for all {\lambda \in {\mathbb R}}.

A variant of this quantitative analysis can in fact get very good control on this ESD down to quite fine scales, namely to scales {\frac{\log^{O(1)} n}{n}}, which is only just a little bit larger than the mean spacing {O(1/n)} of the normalised eigenvalues (recall that we have {n} normalised eigenvalues, constrained to lie in the interval {[-2-o(1), 2+o(1)]} by the Bai-Yin theorem). This was accomplished by Erdős, Schlein, and Yau (under some additional regularity hypotheses on the distribution {\xi}, but these can be easily removed with the assistance of Talagrand’s inequality) by using an additional observation, namely that the eigenvectors of a random matrix are very likely to be delocalised in the sense that their {\ell^2} energy is dispersed more or less evenly across their coefficients. We will return to this point in later notes.

— 4. Dyson Brownian motion and the Stieltjes transform —

We now explore how the Stieltjes transform interacts with the Dyson Brownian motion introduced in Notes 3b. We let {n} be a large number, and let {M_{n}(t)} be a Wiener process of Hermitian random matrices, with associated eigenvalues {\lambda_{1}(t),\ldots,\lambda_{n}(t)}, Stieltjes transforms

\displaystyle  s(t,z) := \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_{j}(t)/\sqrt{n} - z} \ \ \ \ \ (26)

and spectral measures

\displaystyle  \mu(t) := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(t)/\sqrt{n}}. \ \ \ \ \ (27)

We now study how {s}, {\mu} evolve in time in the asymptotic limit {n \rightarrow \infty}. Our computation will be only heuristic in nature.

Recall from Notes 3b that the eigenvalues {\lambda_i = \lambda_i(t)} undergo Dyson Brownian motion

\displaystyle  d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}. \ \ \ \ \ (28)

Applying (26) and Taylor expansion (dropping all terms of higher order than {dt}, using the Ito heuristic {dB_i = O(dt^{1/2})}), we conclude that

\displaystyle  ds = - \frac{1}{n^{3/2}} \sum_{i=1}^n \frac{dB_i}{(\lambda_i/\sqrt{n}-z)^2} + \frac{1}{n^2} \sum_{i=1}^n \frac{|dB_i|^2}{(\lambda_i/\sqrt{n}-z)^3}

\displaystyle + \frac{1}{n^{3/2}} \sum_{1 \leq i,j \leq n: i \neq j}\frac{dt}{(\lambda_i - \lambda_j) (\lambda_j/\sqrt{n}-z)^2}.

For {z} away from the real line, the term {\frac{1}{n^2} \sum_{i=1}^n \frac{|dB_i|^2}{(\lambda_i/\sqrt{n}-z)^3}} is of size {O( dt / n )} and can heuristically be ignored in the limit {n \rightarrow \infty}. Dropping this term, and then taking expectations to remove the Brownian motion term {dB_i}, we are led to

\displaystyle  {\bf E} ds = {\bf E} \frac{1}{n^{3/2}} \sum_{1 \leq i,j \leq n: i \neq j}\frac{dt}{(\lambda_i - \lambda_j) (\lambda_j/\sqrt{n}-z)^2}.

Performing the {i} summation using (26) (which gives {\sum_{i \neq j} \frac{1}{\lambda_i - \lambda_j} = \sqrt{n}\, s(\lambda_j/\sqrt{n})}) we obtain

\displaystyle  {\bf E} ds = {\bf E} \frac{1}{n} \sum_{1 \leq j \leq n} \frac{s(\lambda_j/\sqrt{n})\, dt}{(\lambda_j/\sqrt{n}-z)^2}

where we adopt the convention that for real {x}, {s(x)} is the average of {s(x+i0)} and {s(x-i0)}. Using (27), this becomes

\displaystyle  {\bf E} s_t = {\bf E} \int_{\mathbb R} \frac{s(x)}{(x-z)^2}\ d\mu(x) \ \ \ \ \ (29)

where the {t} subscript denotes differentiation in {t}. From (15) we heuristically have

\displaystyle  s(x \pm i0) = s(x) \pm \pi i \mu(x)

(heuristically treating {\mu} as a function rather than a measure) and on squaring one obtains

\displaystyle  s(x \pm i0)^2 = (s(x)^2 - \pi^2 \mu^2(x)) \pm 2 \pi i s(x) \mu(x).

From this and the Cauchy integral formula around a slit in the real axis (using the bound (11) to ignore the contributions near infinity), we thus have

\displaystyle  s^2(z) = \int_{\mathbb R} \frac{2s(x)}{x-z}\ d\mu(x)

and thus on differentiation in {z}

\displaystyle  2 s s_z(z) = \int_{\mathbb R} \frac{2s(x)}{(x-z)^2}\ d\mu(x).

Comparing this with (29), we obtain

\displaystyle  {\bf E} s_t - {\bf E} s s_z = 0.

From concentration of measure, we expect {s} to concentrate around its mean {\overline{s} := {\bf E} s}, and similarly {s_z} should concentrate around {\overline{s}_z}. In the limit {n \rightarrow \infty}, the expected Stieltjes transform {\overline{s}} should thus obey Burgers’ equation (written here in the sign convention of (26), for which {s(z) \approx -1/z} at infinity; the function {u := -\overline{s}} obeys the more familiar form {u_t + u u_z = 0})

\displaystyle  s_t - s s_z = 0. \ \ \ \ \ (30)

To illustrate how this equation works in practice, let us give an informal derivation of the semi-circular law. We consider the case when the Wiener process starts from {M(0) = 0}, thus {M_t \equiv \sqrt{t} G} for a GUE matrix {G}. As such, we have the scaling symmetry

\displaystyle s(t,z) = \frac{1}{\sqrt{t}} s_{GUE}(\frac{z}{\sqrt{t}})

where {s_{GUE}} is the asymptotic Stieltjes transform for GUE (which we secretly know to be given by (24), but let us pretend that we did not yet know this fact). Inserting this self-similar ansatz into (30) and setting {t=1}, we conclude that

\displaystyle  -\frac{1}{2} s_{GUE} - \frac{1}{2} z s'_{GUE} - s_{GUE} s'_{GUE} = 0;

multiplying by {-2} and integrating, we conclude that

\displaystyle  z s_{GUE} + s_{GUE}^2 = C

for some constant {C}. But from the asymptotic (12) we see that {C} must equal {-1}. But then the above equation can be rearranged into (23), and so by repeating the arguments at the end of the previous section we can deduce the formula (24), which then gives the semi-circular law by (15).

As is well known in PDE, one can solve Burgers’ equation more generally by the method of characteristics. For reasons that will become clearer in the next set of notes, I will solve this equation by a slightly different (but ultimately equivalent) method. The idea is that rather than think of {s=s(t,z)} as a function of {z} for fixed {t}, we think of {z=z(t,s)} as a function of {s} for fixed {t}. (This trick is sometimes known as the hodograph transform, especially if one views {s} as “velocity” and {z} as “position”.) Note from (12) that we expect to be able to invert the relationship between {s} and {z} as long as {z} is large (and {s} is small).

To exploit this change of perspective, we think of {s, z, t} as all varying by infinitesimal amounts {ds, dz, dt} respectively. Using (30) and the total derivative formula {ds = s_t dt + s_z dz}, we see that

\displaystyle  ds = s s_z dt + s_z dz.

If we hold {s} fixed (i.e. {ds=0}), so that {z} is now just a function of {t}, and cancel off the {s_z} factor, we conclude that

\displaystyle  \frac{dz}{dt} = -s.

Integrating this, we see that

\displaystyle  z(t,s) = z(0,s) - ts. \ \ \ \ \ (31)

This, in principle, gives a way to compute {s(t,z)} from {s(0,z)}. First, we invert the relationship {s=s(0,z)} to {z=z(0,s)}; then we subtract {ts} from {z(0,s)}; then we invert again to recover {s(t,z)}.

Since {M_t \equiv M_0 + \sqrt{t} G}, where {G} is a GUE matrix independent of {M_0}, we have thus given a formula to describe the Stieltjes transform of {M_0 + \sqrt{t} G} in terms of the Stieltjes transform of {M_0}. This formula is a special case of a more general formula of Voiculescu for free convolution, with the operation of inverting the Stieltjes transform essentially being the famous {R}-transform of Voiculescu; we will discuss this more in the next section.
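The recipe "invert, shift, invert again" can be sketched numerically. With the sign convention of (26), under which {s(z) \approx -1/z} at infinity, the shift that reproduces the semicircle equation (23) when {M_0 = 0} is {z(t,s) = z(0,s) - ts}; equivalently, {s = s(t,z)} solves the fixed-point equation {s = s_0(z + ts)} with {s_0 = s(0,\cdot)}, a standard subordination form of free convolution with a semicircle law. The sketch below iterates this equation in the upper half-plane. As a hypothetical example it takes {M_0} with normalised eigenvalues {\pm 1} in equal proportion and {G} a GUE matrix with unit-variance entries, and compares the prediction with one random draw of {(M_0 + \sqrt{t} G)/\sqrt{n}}; all helper names are ours.

```python
import cmath

import numpy as np


def s_semicircle(z):
    """Equation (24), with the cut on [-2, 2] realised via principal square roots."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def stieltjes_of_atoms(atoms, weights):
    """Stieltjes transform w -> sum_k weights_k / (atoms_k - w) of a discrete probability measure."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)

    def s0(w):
        return complex(np.sum(weights / (atoms - w)))

    return s0


def add_semicircle(s0, t, z, steps=2000):
    """s(t, z) from s0 = s(0, .) by iterating s -> s0(z + t s); assumes Im z > 0."""
    s = 1j
    for _ in range(steps):
        s = s0(z + t * s)
    return s


def gue(n, rng):
    """GUE matrix with E|g_ij|^2 = 1 off the diagonal and N(0, 1) diagonal entries."""
    a = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (a + a.conj().T) / np.sqrt(2)


if __name__ == "__main__":
    z = 0.5 + 1.0j
    # M_0 = 0 recovers the semicircle law (24) at t = 1.
    print(add_semicircle(stieltjes_of_atoms([0.0], [1.0]), 1.0, z), s_semicircle(z))

    rng = np.random.default_rng(0)
    n, t = 1000, 1.0
    predicted = add_semicircle(stieltjes_of_atoms([-1.0, 1.0], [0.5, 0.5]), t, z)
    m0 = np.sqrt(n) * np.diag(np.repeat([-1.0, 1.0], n // 2))   # M_0 / sqrt(n) = diag(+/-1)
    eigs = np.linalg.eigvalsh(m0 + np.sqrt(t) * gue(n, rng))
    empirical = complex(np.mean(1.0 / (eigs / np.sqrt(n) - z)))
    print(predicted, empirical)
```

Iterating the fixed-point equation avoids computing the inverse function {z(0,s)} explicitly, which is awkward for general {M_0}.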

Filed under: 254A - random matrices, math.AP, math.PR, math.SP Tagged: Burgers' equation, concentration of measure, Dyson Brownian motion, empirical spectral distribution, moment method, Stieltjes transform method, Wigner semi-circular law

February 06, 2010

Doug Natelson: "Not my job!"

US Secretary of Energy Steven Chu is going to be on "Wait, Wait, Don't Tell Me" this coming Saturday, presumably doing their "Not my job!" game.  For those not in the US, WWDTM is a comedic radio quiz program, and "Not my job!" is a game in which the guest must answer three questions about some subject that is very, very far from their area of expertise.  This should be amusing.

Update:  Here is a link to the relevant part of the show.

John Baez: This Week's Finds in Mathematical Physics (Week 293)

John Baez

This week I want to list a bunch of recent papers and books on n-categories. Then I'll tell you about a conference on the math of environmental sustainability and green technology. And then I'll continue my story about electrical circuits. But first...

This column started with some vague dreams about n-categories and physics. Thanks to a lot of smart youngsters - and a few smart oldsters - these dreams are now well on their way to becoming reality. They don't need my help anymore! I need to find some new dreams. So, "week300" will be the last issue of This Week's Finds in Mathematical Physics.

I still like learning things by explaining them. When I start work at the Centre for Quantum Technologies this summer, I'll want to tell you about that. And I've realized that our little planet needs my help a lot more than the abstract structure of the universe does! The deep secrets of math and physics are endlessly engrossing - but they can wait, and other things can't. So, I'm trying to learn more about ecology, economics, and technology. And I'd like to talk more about those.

So, I plan to start a new column. Not completely new, just a bit different from this. I'll call it This Week's Finds, and drop the "in Mathematical Physics". That should be sufficiently vague that I can talk about whatever I want.

I'll make some changes in format, too. For example, I won't keep writing each issue in ASCII and putting it on the usenet newsgroups. Sorry, but that's too much work.

I also want to start a new blog, since the n-Category Cafe is not the optimal place for talking about things like the melting of Arctic ice. But I don't know what to call this new blog - or where it should reside. Any suggestions?

I may still talk about fancy math and physics now and then. Or even a lot. We'll see. But if you want to learn about n-categories, you don't need me. There's a lot to read these days. I mentioned Carlos Simpson's book in "week291" - that's one good place to start. Here's another introduction:

1) John Baez and Peter May, Towards Higher Categories, Springer, 2009. Also available at http://ncatlab.org/johnbaez/show/Towards+Higher+Categories

This has a bunch of papers in it, namely:

  • John Baez and Michael Shulman, Lectures on n-categories and cohomology.

  • Julia Bergner, A survey of (∞,1)-categories.

  • Simona Paoli, Internal categorical structures in homotopical algebra.

  • Stephen Lack, A 2-categories companion.

  • Lawrence Breen, Notes on 1- and 2-gerbes.

  • Ross Street, An Australian conspectus of higher categories.

After browsing these, you should probably start studying (∞,1)-categories, which are ∞-categories where all the n-morphisms for n > 1 are invertible. There are a few different approaches, but luckily they're nicely connected by some results described in Julia Bergner's paper. Two of the most important approaches are "Segal spaces" and "quasicategories". For the latter, start here:

2) Andre Joyal, The Theory of Quasicategories and Its Applications, http://www.crm.cat/HigherCategories/hc2.pdf

and then go here:

3) Jacob Lurie, Higher Topos Theory, Princeton U. Press, 2009. Also available at http://www.math.harvard.edu/~lurie/papers/highertopoi.pdf

This book is 925 pages long! Luckily, Lurie writes well. After setting up the machinery, he went on to use (∞,1)-categories to revolutionize algebraic geometry:

4) Jacob Lurie, Derived algebraic geometry I: stable infinity-categories, available as arXiv:math/0608228.
Derived algebraic geometry II: noncommutative algebra, available as arXiv:math/0702299.
Derived algebraic geometry III: commutative algebra, available as arXiv:math/0703204.
Derived algebraic geometry IV: deformation theory, available as arXiv:0709.3091.
Derived algebraic geometry V: structured spaces, available as arXiv:0905.0459.
Derived algebraic geometry VI: E_k algebras, available as arXiv:0911.0018.

For related work, try these:

5) David Ben-Zvi, John Francis and David Nadler, Integral transforms and Drinfeld centers in derived algebraic geometry, available as arXiv:0805.0157.

6) David Ben-Zvi and David Nadler, The character theory of a complex group, available as arXiv:0904.1247.

Lurie is now using (∞,n)-categories to study topological quantum field theory. He's making precise and proving some old conjectures that James Dolan and I made:

7) Jacob Lurie, On the classification of topological field theories, available as arXiv:0905.0465.

Jonathan Woolf is doing it in a somewhat different way, which I hope will be unified with Lurie's work eventually:

8) Jonathan Woolf, Transversal homotopy theory, available as arXiv:0910.3322.

All this stuff is starting to transform math in amazing ways. And I hope physics, too - though so far, it's mainly helping us understand the physics we already have.

Meanwhile, I've been trying to figure out something else to do. Like a lot of academics who think about beautiful abstractions and soar happily from one conference to another, I'm always feeling a bit guilty, wondering what I could do to help "save the planet". Yes, we recycle and turn off the lights when we're not in the room. If we all do just a little bit... a little will get done. But surely mathematicians have the skills to do more!

But what?

I'm sure lots of you have had such thoughts. That's probably why Rachel Levy ran this conference last weekend:

9) Conference on the Mathematics of Environmental Sustainability and Green Technology, Harvey Mudd College, Claremont, California, Friday-Saturday, January 29-30, 2010. Organized by Rachel Levy.

Here's a quick brain dump of what I learned.

First, Harry Atwater of Caltech gave a talk on photovoltaic solar power:

10) Atwater Research Group, http://daedalus.caltech.edu/

The efficiency of silicon crystal solar cells peaked out at 24% in 2000. Fancy "multijunctions" get up to 40% and are still improving. But they use fancy materials like gallium arsenide, gallium indium phosphide, and so on. The world currently uses 13 terawatts of power. The US uses 3. But building just 1 terawatt of these fancy photovoltaics would use up more rare substances than we can get our hands on:

11) Gordon B. Haxel, James B. Hedrick, and Greta J. Orris, Rare earth elements - critical resources for high technology, US Geological Survey Fact Sheet 087-02, available at http://pubs.usgs.gov/fs/2002/fs087-02/

So, if we want solar power, we need to keep thinking about silicon and use as many tricks as possible to boost its efficiency.

There are some limits. In 1961, Shockley and Queisser wrote a paper on the limiting efficiency of a solar cell. It's limited for thermodynamic reasons! Since anything that can absorb energy can also emit it, any solar cell also acts as a light-emitting diode, turning electric power back into light:

12) W. Shockley and H. J. Queisser, Detailed balance limit of efficiency of p-n junction solar cells, J. Appl. Phys. 32 (1961) 510-519.

13) Wikipedia, Shockley-Queisser limit, http://en.wikipedia.org/wiki/Shockley%E2%80%93Queisser_limit

What are the tricks used to approach this theoretical efficiency? Multijunctions use layers of different materials to catch photons of different frequencies. The materials are expensive, so people use a lens to focus more sunlight on the photovoltaic cell. The same is true even for silicon - see the Umuwa Solar Power Station in Australia. But then the cells get hot and need to be cooled.

Roughening the surface of a solar cell promotes light trapping, by large factors! Light bounces around ergodically and has more chances to get absorbed and turned into useful power. There are theoretical limits on how well this trick works. But those limits were derived using ray optics, where we assume light moves in straight lines. So, we can beat those limits by leaving the regime where the ray-optics approximation holds good. In other words, make the surface complicated at length scales comparable to the wavelength of light.

For example: we can grow silicon wires from vapor! They can form densely packed structures that absorb more light:

14) B. M. Kayes, H. A. Atwater, and N. S. Lewis, Comparison of the device physics principles of planar and radial p-n junction nanorod solar cells, J. Appl. Phys. 97 (2005), 114302.

James R. Maiolo III, Brendan M. Kayes, Michael A. Filler, Morgan C. Putnam, Michael D. Kelzenberg, Harry A. Atwater and Nathan S. Lewis, High aspect ratio silicon wire array photoelectrochemical cells, J. Am. Chem. Soc. 129 (2007), 12346-12347.

Also, with such structures the charge carriers don't need to travel so far to get from the n-type material to the p-type material. This also boosts efficiency.

There are other tricks, still just under development. Using quasiparticles called "surface plasmons" we can adjust the dispersion relations to create materials with really low group velocity. Slow light has more time to get absorbed! We can also create "meta-materials" whose refractive index is really wacky - like n = -5!

I should explain this a bit, in case you don't understand. Remember, the refractive index of a substance is the inverse of the speed of light in that substance - in units where the speed of light in vacuum equals 1. When light passes from material 1 to material 2, it takes the path of least time - at least in the ray-optics approximation. Using this you can show Snell's law:

sin(θ1)/sin(θ2) = n2/n1

where n_i is the index of refraction in the i-th material and θ_i is the angle between the light's path and the line normal to the interface between materials.

Air has an index of refraction close to 1. Glass has an index of refraction greater than 1. So, when light passes from air to glass, it "straightens out": its path becomes closer to perpendicular to the air-glass interface. When light passes from glass to air, the reverse happens: the light bends more. But the sine of an angle can never exceed 1 - so sometimes Snell's law has no solution. Then the light gets stuck! More precisely, it's forced to bounce back into the glass. This is called "total internal reflection", and the easiest way to see it is not with glass, but water. Dive into a swimming pool and look up from below. You'll only see the sky in a limited disk. Outside that, you'll see total internal reflection.

Okay, that's stuff everyone learns in optics. But negative indices of refraction are much weirder! The light entering such a material will bend backwards.
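Snell's law and its failure mode fit in a few lines. Here is a minimal sketch, assuming signed angles measured from the normal in radians, and using 1.33 for water and 1.5 for glass as typical refractive indices (our assumptions, not figures from the talk). The critical angle for light trying to leave water comes out near 49 degrees, which is the edge of the "disk of sky" seen from underwater, and a negative index such as n = -5 gives a transmitted angle of the opposite sign, so the ray bends back to the same side of the normal.

```python
import math


def refraction_angle(n1, n2, theta1):
    """Transmitted angle from Snell's law n1 sin(theta1) = n2 sin(theta2).

    Angles are in radians, signed relative to the normal. A result of the
    opposite sign to theta1 means the ray leaves on the same side of the
    normal it arrived on, as happens when n2 < 0. Returns None when no real
    angle satisfies the law: total internal reflection.
    """
    sine = n1 * math.sin(theta1) / n2
    if abs(sine) > 1:
        return None
    return math.asin(sine)


def critical_angle(n_dense, n_rare):
    """Incidence angle beyond which light inside the denser medium is totally reflected."""
    return math.asin(n_rare / n_dense)


if __name__ == "__main__":
    water, air, glass = 1.33, 1.0, 1.5
    print(math.degrees(refraction_angle(air, glass, math.radians(30))))  # bends toward the normal
    print(refraction_angle(water, air, math.radians(60)))                # None: totally reflected
    print(math.degrees(critical_angle(water, air)))                      # roughly 48.8
    print(math.degrees(refraction_angle(air, -5.0, math.radians(30))))   # negative: bends backwards
```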

Materials with a negative index of refraction also exhibit a reversed version of the ordinary Goos-Hänchen effect. In the ordinary version, light "slips" a little before reflecting during total internal reflection. The "slip" is actually a slight displacement of the light's wave crests from their expected location - a "phase slip". But for a material of negative refractive index, the light slips backwards. This allows for resonant states where light gets trapped in thin films. Maybe this can be used to make better solar cells.

Next, Kenneth Golden gave a talk on sea ice, which covers 7-10% of the ocean's surface and is a great detector of global warming. He's a mathematician at the University of Utah who also does measurements in the Arctic and Antarctic. If you want to go to math grad school without becoming a nerd - if you want to brave 70-foot swells, dig trenches in the snow and see emperor penguins - you want Golden as your advisor:

15) Ken Golden's website, http://www.math.utah.edu/~golden/

Salt gets incorporated into sea ice via millimeter-scale brine inclusions between ice platelets, forming a "dendritic platelet structure". Melting sea ice forms fresh water in melt ponds atop the ice, while the brine sinks down to form "bottom water" driving the global thermohaline conveyor belt. You've heard of the Gulf Stream, right? Well, that's just part of this story.

When it gets hotter, the Earth's poles get less white, so they absorb more light, making it hotter: this is "ice albedo feedback". Ice albedo feedback is largely controlled by melt ponds. So if you're interested in climate change, questions like the following become important: when do melt ponds get larger, and when do they drain out?

Sea ice is diminishing rapidly in the Arctic - much faster than all the existing climate models had predicted. In the Arctic, winter sea ice diminished in area by about 10% from 1978 to 2008. But summer sea ice diminished by about 40%! It took a huge plunge in 2007, leading to a big increase in solar heat input due to the ice albedo effect.


Time series of the percent difference in ice extent in March (the month of ice extent maximum) and September (the month of ice extent minimum) relative to the mean values for the period 1979-2000. Based on a least squares linear regression for the period 1979-2009, the rate of decrease for the March and September ice extents is -2.5% and -8.9% per decade, respectively. Figure from Perovich et al.

16) Donald K. Perovich, Jacqueline A. Richter-Menge, Kathleen F. Jones, and Bonnie Light, Sunlight, water, and ice: Extreme Arctic sea ice melt during the summer of 2007, Geophysical Research Letters, 35 (2008), L11501. Also available at http://www.crrel.usace.army.mil/sid/personnel/perovichweb/index1.htm
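The per-decade rates in the caption come from a least-squares straight-line fit. A minimal sketch of that arithmetic in Python, using made-up anomaly values rather than the Perovich et al. data: the fitted slope (percent per year) is multiplied by 10 to give percent per decade.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, scaled to a decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10 * sxy / sxx

# Synthetic, illustrative anomalies (percent) shrinking by 0.89 per year, 1979-2009.
years = list(range(1979, 2010))
anomalies = [4.0 - 0.89 * (y - 1979) for y in years]
print(round(trend_per_decade(years, anomalies), 2))  # -8.9
```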

There's a lot less sea ice in the Antarctic than in the Arctic. Most of it is in the Weddell Sea, and there it seems to be growing, maybe due to increased precipitation.

There's a lot of interesting math involved in understanding the dynamics of sea ice. The ice thickness distribution equation was worked out by Thorndike et al. in 1975. The heat equation for ice and snow was worked out by Maykut and Untersteiner in 1971. Sea ice dynamics was studied by Hibler.

Ice floes have two fractal regimes, one from 1 to 20 meters, another from 100 to 1500 meters. Brine channels have a fractal character well modeled by "diffusion limited aggregation". Brine starts flowing when there's about 5% of brine in the ice - a kind of percolation problem familiar in statistical mechanics. Here's what it looks like when there's 5.7% brine and the temperature is -8 °C:

17) Kenneth Golden, Brine inclusions in a crystal of lab-grown sea ice, http://www.math.utah.edu/~golden/7.html
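For readers who haven't met diffusion limited aggregation, here is a generic on-lattice sketch in Python: random walkers are launched one at a time from a circle around a growing cluster and freeze as soon as they touch it, which produces branching, fractal shapes. This is only the textbook toy model, not Golden's brine-channel model; the launch radius (cluster radius + 5) and the kill radius (twice the launch radius) are arbitrary choices made here.

```python
import math
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def dla_cluster(n_particles, seed=0):
    """Grow an on-lattice DLA cluster of n_particles sites from a seed at the origin."""
    rng = random.Random(seed)
    cluster = {(0, 0)}
    max_r = 0.0  # distance of the farthest cluster site from the origin
    while len(cluster) < n_particles:
        launch_r = max_r + 5
        theta = rng.uniform(0, 2 * math.pi)
        x = round(launch_r * math.cos(theta))
        y = round(launch_r * math.sin(theta))
        while True:
            if any((x + dx, y + dy) in cluster for dx, dy in NEIGHBOURS):
                cluster.add((x, y))  # touched the cluster: stick here
                max_r = max(max_r, math.hypot(x, y))
                break
            dx, dy = rng.choice(NEIGHBOURS)
            x, y = x + dx, y + dy
            if math.hypot(x, y) > 2 * launch_r:
                break  # wandered too far away: discard and launch a new walker
    return cluster

cluster = dla_cluster(200)
print(len(cluster))  # 200
```

Plotting the returned lattice sites shows the branching; since a walker only sticks when it lands next to an existing site, the cluster is always connected.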

Nobody knows why polycrystalline metals have a log-normal distribution of crystal sizes. Similar behavior, also unexplained, is seen in sea ice.

A "polynya" is an area of open water surrounded by sea ice. Polynyas occupy just 0.001% of the overall area in Antarctic sea ice, but create 1% of the ice. Icy cold katabatic winds blow off the mainland, pushing away ice and creating patches of open water which then refreeze.

There was anomalous export of sea ice through Fram Strait in the 1990s, which may have been one of the preconditions for high ice albedo feedback.

20-40% of sea ice is formed by surface flooding followed by refreezing. This was not included in the sea ice models that gave such inaccurate predictions.

The food chain is founded on diatoms. These produce "extracellular polymeric substances": goopy, mucus-like stuff made of polysaccharides that protects them and serves as antifreeze. There's a lot of this stuff; the ice gets visibly stained by it.

For more, see:

18) Kenneth M. Golden, Climate change and the mathematics of transport in sea ice, AMS Notices, May 2009. Also available at http://www.ams.org/notices/200905/

19) Mathematics Awareness Month, April 2009: Mathematics and Climate, http://www.mathaware.org/mam/09/

Next, Julie Lundquist, who just moved from Lawrence Livermore Labs to the University of Colorado, spoke about wind power:

20) Julie Lundquist, Department of Atmospheric and Oceanic Sciences, University of Colorado, http://paos.colorado.edu/people/lundquist.php

With increased reliance on wind, the power grid will need to be redesigned to handle fluctuating power sources. In the US, currently, companies aren't paid for power they generate in excess of the amount they promised to make. So, accurate prediction is a hugely important game. Being off by 1% can cost millions of dollars! Europe has different laws, which encourage firms to maximize the amount of wind power they generate.

If you had your choice about where to build a wind turbine, you'd build it on the ocean or a very flat plain, where the air flows rather smoothly. Hilly terrain leads to annoying turbulence - but sometimes that's your only choice. Then you need to find the best spots, where the turbulence is least bad. Complete simulation of the Navier-Stokes equations is too computationally intensive, so people use fancier tricks. There's a lot of math and physics here.

For weather reports people use "mesoscale simulation" which cleverly treats smaller-scale features in an averaged way - but we need more fine-grained simulations to see how much wind a turbine will get. This is where "large eddy simulation" comes in. Eddy diffusivity is modeled by Monin-Obukhov similarity theory:

21) American Meteorological Society Glossary, Monin-Obukhov similarity theory, http://amsglossary.allenpress.com/glossary/search?id=monin-obukhov-similarity-theory1

A famous Brookhaven study suggested that the power spectrum of wind has peaks at 4 days, 1/2 day, and 1 minute. This perhaps justifies an approach where different time scales, and thus length scales, are treated separately and the results then combined somehow. The study is actually a bit controversial. But anyway, this is the approach people are taking, and it seems to work.

Night air is stable - but day air is often not, since the ground is hot, and hot air rises. So when a parcel of air moving along hits a hill, it can just shoot upwards, and not come back down! This means lots of turbulence.

The wind turbines at Altamont Pass in California kill more raptors than all other wind farms in the world combined! Old-fashioned wind turbines look like nice places to perch, spelling death to birds. Cracks in concrete attract rodents, which attract raptors, who get killed. The new ones are far better.

For more:

22) National Renewable Energy Laboratory, Research needs for wind resource characterization, available as http://www.nrel.gov/docs/fy08osti/43521.pdf

Finally, there was a talk by Ron Lloyd of Fat Spaniel Technologies. This is a company that makes software for solar plants and other sustainable energy companies:

23) Fat Spaniel Technologies, http://www.fatspaniel.com/products/

His talk was less technical so I didn't take detailed notes. One big point I took away was this: we need better tools for modelling! This is especially true with the coming of the "smart grid". In its simplest form, this is a power grid that uses lots of data - for example, data about power generation and consumption - to regulate itself and increase efficiency. Surely there will be a lot of math here. Maybe even the topic I've been talking about lately: bond graphs!

But now I want to talk about some very simple aspects of electrical circuits. Last week I listed various kinds of circuits. Now let's go into a bit more detail - starting with the simplest kind: circuits made of just wires and linear resistors, where the currents and voltages are independent of time.

Mathematically, such a circuit is a graph equipped with some extra data. First, each edge has a number associated to it - the "resistance". For example:

    o----1----o----3----o
    |         |         |
    |         |         |
    2         3         2
    |         |         |
    |         |         |
    o----3----o----1----o

Second, we have current flowing through this circuit. To describe this, we first arbitrarily pick an orientation on each edge:
    o---->----o---->----o
    |         |         |
    |         |         |
    V         V         V
    |         |         |
    |         |         |
    o----<----o---->----o

Then we label each edge with a number saying how much "current" is flowing through that edge, in the direction of the arrow:
         2         3
    o---->----o---->----o
    |         |         |
    |         |         |
  3 V         V 1       V 3
    |         |         |
    |         |         |
    o----<----o---->----o
         2        -3

Electrical engineers call the current I. Mathematically it's good to think of I as a "1-chain": a linear combination of oriented edges of our graph, with the coefficients of the linear combination being the numbers shown above.

If we know the current, we can work out a number for each vertex of our graph, saying how much current is flowing out of that vertex, minus how much is flowing in:

                2
    5 o---->----o---->----o 0
      |         |         |
      |         |         |
      V         V         V
      |         |         |
      |         |         |
   -5 o----<----o---->----o 0
               -2

Mathematically we can think of this as a "0-chain": a formal linear combination of the vertices of our graph, with the numbers shown above as coefficients. We call this 0-chain the "boundary" of the 1-chain we started with. Since our current was called I, we call its boundary δI.

Kirchhoff's current law says that

δI = 0

When this holds, let's say our circuit is "closed". Physically this follows from the law of conservation of electrical charge, together with a reasonable assumption. Current is the flow of charge. If the total current flowing into a vertex wasn't equal to the amount flowing out, charge - positive or negative - would be building up there. But for a closed circuit, we assume it's not.
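The boundary computation above is easy to mechanize. A minimal Python sketch, assuming we number the six vertices 0 to 5 (top row 0, 1, 2 from left to right, bottom row 3, 4, 5) and store the current as a dictionary from oriented edges (source, target) to numbers; these names are illustrative only.

```python
from collections import defaultdict

# Vertices:  0 --- 1 --- 2
#            |     |     |
#            3 --- 4 --- 5
# Each oriented edge (source, target) with the current from the example.
CURRENT = {
    (0, 1): 2, (1, 2): 3,             # top row, arrows pointing right
    (0, 3): 3, (1, 4): 1, (2, 5): 3,  # verticals, arrows pointing down
    (4, 3): 2, (4, 5): -3,            # bottom row: left arrow points left
}

def boundary(current):
    """The 0-chain delta(I): current flowing out of each vertex minus current flowing in."""
    out = defaultdict(int)
    for (source, target), i in current.items():
        out[source] += i
        out[target] -= i
    return dict(out)

def is_closed(current):
    """Kirchhoff's current law: delta(I) = 0 at every vertex."""
    return all(v == 0 for v in boundary(current).values())

print(boundary(CURRENT))   # {0: 5, 1: 2, 2: 0, 3: -5, 4: -2, 5: 0}
print(is_closed(CURRENT))  # False
```

The output reproduces the labels in the boundary picture. Since the boundary is nonzero at four vertices, this particular example current illustrates δ rather than a closed circuit.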

If a circuit is not closed, let's call it "open". These are interesting too. For example, we might have a circuit like this:

    x
    |
    |
    V
    |
    |
    o---->----o
    |         |
    |         |
    V         V
    |         |
    |         |
    x         x

where we have current flowing in the wire on top and flowing out the two wires at bottom. We allow δI to be nonzero at the ends of these wires - the 3 vertices labelled x. This circuit is an "open system" in the sense of "week290", because it has these wires dangling out of it. It's not self-contained; we can use it as part of some bigger circuit. We should really formalize this more, but I won't now. Derek Wise did it more generally here:

24) Derek Wise, Lattice p-form electromagnetism and chain field theory, available as gr-qc/0510033.

The idea here was to get a category where chain complexes are morphisms. In our situation, composing morphisms amounts to gluing the output wires of one circuit into the input wires of another. This is an example of the general philosophy I'm trying to pursue, where open systems are treated as morphisms.
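For an open circuit like the one drawn above with three dangling wires, the same bookkeeping applies, except that δI only has to vanish away from the ends. A self-contained Python sketch with hypothetical vertex names ("x0", "x1", "x2" for the three ends marked x, "a" and "b" for the two inner vertices) and made-up currents:

```python
from collections import defaultdict

def boundary(current):
    """delta(I): current out of each vertex minus current in."""
    out = defaultdict(int)
    for (source, target), i in current.items():
        out[source] += i
        out[target] -= i
    return dict(out)

def obeys_kcl_except_at(current, terminals):
    """Kirchhoff's current law at every vertex that is not a dangling end."""
    return all(v == 0 for vertex, v in boundary(current).items()
               if vertex not in terminals)

# 3 units enter at the top wire and leave as 2 + 1 through the bottom wires.
open_circuit = {("x0", "a"): 3, ("a", "b"): 1, ("a", "x1"): 2, ("b", "x2"): 1}
print(boundary(open_circuit))  # {'x0': 3, 'a': 0, 'b': 0, 'x1': -2, 'x2': -1}
print(obeys_kcl_except_at(open_circuit, {"x0", "x1", "x2"}))  # True
```

The values at the ends sum to zero (3 - 2 - 1), as they must: each edge contributes +i at one end and -i at the other, so δI always sums to zero over all vertices.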

We've talked about 1-chains and 0-chains... but we can also back up and talk about 2-chains! Let's suppose our graph is connected - it is in our example - and let's fill it in with enough 2-dimensional "faces" to get something contractible. We can do this in a god-given way if our graph is drawn on the plane: just fill in all the holes!

    o---------o---------o
    |/////////|/////////|
    |/////////|/////////|
    |//FACE///|///FACE//|
    |/////////|/////////|
    |/////////|/////////|
    o---------o---------o

In electrical engineering these faces are often called "meshes".

This gives us a chain complex

            δ             δ
    C_0 <-------- C_1 <-------- C_2

and a cochain complex:
            d             d
    C^0 --------> C^1 --------> C^2

As I've already said, it's good to think of the current I as a 1-chain, since then

δI = 0

is Kirchhoff's current law. Since our little space is contractible, the above equation implies that

I = δJ

for some 2-chain J called the "mesh current". This assigns to each face or "mesh" the current flowing around that face.
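Here is one way to see δδ = 0 and the mesh-current statement concretely, as a Python/NumPy sketch. It numbers the vertices 0, 1, 2 along the top and 3, 4, 5 along the bottom, keeps the text's "out minus in" convention (so δ of an edge is its source minus its target), and orients both meshes clockwise in the picture; the mesh currents J are made-up values.

```python
import numpy as np

# Vertices: top row 0, 1, 2 and bottom row 3, 4, 5 (left to right).
# Oriented edges (source, target), following the arrows in the text.
EDGES = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (4, 3), (4, 5)]
# Each mesh as {edge index: +1 or -1}, traversed clockwise in the picture.
FACES = [{0: 1, 3: 1, 5: 1, 2: -1},   # left mesh:  0 -> 1 -> 4 -> 3 -> 0
         {1: 1, 4: 1, 6: -1, 3: -1}]  # right mesh: 1 -> 2 -> 5 -> 4 -> 1

def boundary_1(n_vertices, edges):
    """Matrix of delta: C_1 -> C_0, with delta(edge) = source - target."""
    D = np.zeros((n_vertices, len(edges)), dtype=int)
    for j, (s, t) in enumerate(edges):
        D[s, j] += 1
        D[t, j] -= 1
    return D

def boundary_2(n_edges, faces):
    """Matrix of delta: C_2 -> C_1, sending each mesh to its signed edges."""
    D = np.zeros((n_edges, len(faces)), dtype=int)
    for k, face in enumerate(faces):
        for e, sign in face.items():
            D[e, k] = sign
    return D

delta1 = boundary_1(6, EDGES)
delta2 = boundary_2(len(EDGES), FACES)
print(np.all(delta1 @ delta2 == 0))  # True: the boundary of a boundary vanishes

J = np.array([2, 5])                 # hypothetical mesh currents
I = delta2 @ J                       # I = delta J
print(I.tolist())                    # [2, 5, -2, -3, 5, 2, -5]
print(np.all(delta1 @ I == 0))       # True: Kirchhoff's current law holds
```

The shared middle wire (edge index 3) carries the difference of the two mesh currents, 2 - 5 = -3, which is why any current of the form δJ automatically satisfies the current law.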

An electrical circuit also comes with a third piece of data, which I haven't mentioned yet. Each oriented edge should be labelled by a number called the "voltage" across that edge. Electrical engineers call the voltage V. It's good to think of V as a 1-cochain, which assigns to each edge the voltage across that edge.

Why a 1-cochain instead of a 1-chain? Because then

dV = 0

is the other basic law of electrical circuits - Kirchhoff's voltage law! This law says that the sum of these voltages around a mesh is zero. Since our little space is contractible the above equation implies that

V = dφ

for some 0-cochain φ called the "electrostatic potential". In electrostatics, this potential is a function on space. Here it assigns a number to each vertex of our graph.
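On the cochain side, d is the transpose of δ, so Kirchhoff's voltage law for V = dφ follows from the chain complex property δδ = 0. A NumPy sketch under stated assumptions: vertices numbered 0, 1, 2 along the top and 3, 4, 5 along the bottom, δ(edge) = source - target, meshes oriented clockwise, and made-up potentials φ. With this sign convention V(edge) = φ(source) - φ(target), the potential drop along the arrow.

```python
import numpy as np

# Vertices: top row 0, 1, 2 and bottom row 3, 4, 5 (left to right).
EDGES = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (4, 3), (4, 5)]
FACES = [{0: 1, 3: 1, 5: 1, 2: -1},   # left mesh, clockwise
         {1: 1, 4: 1, 6: -1, 3: -1}]  # right mesh, clockwise

# delta: C_1 -> C_0 with delta(edge) = source - target
delta1 = np.zeros((6, len(EDGES)), dtype=int)
for j, (s, t) in enumerate(EDGES):
    delta1[s, j], delta1[t, j] = 1, -1
# delta: C_2 -> C_1, each mesh goes to its signed edges
delta2 = np.zeros((len(EDGES), len(FACES)), dtype=int)
for k, face in enumerate(FACES):
    for e, sign in face.items():
        delta2[e, k] = sign

# The coboundaries are the transposes: d = delta^T.
d0, d1 = delta1.T, delta2.T           # d0: C^0 -> C^1, d1: C^1 -> C^2

phi = np.array([9, 7, 4, 6, 5, 1])    # hypothetical potentials at the vertices
V = d0 @ phi                          # V(edge) = phi(source) - phi(target)
print(V.tolist())                     # [2, 3, 3, 2, 3, -1, 4]
print((d1 @ V).tolist())              # [0, 0]: voltages sum to zero around each mesh
```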

Since the space of 1-cochains is the dual of the space of 1-chains, we can take the voltage V and the current I, glom them together, and get a number:

V(I)

This is the "power": that is, the rate at which our network soaks up energy and dissipates it into heat. Note that this is just a fancy version of the formula for power that I explained in "week290" - power is effort times flow.

I've given you three basic pieces of data labelling our circuit: the resistance R, the current I, and the voltage V. But these aren't independent! Ohm's law says that the voltage across any edge is the current through that edge times the resistance of that edge. But remember: voltage is a 1-cochain while current is a 1-chain. So "resistance" can be thought of as a map from 1-chains to 1-cochains:

R: C_1 → C^1

This lets us write Ohm's law like this:

V = RI

This, in turn, means the power of our circuit is

V(I) = (RI)(I)

For physical reasons, this power is always nonnegative. In fact, let's assume it's positive unless I = 0. This is just another way of saying that the resistance labelling each edge is positive. It can be very interesting to think about circuits with perfectly conducting wires. These would give edges whose resistance is zero. But that's a bit of an idealization, and right now I'd rather allow only positive resistances.

Why? Because then we can think of the above formula as the inner product of I with itself! In other words, then there's a unique inner product on 1-chains with

(RI)(I) = <I,I>

In this situation

R: C_1 → C^1

is the usual isomorphism that we get between a finite-dimensional inner product space and its dual. (For this statement to be true, we'd better assume our graph has finitely many vertices and edges.)
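With numbers attached, Ohm's law and the power formula become a diagonal matrix and a quadratic form. A NumPy sketch that takes the resistances and currents from the example pictures, listing edges as top row (left to right), then the three verticals (left to right), then the bottom row (left to right). Units are whatever the labels are in; ohms and amperes would give watts.

```python
import numpy as np

# Edge order: top-left, top-right, left vertical, middle vertical,
# right vertical, bottom-left, bottom-right.
resistance = np.array([1, 3, 2, 3, 2, 3, 1])  # from the resistance picture
I = np.array([2, 3, 3, 1, 3, 2, -3])          # the example current, a 1-chain

R = np.diag(resistance)   # R: C_1 -> C^1, diagonal in the edge basis
V = R @ I                 # Ohm's law V = RI, a 1-cochain
power = V @ I             # the pairing V(I) = sum over edges of R_e * I_e**2

def inner(a, b):
    """The inner product <a, b> = (Ra)(b) on 1-chains; positive definite
    because every diagonal entry of R (every resistance) is positive."""
    return (R @ a) @ b

print(V.tolist())     # [2, 9, 6, 3, 6, 6, -3]
print(power)          # 91
print(inner(I, I))    # 91, the same number
```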

Now, if you've studied de Rham cohomology, all this should start reminding you of Hodge theory. And indeed, it's a baby version of that! So, we're getting a little bit of Hodge theory, but in a setting where our chain complexes are really morphisms in a category. Or more generally, n-morphisms in an n-category.

There's a lot more to say, but that's enough for now.


Addenda: I thank Colin Backhurst, David Corfield, and Tim Silverman for corrections.

For more discussion, visit the n-Category Café.


So many young people are forced to specialize in one line or another that a young person can't afford to try and cover this waterfront - only an old fogy who can afford to make a fool of himself. If I don't, who will? - John Wheeler


© 2010 John Baez
baez@math.removethis.ucr.andthis.edu



Chad OrzelInheritance

It's been brought to my attention that there hasn't been any cute-baby video posted here for a while, so let me rectify that with a couple of clips. First, SteelyKid discovers that it's kind of difficult to fill Daddy's shoes:

For both our sakes, I hope those never fit her.

A clear indication of inheritance at work is the way she talks with her hands, as seen in this second clip:


Terence TaoUltralimit analysis, and quantitative algebraic geometry


I have blogged a number of times in the past about the relationship between finitary (or “hard”, or “quantitative”) analysis, and infinitary (or “soft”, or “qualitative”) analysis. One way to connect the two types of analysis is via compactness arguments (and more specifically, contradiction and compactness arguments); such arguments can convert qualitative properties (such as continuity) to quantitative properties (such as boundedness), basically because of the fundamental fact that continuous functions on a compact space are bounded (or the closely related fact that sequentially continuous functions on a sequentially compact space are bounded).

A key stage in any such compactness argument is the following: one has a sequence {X_n} of “quantitative” or “finitary” objects or spaces, and one has to somehow end up with a “qualitative” or “infinitary” limit object {X} or limit space. One common way to achieve this is to embed everything inside some universal space and then use some weak compactness property of that space, such as the Banach-Alaoglu theorem (or its sequential counterpart). This is for instance the idea behind the Furstenberg correspondence principle relating ergodic theory to combinatorics; see for instance this post of mine on this topic.

However, there is a slightly different approach, which I will call ultralimit analysis, which proceeds via the machinery of ultrafilters and ultraproducts; typically, the limit objects {X} one constructs are now the ultraproducts (or ultralimits) of the original objects {X_\alpha}. There are two main facts that make ultralimit analysis powerful. The first is that one can take ultralimits of arbitrary sequences of objects, as opposed to more traditional tools such as metric completions, which only allow one to take limits of Cauchy sequences of objects. The second fact is Łoś’s theorem, which tells us that {X} is an elementary limit of the {X_\alpha} (i.e. every sentence in first-order logic which is true for the {X_\alpha} for {\alpha} large enough, is true for {X}). This existence of elementary limits is a manifestation of the compactness theorem in logic; see this earlier blog post for more discussion. So we see that compactness methods and ultrafilter methods are closely intertwined. (See also my earlier class notes for a related connection between ultrafilters and compactness.)

Ultralimit analysis is very closely related to nonstandard analysis. I already discussed some aspects of this relationship in an earlier post, and will expand upon it at the bottom of this post. Roughly speaking, the relationship between ultralimit analysis and nonstandard analysis is analogous to the relationship between measure theory and probability theory.

To illustrate how ultralimit analysis is actually used in practice, I will show later in this post how to take a qualitative infinitary theory – in this case, basic algebraic geometry – and apply ultralimit analysis to then deduce a quantitative version of this theory, in which the complexity of the various algebraic sets and varieties that appear as outputs are controlled uniformly by the complexity of the inputs. The point of this exercise is to show how ultralimit analysis allows for a relatively painless conversion back and forth between the quantitative and qualitative worlds, though in some cases the quantitative translation of a qualitative result (or vice versa) may be somewhat unexpected. In an upcoming paper of myself, Ben Green, and Emmanuel Breuillard (announced in the previous blog post), we will rely on ultralimit analysis to reduce the messiness of various quantitative arguments by replacing them with a qualitative setting in which the theory becomes significantly cleaner.

For sake of completeness, I also redo some earlier instances of the correspondence principle via ultralimit analysis, namely the deduction of the quantitative Gromov theorem from the qualitative one, and of Szemerédi’s theorem from the Furstenberg recurrence theorem, to illustrate how close the two techniques are to each other.

— 1. Ultralimit analysis —

In order to perform ultralimit analysis, we need to prepare the scene by deciding on three things in advance:

  • The standard universe {{\mathcal U}} of standard objects and spaces.
  • A distinction between ordinary objects, and spaces.
  • A choice of non-principal ultrafilter {\alpha_\infty \in \beta {\mathbb N} \backslash {\mathbb N}}.

We now discuss each of these three preparatory ingredients in turn.

We assume that we have a standard universe or superstructure {{\mathcal U}} which contains all the “standard” sets, objects, and structures that we ordinarily care about, such as the natural numbers, the real numbers, the power set of real numbers, the power set of the power set of real numbers, and so forth. For technical reasons, we have to limit the size of this universe by requiring that it be a set, rather than a class; thus (by Russell’s paradox), not all sets will be standard (e.g. {{\mathcal U}} itself will not be a standard set). However, in many areas of mathematics (particularly those of a “finitary” or at most “countable” flavour, or those based on finite-dimensional spaces such as {{\mathbb R}^d}), the type of objects considered in a field of mathematics can often be contained inside a single set {{\mathcal U}}. For instance, the class of all groups is too large to be a set. But in practice, one is only interested in, say, groups with an at most countable number of generators, and if one then enumerates these generators and considers their relations, one can identify each such group (up to isomorphism) with one in some fixed set of model groups. One can then take {{\mathcal U}} to be the collection of these groups, and the various objects one can form from these groups (e.g. power sets, maps from one group to another, etc.). Thus, in practice, the requirement that we limit the scope of objects to care about is not a significant limitation. (If one does not want to limit one’s scope in this fashion, one can proceed instead using the machinery of Grothendieck universes.)

It is important to note that while we primarily care about objects inside the standard universe {{\mathcal U}}, we allow ourselves to use objects outside the standard universe (but still inside the ambient set theory) whenever it is convenient to do so. The situation is analogous to that of using complex analysis to solve real analysis problems; one may only care about statements that have to do with real numbers, but sometimes it is convenient to introduce complex numbers within the proofs of such statements. (More generally, the trick of passing to some completion {\overline{{\mathcal U}}} of one’s original structure {{\mathcal U}} in order to more easily perform certain mathematical arguments is a common theme throughout modern mathematics.)

We will also assume that there is a distinction between two types of objects in this universe: spaces, which are sets that can contain other objects, and ordinary objects, which are all the objects that are not spaces. Thus, for instance, a group element would typically be considered an ordinary object, whereas a group itself would be a space that group elements can live in. It is also convenient to view functions {f: X \rightarrow Y} between two spaces as themselves a type of ordinary object (namely, an element of a space {\hbox{Hom}(X,Y)} of maps from {X} to {Y}). The precise concept of what constitutes a space, and what constitutes an ordinary object, is somewhat hard to formalise, but the basic rule of thumb to decide whether an object {X} should be a space or not is to ask whether mathematical phrases such as {x \in X}, {f: X \rightarrow Y}, or {A \subset X} are likely to make useful sense. If so, then {X} is a space; otherwise, {X} is an ordinary object.

Examples of spaces include sets, groups, rings, fields, graphs, vector spaces, topological spaces, metric spaces, function spaces, measure spaces, dynamical systems, and operator algebras. Examples of ordinary objects include points, numbers, functions, matrices, strings, and equations.

Remark 1 Note that in some cases, a single object may seem to be both an ordinary object and a space, but one can often separate the two roles that this object is playing by making a sufficiently fine distinction. For instance, in Euclidean geometry, a line {\ell} is an ordinary object (it is one of the primitive concepts in that geometry), but it can also be viewed as a space of points. In such cases, it becomes useful to distinguish between the abstract line {\ell}, which is the primitive object, and its realisation {\ell[{\mathbb R}]} as a space of points in the Euclidean plane. This type of distinction is quite common in algebraic geometry; thus, for instance, the imaginary circle {C := \{ (x,y): x^2 + y^2 = -1 \}} has an empty realisation {C[{\mathbb R}] = \emptyset} in the real plane {{\mathbb R}^2}, but has a non-trivial realisation {C[{\mathbb C}]} in the complex plane {{\mathbb C}^2} (or over finite fields), and so we do not consider {C} (as an abstract algebraic variety) to be empty. Similarly, given a function {f}, we distinguish between the function {f} itself (as an abstract object) and the graph {f[X] := \{ (x,f(x)): x \in X \}} of that function over some given domain {X}.

We also fix a nonprincipal ultrafilter {\alpha_\infty} on the natural numbers. Recall that this is a collection of subsets of {{\mathbb N}} with the following properties:

  • No finite set lies in {\alpha_\infty}.
  • If {A \subset {\mathbb N}} is in {\alpha_\infty}, then any subset of {{\mathbb N}} containing {A} is in {\alpha_\infty}.
  • If {A, B} lie in {\alpha_\infty}, then {A \cap B} also lies in {\alpha_\infty}.
  • If {A \subset {\mathbb N}}, then exactly one of {A} and {{\mathbb N} \backslash A} lies in {\alpha_\infty}.

Given a property {P(\alpha)} which may be true or false for each natural number {\alpha}, we say that {P} is true for {\alpha} sufficiently close to {\alpha_\infty} if the set {\{ \alpha \in {\mathbb N}: P(\alpha) \hbox{ holds}\}} lies in {\alpha_\infty}. The existence of a non-principal ultrafilter {\alpha_\infty} is guaranteed by the ultrafilter lemma, which can be proven using the axiom of choice.

Remark 2 One can view {\alpha_\infty} as a point in the Stone-Čech compactification, in which case “for {\alpha} sufficiently close to {\alpha_\infty}” acquires the familiar topological meaning “for all {\alpha} in a neighbourhood of {\alpha_\infty}“.

We can use this ultrafilter to take limits of standard objects and spaces. Indeed, given any two sequences {(x_\alpha)_{\alpha \in {\mathbb N}}}, {(y_\alpha)_{\alpha \in {\mathbb N}}} of standard ordinary objects, we say that such sequences are equivalent if we have {x_\alpha = y_\alpha} for all {\alpha} sufficiently close to {\alpha_\infty}. We then define the ultralimit {\lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} of a sequence {(x_\alpha)_{\alpha \in {\mathbb N}}} to be the equivalence class of {(x_\alpha)_{\alpha \in {\mathbb N}}} (in the space {{\mathcal U}^{\mathbb N}} of all sequences in the universe). In other words, we have

\displaystyle  \lim_{\alpha \rightarrow \alpha_\infty} x_\alpha = \lim_{\alpha \rightarrow \alpha_\infty} y_\alpha

if and only if {x_\alpha = y_\alpha} for all {\alpha} sufficiently close to {\alpha_\infty}.
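This definition only makes sense because agreement for {\alpha} sufficiently close to {\alpha_\infty} is an equivalence relation, and each part of that check uses one of the ultrafilter axioms listed above. One way to write it out:

```latex
\begin{aligned}
&\textbf{Reflexive: } \{\alpha : x_\alpha = x_\alpha\} = \mathbb{N} \in \alpha_\infty,
  \text{ since } \emptyset \text{ is finite, so } \emptyset \notin \alpha_\infty,
  \text{ and exactly one of } \emptyset, \mathbb{N} \text{ lies in } \alpha_\infty.\\
&\textbf{Symmetric: } \{\alpha : x_\alpha = y_\alpha\} = \{\alpha : y_\alpha = x_\alpha\}.\\
&\textbf{Transitive: } A := \{\alpha : x_\alpha = y_\alpha\} \in \alpha_\infty,\
  B := \{\alpha : y_\alpha = z_\alpha\} \in \alpha_\infty
  \implies A \cap B \in \alpha_\infty,\\
&\qquad \text{and } A \cap B \subset \{\alpha : x_\alpha = z_\alpha\},
  \text{ so } \{\alpha : x_\alpha = z_\alpha\} \in \alpha_\infty \text{ by upward closure.}
\end{aligned}
```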

The ultralimit {\lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} lies outside the standard universe {{\mathcal U}}, but is still constructible as an object in the ambient set theory (because {{\mathcal U}} was assumed to be a set). Note that we do not need {x_\alpha} to be well-defined for all {\alpha} for the limit {\lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} to make sense; it is enough that {x_\alpha} is well-defined for all {\alpha} sufficiently close to {\alpha_\infty}.

If {x = \lim_{\alpha \rightarrow \alpha_\infty} x_\alpha}, we refer to the sequence {x_\alpha} of ordinary objects as a model for the limit {x}. Thus, any two models for the same limit object {x} will agree in a sufficiently small neighbourhood of {\alpha_\infty}.

Similarly, given a sequence of standard spaces {(X_\alpha)_{\alpha \in {\mathbb N}}}, one can form the ultralimit (or ultraproduct) {\lim_{\alpha \rightarrow \alpha_\infty} X_\alpha}, defined as the collection of all ultralimits {\lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} of sequences {x_\alpha}, where {x_\alpha \in X_\alpha} for all {\alpha \in {\mathbb N}} (or for all {\alpha} sufficiently close to {\alpha_\infty}). Again, this space will lie outside the standard universe, but is still a set. (This will not conflict with the notion of ultralimits for ordinary objects, so long as one always takes care to keep spaces and ordinary objects separate.) If {X = \lim_{\alpha \rightarrow \alpha_\infty} X_\alpha}, we refer to the sequence {X_\alpha} of spaces as a model for {X}.

As a special case of an ultralimit, given a single space {X}, its ultralimit {\lim_{\alpha \rightarrow \alpha_\infty} X} is known as the ultrapower of {X} and will be denoted {{}^* X}.

Remark 3 One can view {{}^* X} as a type of completion of {X}, much as the reals are the metric completion of the rationals. Indeed, just as the reals encompass all limits {\lim_{n \rightarrow \infty} x_n} of Cauchy sequences {x_1,x_2,\ldots} in the rationals, up to equivalence, the ultrapower {{}^* X} encompasses all limits of arbitrary sequences in {X}, up to agreement sufficiently close to {\alpha_\infty}. The ability to take limits of arbitrary sequences, and not merely Cauchy sequences or convergent sequences, is the underlying source of power of ultralimit analysis. (This ability ultimately arises from the universal nature of the Stone-Čech compactification {\beta {\mathbb N}}, as well as the discrete nature of {{\mathbb N}}, which makes all sequences {n \mapsto x_n} continuous.)

Of course, we embed the rationals into the reals by identifying each rational {x} with its limit {\lim_{n \rightarrow \infty} x}. In a similar spirit, we identify every standard ordinary object {x} with its ultralimit {\lim_{\alpha \rightarrow \alpha_\infty} x}. In particular, a standard space {X} is now identified with a subspace of {{}^* X}. When {X} is finite, it is easy to see that this embedding of {X} to {{}^* X} is surjective; but for infinite {X}, the ultrapower is significantly larger than {X} itself.
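The surjectivity claim for finite {X} is a short pigeonhole argument using the ultrafilter axioms; one way to write it out:

```latex
\begin{aligned}
&X = \{a_1,\dots,a_k\},\quad x_\alpha \in X,\quad
  A_i := \{\alpha : x_\alpha = a_i\},\quad A_1 \cup \dots \cup A_k = \mathbb{N}.\\
&\text{If no } A_i \in \alpha_\infty, \text{ then every } \mathbb{N}\setminus A_i \in \alpha_\infty,
  \text{ hence } \textstyle\bigcap_i (\mathbb{N}\setminus A_i) = \emptyset \in \alpha_\infty,\\
&\text{contradicting the fact that no finite set lies in } \alpha_\infty.
  \text{ So some } A_i \in \alpha_\infty, \text{ and then}\\
&\lim_{\alpha\to\alpha_\infty} x_\alpha = \lim_{\alpha\to\alpha_\infty} a_i = a_i .
\end{aligned}
```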

Remark 4 One could collect the ultralimits of all the ordinary objects and spaces in the standard universe {{\mathcal U}} and form a new structure, the nonstandard universe {\overline{{\mathcal U}}_{\alpha_\infty}}, which one can view as a completion of the standard universe, in much the same way that the reals are a completion of the rationals. However, we will not have to explicitly deal with this nonstandard universe and will not discuss it again in this post.

In nonstandard analysis, an ultralimit of standard ordinary objects in a given class is referred to as (or more precisely, models) a nonstandard object in that class. To emphasise the slightly different philosophy of ultralimit analysis, however, I would like to call these objects limit objects in that class instead. Thus, for instance:

  • An ultralimit {n = \lim_{\alpha \rightarrow \alpha_\infty} n_\alpha} of standard natural numbers is a limit natural number (or a nonstandard natural number, or an element of {{}^* {\mathbb N}});
  • An ultralimit {x = \lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} of standard real numbers is a limit real number (or a nonstandard real number, or a hyperreal, or an element of {{}^* {\mathbb R}});
  • An ultralimit {\phi = \lim_{\alpha \rightarrow \alpha_\infty} \phi_\alpha} of standard functions {\phi_\alpha: X_\alpha \rightarrow Y_\alpha} between two sets {X_\alpha,Y_\alpha} is a limit function (also known as an internal function, or a nonstandard function);
  • An ultralimit {\phi = \lim_{\alpha \rightarrow \alpha_\infty} \phi_\alpha} of standard continuous functions {\phi_\alpha: X_\alpha \rightarrow Y_\alpha} between two topological spaces {X_\alpha,Y_\alpha} is a limit continuous function (or internal continuous function, or nonstandard continuous function);
  • etc.

Clearly, all standard ordinary objects are limit objects of the same class, but not conversely.

Similarly, ultralimits of spaces in a given class will be referred to as limit spaces in that class (in nonstandard analysis, they would be called nonstandard spaces or internal spaces instead). For instance:

  • An ultralimit {X = \lim_{\alpha \rightarrow \alpha_\infty} X_\alpha} of standard sets is a limit set (or internal set, or nonstandard set);
  • An ultralimit {G = \lim_{\alpha \rightarrow \alpha_\infty} G_\alpha} of standard groups is a limit group (or internal group, or nonstandard group);
  • An ultralimit {(X,{\mathcal B},\mu) = \lim_{\alpha \rightarrow \alpha_\infty} (X_\alpha,{\mathcal B}_\alpha,\mu_\alpha)} of standard measure spaces is a limit measure space (or internal measure space, or nonstandard measure space);
  • etc.

Note that finite standard spaces will also be limit spaces of the same class, but infinite standard spaces will not. For instance, {{\mathbb Z}} is a standard group, but is not a limit group, basically because it does not contain limit integers such as {\lim_{\alpha \rightarrow \alpha_\infty} \alpha}. However, {{\mathbb Z}} is contained in the limit group {{}^* {\mathbb Z}}. The relationship between standard spaces and limit spaces is analogous to that between incomplete spaces and complete spaces in various fields of mathematics (e.g. in metric space theory or field theory).

Any operation or result involving finitely many standard objects, spaces, and first-order quantifiers carries over to their nonstandard or limit counterparts (the formal statement of this is Los’s theorem). For instance, the addition operation on standard natural numbers gives an addition operation on limit natural numbers, defined by the formula

\displaystyle  \lim_{\alpha \rightarrow \alpha_\infty} n_\alpha + \lim_{\alpha \rightarrow \alpha_\infty} m_\alpha := \lim_{\alpha \rightarrow \alpha_\infty} (n_\alpha + m_\alpha).

It is easy to see that this is a well-defined operation on the limit natural numbers {{}^* {\mathbb N}}, and that the usual properties of addition (e.g. the associative and commutative laws) carry over to this limit (much as how the associativity and commutativity of addition on the rationals automatically implies the same laws of arithmetic for the reals). Similarly, we can define the other arithmetic and order relations on limit numbers: for instance we have

\displaystyle  \lim_{\alpha \rightarrow \alpha_\infty} n_\alpha \geq \lim_{\alpha \rightarrow \alpha_\infty} m_\alpha

if and only if {n_\alpha \geq m_\alpha} for all {\alpha} sufficiently close to {\alpha_\infty}, and similarly define {\leq, >, <}, etc. Note from the definition of an ultrafilter that we still have the usual order trichotomy: given any two limit numbers {n, m}, exactly one of {n < m}, {n=m}, and {n>m} is true.
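A non-principal ultrafilter cannot be written down explicitly, so the following Python is only a toy sketch of the definitions above. It represents a limit natural number by its sequence {\alpha \mapsto n_\alpha}, does arithmetic pointwise as in the displayed formula for addition, and decides a relation by asking a caller-supplied predicate (here called `in_ultrafilter`, a hypothetical name) whether the set of indices where the relation holds is “large”. To make the code runnable we assume a finite index window and plug in a principal ultrafilter as a stand-in; this exercises the bookkeeping, including why trichotomy holds, but not the genuinely nonstandard behaviour.

```python
from typing import Callable, FrozenSet

# Toy assumption: indices alpha run over a finite window 0..N_INDICES-1. The text uses
# all natural numbers and a non-principal ultrafilter, which cannot be computed.
N_INDICES = 20
INDICES = range(N_INDICES)

Ultrafilter = Callable[[FrozenSet[int]], bool]


def principal_ultrafilter(alpha0: int) -> Ultrafilter:
    """Stand-in ultrafilter: a set of indices is 'large' iff it contains alpha0."""
    return lambda s: alpha0 in s


class LimitNat:
    """A limit natural number lim n_alpha, stored as the sequence alpha -> n_alpha."""

    def __init__(self, seq: Callable[[int], int], in_ultrafilter: Ultrafilter):
        self.seq = seq
        self.in_ultrafilter = in_ultrafilter

    # Arithmetic is pointwise, as in the displayed definition of addition.
    def __add__(self, other: "LimitNat") -> "LimitNat":
        return LimitNat(lambda a: self.seq(a) + other.seq(a), self.in_ultrafilter)

    def __mul__(self, other: "LimitNat") -> "LimitNat":
        return LimitNat(lambda a: self.seq(a) * other.seq(a), self.in_ultrafilter)

    # A relation holds in the limit iff the set of indices where it holds is large.
    def _holds(self, other: "LimitNat", rel: Callable[[int, int], bool]) -> bool:
        where = frozenset(a for a in INDICES if rel(self.seq(a), other.seq(a)))
        return self.in_ultrafilter(where)

    def __lt__(self, other: "LimitNat") -> bool:
        return self._holds(other, lambda x, y: x < y)

    def __eq__(self, other: "LimitNat") -> bool:
        return self._holds(other, lambda x, y: x == y)

    def __gt__(self, other: "LimitNat") -> bool:
        return self._holds(other, lambda x, y: x > y)

    def __ge__(self, other: "LimitNat") -> bool:
        return self._holds(other, lambda x, y: x >= y)


def trichotomy_holds(n: LimitNat, m: LimitNat) -> bool:
    # The index sets for <, ==, > partition INDICES, so exactly one of them is large.
    return [n < m, n == m, n > m].count(True) == 1


U = principal_ultrafilter(7)
omega = LimitNat(lambda a: a, U)  # analogue of lim alpha
five = LimitNat(lambda a: 5, U)   # a standard number, as a constant sequence
```

With this principal stand-in, `omega > five` merely reflects {7 > 5}; with a genuine non-principal ultrafilter, `omega` would exceed every standard number, as in Example 1.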

Example 1 The limit natural number {\omega := \lim_{\alpha \rightarrow \alpha_\infty} \alpha} is larger than all standard natural numbers, but {\omega^2 = \lim_{\alpha \rightarrow \alpha_\infty} \alpha^2} is even larger still.

The following two exercises should give some intuition of how Los’s theorem is proved, and what it could be useful for:

Exercise 1 Show that the following two formulations of Goldbach’s conjecture are equivalent:

  • Every even natural number greater than two is the sum of two primes.
  • Every even limit natural number greater than two is the sum of two prime limit natural numbers.

Here, we define a limit natural number {n} to be even if we have {n=2m} for some limit natural number {m}, and a limit natural number {n} to be prime if it is greater than {1} but cannot be written as the product of two limit natural numbers greater than {1}.

Exercise 2 Let {k_\alpha} be a sequence of algebraically closed fields. Show that the ultralimit {k := \lim_{\alpha \rightarrow \alpha_\infty} k_\alpha} is also an algebraically closed field. In other words, every limit algebraically closed field is an algebraically closed field.

Given an ultralimit {\phi := \lim_{\alpha \rightarrow \alpha_\infty} \phi_\alpha} of functions {\phi_\alpha: X_\alpha \rightarrow Y_\alpha}, we can view {\phi} as a function from the limit space {X := \lim_{\alpha \rightarrow \alpha_\infty} X_\alpha} to the limit space {Y := \lim_{\alpha \rightarrow \alpha_\infty} Y_\alpha} by the formula

\displaystyle  \phi( \lim_{\alpha \rightarrow \alpha_\infty} x_\alpha ) := \lim_{\alpha \rightarrow \alpha_\infty} \phi_\alpha(x_\alpha).

Again, it is easy to check that this is well-defined. Thus every limit function from a limit space {X} to a limit space {Y} is a function from {X} to {Y}, but the converse is not true in general.

One can easily show that limit sets behave well with respect to finitely many boolean operations; for instance, the intersection of two limit sets {X = \lim_{\alpha \rightarrow \alpha_\infty} X_\alpha} and {Y = \lim_{\alpha \rightarrow \alpha_\infty} Y_\alpha} is another limit set, namely {X \cap Y = \lim_{\alpha \rightarrow \alpha_\infty} X_\alpha \cap Y_\alpha}. However, we caution that the same is not necessarily true for infinite boolean operations; the countable union or intersection of limit sets need not be a limit set. (For instance, each singleton {\{n\}} with {n} a standard integer is a limit set, but the union {{\mathbb Z}} of these singletons is not.) Indeed, there is an analogy between the limit subsets of a limit set and the clopen subsets of a topological space (or the constructible sets in an algebraic variety).

By the same type of arguments used to show Exercise 2, one can check that every limit group is a group (albeit one that usually lies outside the standard universe {{\mathcal U}}), every limit ring is a ring, every limit field is a field, etc.

The situation with vector spaces is a little more interesting. The ultraproduct {V = \lim_{\alpha \rightarrow \alpha_\infty} V_\alpha} of a collection of standard vector spaces {V_\alpha} over {{\mathbb R}} is a vector space over the larger field {{}^* {\mathbb R}}, because the various scalar multiplication operations {\cdot_\alpha: {\mathbb R} \times V_\alpha \rightarrow V_\alpha} over the standard reals become a scalar multiplication operation {\cdot: {}^* {\mathbb R} \times V \rightarrow V} over the limit reals. Of course, as the standard reals {{\mathbb R}} are a subfield of the limit reals {{}^* {\mathbb R}}, {V} is also a vector space over the standard reals {{\mathbb R}}; but when viewed this way, the properties of the {V_\alpha} are not automatically inherited by {V}. For instance, if each of the {V_\alpha} is {d}-dimensional over {{\mathbb R}} for some fixed finite {d \geq 1}, then {V} is {d}-dimensional over the limit reals {{}^* {\mathbb R}}, but is infinite dimensional over the reals {{\mathbb R}}.

Now let {A = \lim_{\alpha \rightarrow \alpha_\infty} A_\alpha} be a limit finite set, i.e. a limit of finite sets {A_\alpha}. Every finite set is a limit finite set, but not conversely; for instance, {\lim_{\alpha \rightarrow \alpha_\infty} \{1,\ldots,\alpha\}} is a limit finite set which has infinite cardinality. On the other hand, because every finite set {A_\alpha} has a cardinality {|A_\alpha| \in {\mathbb N}} which is a standard natural number, we can assign to every limit finite set {A = \lim_{\alpha \rightarrow \alpha_\infty} A_\alpha} a limit cardinality {|A| \in {}^* {\mathbb N}} which is a limit natural number, by the formula

\displaystyle  |\lim_{\alpha \rightarrow \alpha_\infty} A_\alpha| := \lim_{\alpha \rightarrow \alpha_\infty} |A_\alpha|.

This limit cardinality inherits all of the first-order properties of ordinary cardinality. For instance, we have the inclusion-exclusion formula

\displaystyle  |A \cup B| + |A \cap B| = |A| + |B|

for any two limit finite sets; this follows from the inclusion-exclusion formula for standard finite sets by an easy limiting argument.
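Because limit cardinality is defined index by index, the inclusion-exclusion identity for limit finite sets reduces to the finite identity at each {\alpha}. The sketch below uses illustrative sequences of our own choosing (not from the text): {A_\alpha = \{1,\ldots,\alpha\}}, the limit finite set of infinite cardinality mentioned above, and {B_\alpha} the even numbers in {\{1,\ldots,2\alpha\}}. Unions and intersections are formed pointwise, as for limit sets, and the identity is checked on a finite window of indices.

```python
def A(alpha: int) -> set[int]:
    # A_alpha = {1, ..., alpha}; its ultralimit is limit finite but infinite.
    return set(range(1, alpha + 1))


def B(alpha: int) -> set[int]:
    # B_alpha = even numbers in {1, ..., 2*alpha} (an arbitrary illustrative choice).
    return set(range(2, 2 * alpha + 1, 2))


def card_union_plus_inter(alpha: int) -> int:
    # |A_alpha ∪ B_alpha| + |A_alpha ∩ B_alpha|, computed at a single index.
    return len(A(alpha) | B(alpha)) + len(A(alpha) & B(alpha))


def card_sum(alpha: int) -> int:
    # |A_alpha| + |B_alpha|
    return len(A(alpha)) + len(B(alpha))


# The identity holds at every index, hence for the ultralimit under any ultrafilter.
checks = [card_union_plus_inter(a) == card_sum(a) for a in range(50)]
```

The point of the design is that no ultrafilter appears at all: an identity that holds for every {\alpha} holds on a set in any ultrafilter.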

It is not hard to show that {\lim_{\alpha \rightarrow \alpha_\infty} A_\alpha} is finite if and only if the {|A_\alpha|} are bounded for {\alpha} sufficiently close to {\alpha_\infty}. Thus, we see that one feature of passage to ultralimits is that it converts the term “bounded” to “finite”, while the term “finite” becomes “limit finite”. This makes ultralimit analysis useful for deducing facts about bounded quantities from facts about finite quantities. We give some examples of this in the next section.

In a similar vein, an ultralimit {(X,d) = \lim_{\alpha \rightarrow \alpha_\infty} (X_\alpha,d_\alpha)} of standard metric spaces {(X_\alpha,d_\alpha)} yields a limit metric space, thus for instance {d: X \times X \rightarrow {}^* {\mathbb R}} is now a metric taking values in the limit reals. Now, if the spaces {(X_\alpha,d_\alpha)} were uniformly bounded, then the limit space {(X,d)} would be bounded by some (standard) real diameter. From the Bolzano-Weierstrass theorem we see that every bounded limit real number {x} has a unique standard part {\hbox{st}(x)} which differs from {x} by an infinitesimal, i.e. a limit real number of the form {\lim_{\alpha \rightarrow \alpha_\infty} x_\alpha} where {x_\alpha} converges to zero in the classical sense. As a consequence, the standard part {\hbox{st}(d)} of the limit metric function {d: X \times X \rightarrow {}^* {\mathbb R}} is a genuine metric function {\hbox{st}(d): X \times X \rightarrow {\mathbb R}}. The resulting metric space {(X, \hbox{st}(d))} is often referred to as an ultralimit of the original metric spaces {(X_\alpha,d_\alpha)}, although strictly speaking this conflicts slightly with the notation here, because we consider {(X,d)} to be the ultralimit instead.

— 2. Application: quantitative algebraic geometry —

As a sample application of the above machinery, we shall use ultrafilter analysis to quickly deduce some quantitative (but not explicitly effective) algebraic geometry results from their more well-known qualitative counterparts. Significantly stronger results than the ones given here can be provided by the field of effective algebraic geometry, but that theory is somewhat more complicated than the classical qualitative theory, and the point I want to stress here is that one can obtain a “cheap” version of this effective algebraic geometry from the qualitative theory by a straightforward ultrafilter argument. I do not know of a comparably easy way to get such ineffective quantitative results without the use of ultrafilters or closely related tools (e.g. nonstandard analysis or elementary limits).

We first recall a basic definition:

Definition 1 (Algebraic set) An (affine) algebraic set over an algebraically closed field {k} is a subset of {k^n}, where {n} is a positive integer, of the form

\displaystyle  \{ x \in k^n: P_1(x) = \ldots = P_m(x) = 0 \} \ \ \ \ \ (1)

where {P_1,\ldots,P_m: k^n \rightarrow k} are a finite collection of polynomials.

Now we turn to the quantitative theory, in which we try to control the complexity of various objects. Let us say that an algebraic set in {k^n} has complexity at most {M} if {n \leq M}, and one can express the set in the form (1) where {m \leq M}, and each of the polynomials {P_1,\ldots,P_m} has degree at most {M}. We can then ask to what extent qualitative algebraic statements can be made quantitative. For instance, it is known that a dimension {0} algebraic set is finite; but can we bound how finite it is in terms of the complexity {M} of that set? We are particularly interested in obtaining bounds here which are uniform in the underlying field {k}.
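To make the definition concrete, here is a minimal Python sketch. The representation is a hypothetical choice of ours: a polynomial in {n} variables is a dict from exponent tuples to coefficients. The function `presentation_complexity` returns the smallest {M} with {n \leq M}, {m \leq M} and every degree at most {M} for one particular presentation (1); the complexity of the set itself is the minimum of this over all presentations, so the function only gives an upper bound. Membership is tested with exact rational arithmetic.

```python
from fractions import Fraction
from math import prod

# Hypothetical representation: exponent tuple (e_1, ..., e_n) -> coefficient.
Poly = dict[tuple[int, ...], Fraction]


def degree(p: Poly) -> int:
    # Total degree; the zero polynomial is treated as degree 0, which is harmless
    # for "degree at most M" bounds.
    return max((sum(e) for e, c in p.items() if c != 0), default=0)


def evaluate(p: Poly, x) -> Fraction:
    # Sum of c * x_1^{e_1} * ... * x_n^{e_n} over the monomials of p.
    return sum((c * prod(xi ** ei for xi, ei in zip(x, e)) for e, c in p.items()),
               Fraction(0))


def in_set(polys: list[Poly], x) -> bool:
    # x lies in {x : P_1(x) = ... = P_m(x) = 0}.
    return all(evaluate(p, x) == 0 for p in polys)


def presentation_complexity(n: int, polys: list[Poly]) -> int:
    # Smallest M with n <= M, m <= M and deg P_j <= M for this presentation.
    return max([n, len(polys)] + [degree(p) for p in polys])


# Example: the circle x^2 + y^2 - 1 = 0 and the line x = 0 in the plane.
circle = {(2, 0): Fraction(1), (0, 2): Fraction(1), (0, 0): Fraction(-1)}
line = {(1, 0): Fraction(1)}
```

Here the circle alone has a presentation of complexity {2}, and adding the line {x=0} cuts it down to the two points {(0,\pm 1)} without raising that bound.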

One way to do so is to open up an algebraic geometry textbook and carefully go through the proofs of all the relevant qualitative facts, and carefully track the dependence on the complexity. For instance, one could bound the cardinality of a dimension {0} algebraic set using Bézout’s theorem. But here, we will use ultralimit analysis to obtain such quantitative analogues “for free” from their qualitative counterparts. The catch, though, is that the bounds we obtain are ineffective; they use the qualitative facts as a “black box”, and one would have to go through the proof of these facts in order to extract anything better.

To begin the application of ultrafilter analysis, we use the following simple lemma.

Lemma 2 (Ultralimits of bounded complexity algebraic sets are algebraic) Let {n} be a dimension. Suppose we have a sequence of algebraic sets {A_\alpha \subset k_\alpha^n} over algebraically closed fields {k_\alpha}, whose complexity is bounded by a quantity {M} which is uniform in {\alpha}. If we set {k := \lim_{\alpha \rightarrow \alpha_\infty} k_\alpha} and {A :=\lim_{\alpha \rightarrow \alpha_\infty} A_\alpha}, then {k} is an algebraically closed field and {A \subset k^n} is an algebraic set (also of complexity at most {M}).

Conversely, every algebraic set in {k^n} is the ultralimit of algebraic sets in {k_\alpha^n} of bounded complexity.

Proof: The fact that {k} is algebraically closed comes from Exercise 2. Now we look at the algebraic sets {A_\alpha}. By adding dummy polynomials if necessary, we can write

\displaystyle  A_\alpha = \{ x \in k_\alpha^n: P_{\alpha,1}(x) = \ldots = P_{\alpha,M}(x) = 0 \}

where {P_{\alpha,1},\ldots,P_{\alpha,M}: k_\alpha^n \rightarrow k_\alpha} are polynomials of degree at most {M}.

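We can then take ultralimits of the {P_{\alpha,i}} to create polynomials {P_{1},\ldots,P_{M}: k^n \rightarrow k} of degree at most {M}; this makes sense because, with {n} and {M} fixed, each {P_{\alpha,i}} is determined by a bounded number of coefficients in {k_\alpha}, and we take the ultralimit of each coefficient separately. One easily verifies on taking ultralimits that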

\displaystyle  A = \{ x \in k^n: P_{1}(x) = \ldots = P_{M}(x) = 0 \}

and the first claim follows. The converse claim is proven similarly. \Box
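The two bookkeeping steps in this proof (padding with dummy polynomials, and viewing a polynomial of bounded degree as a bounded list of coefficients) can be sketched in Python. This is an illustration under our own hypothetical representation (dicts from exponent tuples to coefficients), not code from the text. Once {n} and {M} are fixed, every system of complexity at most {M} becomes a vector of the same length {M \binom{n+M}{n}}, which is why taking the ultralimit coefficient by coefficient is meaningful.

```python
from itertools import product
from math import comb


def monomials(n: int, M: int) -> list[tuple[int, ...]]:
    # All exponent tuples (e_1, ..., e_n) with e_1 + ... + e_n <= M, in a fixed order.
    return [e for e in product(range(M + 1), repeat=n) if sum(e) <= M]


def pad(polys: list[dict], M: int) -> list[dict]:
    # Append dummy zero polynomials (which vanish everywhere, so the set is
    # unchanged) until there are exactly M of them.
    if len(polys) > M:
        raise ValueError("more than M polynomials: complexity exceeds M")
    return polys + [{} for _ in range(M - len(polys))]


def coefficient_vector(polys: list[dict], n: int, M: int) -> list:
    # Flatten the padded system into M * comb(n + M, n) coefficients; an ultralimit
    # of such systems can then be taken coordinate by coordinate.
    mons = monomials(n, M)
    return [p.get(e, 0) for p in pad(polys, M) for e in mons]


circle = {(2, 0): 1, (0, 2): 1, (0, 0): -1}  # x^2 + y^2 - 1
```

For instance, with {n=2} and {M=3} there are {\binom{5}{2} = 10} monomials, so the circle (padded to three polynomials) becomes a vector of length {30}.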

Ultralimits preserve a number of key algebraic concepts (basically because such concepts are definable in first-order logic). We first illustrate this with the algebraic geometry concept of dimension. It is known that every non-empty algebraic set {V} in {k^n} has a dimension {\dim(V)}, which is an integer between {0} and {n}, with the convention that the empty set has dimension {-1}. There are many ways to define this dimension, but one way is to proceed by induction on the dimension {n} as follows. A non-empty algebraic subset of {k^0} has dimension {0}. Now if {n \geq 1}, we say that an algebraic set {V} has dimension {d} for some {0 \leq d \leq n} if the following statements hold:

  • For all but finitely many {t \in k}, the slices {V_t := \{ x \in k^{n-1}: (x,t) \in V \}} either all have dimension {d-1}, or are all empty.
  • For the remaining {t \in k}, the slice {V_t} has dimension at most {d}. If the generic slices {V_t} were all empty, then one of the exceptional {V_t} has to have dimension exactly {d}.

Informally, {V} has dimension {d} iff a generic slice of {V} has dimension {d-1}.

It is a non-trivial fact to show that every algebraic set in {k^n} does indeed have a well-defined dimension between {-1} and {n}.

Now we see how dimension behaves under ultralimits.

Lemma 3 (Continuity of dimension) Suppose that {A_\alpha \subset k_\alpha^n} are algebraic sets over various algebraically closed fields {k_\alpha} of uniformly bounded complexity, and let {A := \lim_{\alpha \rightarrow \alpha_\infty} A_\alpha} be the limiting algebraic set given by Lemma 2. Then {\dim(A) = \lim_{\alpha \rightarrow \alpha_\infty} \dim(A_\alpha)}. In other words, we have {\dim(A) = \dim(A_\alpha)} for all {\alpha} sufficiently close to {\alpha_\infty}.

Proof: One could obtain this directly from Los’s theorem, but it is instructive to do this from first principles.

We induct on the dimension {n}. The case {n=0} is trivial, so suppose that {n \geq 1} and the claim has already been shown for {n-1}. Write {d} for the dimension of {A}. If {d=-1}, then {A} is empty and so {A_\alpha} must be empty for all {\alpha} sufficiently close to {\alpha_\infty}, so suppose that {d \geq 0}. By the construction of dimension, the slices {A_t} all have dimension {d-1} (or are all empty) for all {t \in k} other than finitely many exceptional values {t_1,\ldots,t_r}. Let us assume that these generic slices {A_t} all have dimension {d-1}; the other case is treated similarly and is left to the reader. As {k} is the ultralimit of the {k_\alpha}, we can write {t_i = \lim_{\alpha \rightarrow \alpha_\infty} t_{\alpha,i}} for each {1 \leq i \leq r}. We claim that for {\alpha} sufficiently close to {\alpha_\infty}, the slices {(A_\alpha)_{t_{\alpha}}} have dimension {d-1} whenever {t_{\alpha} \neq t_{\alpha,1},\ldots,t_{\alpha,r}}. Indeed, suppose that this were not the case. Carefully negating the quantifiers (and using the ultrafilter property), we see that for {\alpha} sufficiently close to {\alpha_\infty}, we can find {t_{\alpha} \neq t_{\alpha,1},\ldots,t_{\alpha,r}} such that {(A_\alpha)_{t_{\alpha}}} has dimension different from {d-1}. Taking ultralimits and writing {t := \lim_{\alpha \rightarrow \alpha_\infty} t_\alpha}, we have {t \neq t_1,\ldots,t_r}, and we see from the induction hypothesis that {A_t} has dimension different from {d-1}, a contradiction.

We have shown that for {\alpha} sufficiently close to {\alpha_\infty}, all but finitely many slices of {A_\alpha} have dimension {d-1}, and thus by the definition of dimension, {A_\alpha} has dimension {d}, and the claim follows. \Box

We can use this to deduce quantitative algebraic geometry results from qualitative analogues. For instance, from the definition of dimension we have

Lemma 4 (Qualitative Bezout-type theorem) Every dimension {0} algebraic set is finite.

Using ultrafilter analysis, we immediately obtain the following quantitative analogue:

Lemma 5 (Quantitative Bezout-type theorem) Let {A \subset k^n} be an algebraic set of dimension {0} and complexity at most {M} over a field {k}. Then the cardinality {|A|} is bounded by a quantity {C_M} depending only on {M} (in particular, it is independent of {k}).

Proof: By passing to the algebraic closure, we may assume that {k} is algebraically closed.

Suppose this were not the case. Carefully negating the quantifiers (and using the axiom of choice), we may find a sequence {A_\alpha \subset k_\alpha^n} of dimension {0} algebraic sets and uniformly bounded complexity over algebraically closed fields {k_\alpha}, such that {|A_\alpha| \rightarrow \infty} as {\alpha \rightarrow \infty}. We pass to an ultralimit to obtain a limit algebraic set {A := \lim_{\alpha \rightarrow \alpha_\infty} A_\alpha}, which by Lemma 3 has dimension {0}, and is thus finite by Lemma 4. But then this forces {|A_\alpha|} to be bounded for {\alpha} sufficiently close to {\alpha_\infty} (indeed we have {|A_\alpha| = |A|} in such a neighbourhood), contradiction. \Box

Remark 5 Note that this proof gives absolutely no bound on {C_M} in terms of {M}! One can get such a bound by using more effective tools, such as the actual Bezout theorem, but this requires more actual knowledge of how the qualitative algebraic results are proved. If one only knows the qualitative results as a black box, then the ineffective quantitative result is the best one can do.
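Although the argument gives no upper bound on {C_M}, a simple example (our own, not from the text) shows that {C_M} must grow at least like {M^M}. Over {k = {\mathbb C}}, take {n = M} and the {M} equations {x_i^M - 1 = 0}, {1 \leq i \leq M}; this presentation has complexity at most {M}, and it cuts out the {M^M} points whose coordinates are all {M}-th roots of unity, a finite and hence dimension {0} set. The Python below checks this numerically for small {M}; since it uses floating point, “{x^M = 1}” is tested up to a tolerance.

```python
import cmath
from itertools import product


def roots_of_unity(M: int) -> list[complex]:
    # The M distinct complex M-th roots of unity exp(2*pi*i*k/M).
    return [cmath.exp(2j * cmath.pi * k / M) for k in range(M)]


def grid_points(M: int) -> list[tuple[complex, ...]]:
    # All points of C^M whose coordinates are M-th roots of unity.
    return list(product(roots_of_unity(M), repeat=M))


def satisfies_system(point: tuple[complex, ...], M: int, tol: float = 1e-9) -> bool:
    # Check x_i^M - 1 = 0 for every coordinate, up to floating-point tolerance.
    return all(abs(x ** M - 1) < tol for x in point)
```

This is only a lower bound; it says nothing about how large the true {C_M} is.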

Now we give another illustration of the method. The following fundamental result in algebraic geometry is known:

Lemma 6 (Qualitative Noetherian condition) There does not exist an infinite decreasing sequence of algebraic sets in an affine space {k^n}, in which each set is a proper subset of the previous one.

Using ultralimit analysis, one can convert this qualitative result into an ostensibly stronger quantitative version:

Lemma 7 (Quantitative Noetherian condition) Let {F: {\mathbb N} \rightarrow {\mathbb N}} be a function. Let {A_1 \supsetneq A_2 \supsetneq \ldots \supsetneq A_R} be a sequence of properly nested algebraic sets in {k^n} for some algebraically closed field {k}, such that each {A_i} has complexity at most {F(i)}. Then {R} is bounded by {C_F} for some {C_F} depending only on {F} (in particular, it is independent of {k}).

Remark 6 Specialising to the case when {F} is a constant {M}, we see that there is an upper bound on proper nested sequences of algebraic sets of bounded complexity; but the statement is more powerful than this because we allow {F} to be non-constant. Note that one can easily use this strong form of the quantitative Noetherian condition to recover Lemma 6 (why?), but if one only knew Lemma 7 in the constant case {F=M} then this does not obviously recover Lemma 6.

Proof: Note that {n} is bounded by {F(1)}, so it will suffice to prove this claim for a fixed {n}.

Fix {n}. Suppose the claim failed. Carefully negating all the quantifiers (and using the axiom of choice), we see that there exists an {F}, a sequence {k_\alpha} of algebraically closed fields, a sequence {R_\alpha} going to infinity, and sequences

\displaystyle  A_{\alpha,1} \supsetneq \ldots \supsetneq A_{\alpha,R_\alpha}

of properly nested algebraic sets in {k_\alpha^n}, with each {A_{\alpha,i}} having complexity at most {F(i)}.

We take an ultralimit of everything that depends on {\alpha}, creating an algebraically closed field {k = \lim_{\alpha \rightarrow \alpha_\infty} k_\alpha}, and an infinite sequence

\displaystyle  A_1 \supsetneq A_2 \supsetneq \ldots

of properly nested algebraic sets in {k^n}. (In fact, we could continue this sequence into a limit sequence up to the unbounded limit number {\lim_{\alpha \rightarrow \alpha_\infty} R_\alpha}, but we will not need this overspill here.) But this contradicts Lemma 6. \Box

Again, this argument gives absolutely no clue as to how {C_F} is going to depend on {F}. (Indeed, I would be curious to know what this dependence is exactly.)

Let us give one last illustration of the ultralimit analysis method, which contains an additional subtlety. Define an algebraic variety to be an algebraic set which is irreducible, which means that it cannot be expressed as the union of two proper algebraic subsets. This notion is stable under ultralimits:

Lemma 8 (Continuity of irreducibility) Suppose that {A_\alpha \subset k_\alpha^n} are algebraic sets over various algebraically closed fields {k_\alpha} of uniformly bounded complexity, and let {A := \lim_{\alpha \rightarrow \alpha_\infty} A_\alpha} be the limiting algebraic set given by Lemma 2. Then {A} is an algebraic variety if and only if {A_\alpha} is an algebraic variety for all {\alpha} sufficiently close to {\alpha_\infty}.

However, this lemma is somewhat harder to prove than previous ones, because the notion of irreducibility is not quite a first order statement. The following exercises show the limit of what one can do without using some serious algebraic geometry:

Exercise 3 Let the notation and assumptions be as in Lemma 8. Show that if {A} is not an algebraic variety, then {A_\alpha} is not an algebraic variety for all {\alpha} sufficiently close to {\alpha_\infty}.

Exercise 4 Let the notation and assumptions be as in Lemma 8. Call an algebraic set {M}-irreducible if it cannot be expressed as the union of two proper algebraic sets of complexity at most {M}. Show that if {A} is an algebraic variety, then for every {M \geq 1}, {A_\alpha} is {M}-irreducible for all {\alpha} sufficiently close to {\alpha_\infty}.

These exercises are not quite strong enough to give Lemma 8, because {M}-irreducibility is a weaker concept than irreducibility. However, one can do better by applying some further facts in algebraic geometry. Given an algebraic set {A} of dimension {d \geq 0} in an affine space {k^n}, one can assign a degree {\deg(A)}, which is a positive integer such that {|A \cap V| = \deg(A)} for generic {n-d}-dimensional affine subspaces {V} of {k^n}. Here “generic” means that {V} belongs to the affine Grassmannian {Gr} of {n-d}-dimensional affine subspaces of {k^n}, after removing an algebraic subset of {Gr} of dimension strictly less than that of {Gr}. It is a standard fact of algebraic geometry that every algebraic set can be assigned a degree. Somewhat less trivially, the degree controls the complexity:

Theorem 9 (Degree controls complexity) Let {A} be an algebraic variety in {k^n} of degree {D}. Then {A} has complexity at most {C_{n,D}} for some constant {C_{n,D}} depending only on {n, D}.

Proof: (We thank Jordan Ellenberg and Ania Otwinowska for this argument.) It suffices to show that {A} can be cut out by polynomials of degree at most {D}, since the polynomials of degree at most {D} that vanish on {A} form a vector space of dimension bounded only by {n} and {D}, and any basis of this space then cuts out {A}.
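For concreteness, here is the standard count behind “bounded only by {n} and {D}” (not spelled out in the argument above): the monomials of degree at most {D} in {n} variables number

\displaystyle  \dim \{ P \in k[x_1,\ldots,x_n]: \deg P \leq D \} = \binom{n+D}{n},

so, granted that {A} is cut out by polynomials of degree at most {D}, a basis of those vanishing on {A} has at most {\binom{n+D}{n}} elements, and one may take {C_{n,D} = \binom{n+D}{n}}, which is at least {\max(n,D)} when {n, D \geq 1}.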

Let {A} have dimension {d}. We pick a generic affine subspace {V} of {k^n} of dimension {n-d-2}, and consider the cone {C(V,A)} formed by taking the union of all the lines joining a point in {V} to a point in {A}. This is an algebraic image of {V \times A \times k} and is thus generically an algebraic set of dimension {n-1}, i.e. a hypersurface. Furthermore, as {A} has degree {D}, it is not hard to see that {C(V,A)} has degree {D} as well. Since a hypersurface is necessarily cut out by a single polynomial, this polynomial must have degree {D}.

To finish the claim, it suffices to show that the intersection of the {C(V,A)} as {V} varies is exactly {A}. Clearly, this intersection contains {A}. Now let {p} be any point not in {A}. The cone of {A} over {p} can be viewed as an algebraic subset of the projective space {P^{n-1}} of dimension {d}; meanwhile, the cone of a generic subspace {V} of dimension {n-d-2} is a generic subspace of {P^{n-1}} of the same dimension. Thus, for generic {V}, these two cones do not intersect, and thus {p} lies outside {C(V,A)}, and the claim follows. \Box
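Spelling out the two dimension counts in this argument (a sketch, with “generic” as in the text): the cone {C(V,A)} is parametrised by a point of {V}, a point of {A} and a scalar, while the two cones through {p} in {P^{n-1}} have dimensions adding up to less than {n-1}:

\displaystyle  \dim C(V,A) \leq (n-d-2) + d + 1 = n-1, \qquad d + (n-d-2) = n-2 < n-1,

and a generic linear subspace of {P^{n-1}} misses a given subvariety whenever their dimensions add up to less than {n-1}.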

Remark 7 There is a stronger theorem that asserts that if the degree of a scheme in {k^n} is bounded, then the complexity of that scheme is bounded as well. The main difference between a variety and a scheme here is that for a scheme, we not only specify the set of points cut out by the scheme, but also the ideal of functions that we want to think of as vanishing on that set. This theorem is significantly more difficult than the above result; it is Corollary 6.11 of Kleiman’s SGA6 article.

Given this theorem, we can now prove Lemma 8.

Proof: In view of Exercise 3, it suffices to show that if {A} is irreducible, then the {A_\alpha} are irreducible for {\alpha} sufficiently close to {\alpha_\infty}.

The algebraic set {A} has some dimension {d} and degree {D}, thus {|A \cap V| = D} for generic affine {n-d}-dimensional subspaces {V} of {k^n}. Undoing the limit using Lemma 2 and Lemma 3 (adapted to the Grassmannian {Gr} rather than to affine space), we see that for {\alpha} sufficiently close to {\alpha_\infty}, {|A_\alpha \cap V_\alpha| = D} for generic affine {n-d}-dimensional subspaces {V_\alpha} of {k_\alpha^n}. In other words, {A_\alpha} has degree {D}, and thus by Theorem 9, any algebraic subvariety of {A_\alpha} of the same dimension {d} as {A_\alpha} will have complexity bounded by {C_{n,D}} uniformly in {\alpha}. Let {B_\alpha} be a {d}-dimensional algebraic subvariety of {A_\alpha}, and let {B} be the ultralimit of the {B_\alpha}. Then by Lemma 2, Lemma 3 and the uniform complexity bound, {B} is a {d}-dimensional algebraic subset of {A}, and thus must equal all of {A} by irreducibility of {A}. But this implies that {B_\alpha=A_\alpha} for all {\alpha} sufficiently close to {\alpha_\infty}, and the claim follows. \Box

We give a sample application of this result. From the Noetherian condition we easily obtain

Lemma 10 (Qualitative decomposition into varieties) Every algebraic set can be expressed as a union of finitely many algebraic varieties.

Using ultralimit analysis, we can make this quantitative:

Lemma 11 (Quantitative decomposition into varieties) Let {A \subset k^n} be an algebraic set of complexity at most {M} over an algebraically closed field {k}. Then {A} can be expressed as the union of at most {C_M} algebraic varieties of complexity at most {C_M}, where {C_M} depends only on {M}.

Proof: As {n} is bounded by {M}, it suffices to prove the claim for a fixed {n}.

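Fix {n} and {M}. Suppose the claim failed. Carefully negating all the quantifiers (and using the axiom of choice), we see that there exist algebraically closed fields {k_\alpha} and algebraic sets {A_\alpha\subset k_\alpha^n} of complexity at most {M}, such that {A_\alpha} cannot be expressed as the union of at most {\alpha} algebraic varieties of complexity at most {\alpha}. Now we pass to an ultralimit, obtaining an algebraically closed field {k := \lim_{\alpha \rightarrow \alpha_\infty} k_\alpha} and a limit algebraic set {A \subset k^n}. As discussed earlier, {A} is an algebraic set over an algebraically closed field and is thus expressible as the union of a finite number of algebraic varieties {A_1,\ldots,A_m}. By Lemma 2 and Lemma 8, each {A_i} is an ultralimit of algebraic varieties {A_{\alpha,i}} whose complexity is bounded by some {M'} independent of {\alpha}, and undoing the ultralimit we have {A_\alpha = A_{\alpha,1} \cup \ldots \cup A_{\alpha,m}} for all {\alpha} sufficiently close to {\alpha_\infty}. Choosing such an {\alpha} with {\alpha \geq \max(m, M')} contradicts the construction of {A_\alpha}, and the claim follows. \Box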

— 3. Application: Quantitative Gromov theorem —

As a further illustration, I’ll redo an application of the correspondence principle from a previous post of mine. The starting point is the following famous theorem of Gromov:

Theorem 12 (Qualitative Gromov theorem) Every finitely generated group of polynomial growth is virtually nilpotent.

Let us now make the observation (already observed in Gromov’s original paper) that this theorem implies (and is in fact equivalent to) a quantitative version:

Theorem 13 (Quantitative Gromov theorem) For every {C, d} there exists {R} such that if {G} is generated by a finite set {S} with the growth condition {|B_S(r)| \leq Cr^d} for all {1 \leq r \leq R}, then {G} is virtually nilpotent, and furthermore it has a nilpotent subgroup of step and index at most {M_{C,d}} for some {M_{C,d}} depending only on {C,d}. Here {B_S(r)} is the ball of radius {r} generated by the set {S}.
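To make the growth condition concrete, here is a small Python sketch of our own (not from the text) computing word-metric balls {B_S(r)} in the abelian group {{\mathbb Z}^2}, with {S} the two standard generators; as is customary, we assume words may use the generators and their inverses. The sizes come out as {2r^2+2r+1}, so the hypothesis {|B_S(r)| \leq Cr^d} holds with {C = 5} and {d = 2} for all {r \geq 1}, and indeed {{\mathbb Z}^2} is abelian, hence nilpotent.

```python
def ball(generators: list[tuple[int, int]], r: int) -> set[tuple[int, int]]:
    # Breadth-first search: all elements of Z^2 expressible as words of length <= r
    # in the generators and their inverses (the group operation is vector addition).
    steps = list(generators) + [(-a, -b) for a, b in generators]
    seen = {(0, 0)}
    frontier = {(0, 0)}
    for _ in range(r):
        frontier = {(x + a, y + b) for x, y in frontier for a, b in steps} - seen
        seen |= frontier
    return seen


S = [(1, 0), (0, 1)]
sizes = [len(ball(S, r)) for r in range(1, 6)]
```

Each BFS layer only adds elements not seen before, so `seen` after {r} rounds is exactly the set of elements of word length at most {r}.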

Proof: We use ultralimit analysis. Suppose this theorem failed. Carefully negating the quantifiers, we find that there exists {C, d}, as well as a sequence {G_\alpha} of groups generated by a finite set {S_\alpha} such that {|B_{S_\alpha}(r)| \leq C r^d} for all {1 \leq r \leq \alpha}, and such that {G_\alpha} does not contain any nilpotent subgroup of step and index at most {\alpha}.

Now we take ultralimits, setting {G := \lim_{\alpha \rightarrow \alpha_\infty} G_\alpha} and {S := \lim_{\alpha \rightarrow \alpha_\infty} S_\alpha}. As the {S_\alpha} have cardinality uniformly bounded (by {C}, since {S_\alpha \subset B_{S_\alpha}(1)}), {S} is finite. The set {S} need not generate {G}, but it certainly generates some subgroup {\langle S \rangle} of this group. Since {|B_{S_\alpha}(r)| \leq C r^d} for all {\alpha} and all {1 \leq r \leq \alpha}, we see on taking ultralimits that {|B_S(r)| \leq Cr^d} for all standard {r \geq 1}. Thus {\langle S \rangle} is of polynomial growth, and is thus virtually nilpotent by Theorem 12.

Now we need to undo the ultralimit, but this requires a certain amount of preparation. We know that {\langle S \rangle} contains a finite index nilpotent subgroup {G'}. As {\langle S \rangle} is finitely generated, the finite index subgroup {G'} is also. (Proof: for {R} large enough, {B_S(R)} will intersect every coset of {G'}. As a consequence, one can describe the action of {\langle S\rangle} on the finite set {\langle S \rangle/G'} using only knowledge of {B_S(2R+1) \cap G'}. In particular, {B_S(2R+1) \cap G'} generates a finite index subgroup. Increasing {R}, the index of this subgroup is non-increasing, and thus must eventually stabilise. At that point, we generate all of {G'}.) Let {S'} be a set of generators for {G'}. Since {G'} is nilpotent of some step {s}, all commutators of {S'} of length at least {s+1} vanish.

Writing {S'} as an ultralimit of {S'_\alpha}, we see that the {S'_\alpha} are finite subsets of {G_\alpha} which generate some subgroup {G'_\alpha}. Since all commutators of {S'} of length at least {s+1} vanish, the same is true for {S'_\alpha} for {\alpha} close enough to {\alpha_\infty}, and so {G'_\alpha} is nilpotent for such {\alpha} with step bounded uniformly in {\alpha}.

Finally, if we let {R} be large enough that {B_S(R)} intersects every coset of {G'}, then we can cover {B_S(R+1)} by a product of {B_S(R)} and some elements of {G'} (which are of course finite products of elements in {S'} and their inverses). Undoing the ultralimit, we see that for {\alpha} sufficiently close to {\alpha_\infty}, we can cover {B_{S_\alpha}(R+1)} by the product of {B_{S_\alpha}(R)} and some elements of {G'_\alpha}. Iterating this we see that we can cover all of {G_\alpha} by {B_{S_\alpha}(R)} times {G'_\alpha}, and so {G'_\alpha} has finite index bounded uniformly in {\alpha}. But this contradicts the construction of {G_\alpha}. \Box

Remark 8 As usual, the argument gives no effective bound on {M_{C,d}}. Obtaining such an effective bound is in fact rather non-trivial; see this paper of Yehuda Shalom and myself for further discussion.
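The argument above turns on the growth bound {|B_S(r)| \leq C r^d}. To make that quantity concrete, here is a minimal Python sketch that computes word-metric ball sizes {|B_S(r)|} by breadth-first search. The groups are our own illustrative choices, not part of the argument: {{\mathbb Z}^2} with its standard generators, and the discrete Heisenberg group (a nilpotent, non-abelian group), encoded as integer triples {(a,b,c)} standing for upper unitriangular {3 \times 3} integer matrices. Nothing here is effective in the sense of Remark 8; it only tabulates {|B_S(r)|} for small {r}.

```python
from collections import deque


def ball_sizes(identity, generators, multiply, max_radius):
    """Return [|B_S(0)|, |B_S(1)|, ..., |B_S(max_radius)|] in the word metric.

    `generators` must already be closed under inverses (i.e. contain S and S^{-1}).
    """
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        if dist[g] == max_radius:
            continue  # do not explore beyond the largest ball we need
        for s in generators:
            h = multiply(g, s)
            if h not in dist:  # BFS reaches each element first at its word length
                dist[h] = dist[g] + 1
                queue.append(h)
    # Count elements of each exact word length, then accumulate into ball sizes.
    counts = [0] * (max_radius + 1)
    for d in dist.values():
        counts[d] += 1
    sizes, total = [], 0
    for c in counts:
        total += c
        sizes.append(total)
    return sizes


def z2_mul(p, q):
    return (p[0] + q[0], p[1] + q[1])


Z2_GENS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def heis_mul(p, q):
    # (a, b, c) stands for the matrix [[1, a, c], [0, 1, b], [0, 0, 1]].
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2] + p[0] * q[1])


HEIS_GENS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

if __name__ == "__main__":
    print(ball_sizes((0, 0), Z2_GENS, z2_mul, 5))  # [1, 5, 13, 25, 41, 61]
    print(ball_sizes((0, 0, 0), HEIS_GENS, heis_mul, 8))
```

For {{\mathbb Z}^2} the counts are exactly {2r^2+2r+1}; for the Heisenberg group they grow on the order of {r^4} (Bass-Guivarc'h formula), still polynomially, consistent with the fact that finitely generated nilpotent groups have polynomial growth.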

— 4. Application: Furstenberg correspondence principle —

Let me now redo another application of the correspondence principle via ultralimit analysis. We will begin with the following famous result of Furstenberg:

Theorem 14 (Furstenberg recurrence theorem) Let {(X, {\mathcal B}, \mu, T)} be a measure-preserving system, and let {A \subset X} have positive measure. Let {k \geq 1}. Then there exists {r > 0} such that {A \cap T^r A \cap \ldots \cap T^{(k-1)r} A} has positive measure (and in particular is non-empty).

We then use this theorem and ultralimit analysis to derive the following well-known result of Szemerédi:

Theorem 15 (Szemerédi’s theorem) Every set of integers of positive upper density contains arbitrarily long arithmetic progressions.

Proof: Suppose this were not the case. Then there exists {k \geq 1} and a set {A} of positive upper density with no progressions of length {k}. Unpacking the definition of positive upper density, this means that there exists {\delta > 0} and a sequence {N_\alpha \rightarrow \infty} such that

\displaystyle  |A \cap [-N_\alpha, N_\alpha]| \geq \delta |[-N_\alpha, N_\alpha]|

for all {\alpha}. We pass to an ultralimit, introducing the limit natural number {N := \lim_{\alpha \rightarrow \alpha_\infty} N_\alpha} and using the ultrapower {{}^* A =\lim_{\alpha \rightarrow \alpha_\infty} A} (note that {A} is a space, not an ordinary object). Then we have

\displaystyle  |{}^*A \cap [-N, N]| \geq \delta |[-N, N]|

where the cardinalities are in the limit sense. Note also that {{}^*A} has no progressions of length {k}.

Consider the space of all boolean combinations of shifts {{}^* A + r} of {{}^* A}, where {r} ranges over (standard) integers, thus for instance

\displaystyle  ({}^* A + 3) \cap ({}^* A + 5) \backslash ({}^* A - 7)

would be such a set. We call such sets definable sets. We give each such definable set {B} a limit measure

\displaystyle  \mu(B) := |B \cap [-N,N]| / |[-N,N]|.

This measure takes values in the limit interval {{}^*[0,1]} and is clearly a finitely additive probability measure. It is also nearly translation invariant in the sense that

\displaystyle  \mu(B+k) = \mu(B) + o(1)

for any standard integer {k}, where {o(1)} is an infinitesimal (i.e. a limit real number which is smaller in magnitude than any positive standard real number). In particular, the standard part {st(\mu)} of {\mu} is a finitely additive, translation-invariant standard probability measure. Note from construction that {st(\mu)({}^*A) \geq \delta}.
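Before passing to the ultralimit, the limit measure {\mu} is just a normalised count at a standard scale. The following minimal Python sketch works at a standard {N}, with a hypothetical concrete set {A} chosen purely for illustration, and with the definable set {({}^* A + 3) \cap ({}^* A + 5) \backslash ({}^* A - 7)} from the example above with {{}^*A} replaced by {A}. It checks the finitary form of near translation invariance: shifting by a standard integer {h} changes {|B \cap [-N,N]|} by at most {|h|}, so the normalised measure changes by at most {|h|/(2N+1)}, which becomes infinitesimal once {N} is a limit natural number.

```python
from typing import Callable

Pred = Callable[[int], bool]  # indicator function of a set of integers


def shift(A: Pred, r: int) -> Pred:
    """Indicator of A + r = {a + r : a in A}: n is in A + r iff n - r is in A."""
    return lambda n: A(n - r)


def count(B: Pred, N: int) -> int:
    """|B ∩ [-N, N]|."""
    return sum(1 for n in range(-N, N + 1) if B(n))


def measure(B: Pred, N: int) -> float:
    """Finitary analogue of mu(B) = |B ∩ [-N, N]| / |[-N, N]|."""
    return count(B, N) / (2 * N + 1)


# Hypothetical example set A: integers that are 0 mod 3 or 2 mod 7.
def A(n: int) -> bool:
    return n % 3 == 0 or n % 7 == 2


# The Boolean combination (A + 3) ∩ (A + 5) \ (A - 7).
def B(n: int) -> bool:
    return shift(A, 3)(n) and shift(A, 5)(n) and not shift(A, -7)(n)


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        for h in (1, 5):
            diff = abs(measure(shift(B, h), N) - measure(B, N))
            print(N, h, diff, h / (2 * N + 1))  # diff never exceeds the bound
```

The bound holds for any set {B}, since {(B+h) \cap [-N,N]} and {B \cap [-N,N]} are counted over windows that differ by {|h|} points at each end.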

Now we convert this finitely additive measure into a countably additive one. Let {2^{\mathbb Z}} be the set of all subsets {B} of the integers. This is a compact metrisable space, which we endow with the Borel {\sigma}-algebra {{\mathcal B}} and the standard shift {T: B \mapsto B+1}. The Borel {\sigma}-algebra is generated by the clopen sets in this space, which are boolean combinations of {T^r E}, where {E} is the basic cylinder set {E := \{ B \in 2^{\mathbb Z}: 0 \in B \}}. Each clopen set can be assigned a definable set in {{}^* {\mathbb Z}} by mapping {T^r E} to {{}^* A + r} and then extending by boolean combinations. The finitely additive probability measure {st(\mu)} on definable sets then pulls back to a finitely additive probability measure {\nu} on clopen sets in {2^{\mathbb Z}}. Applying the Carathéodory extension theorem (taking advantage of the compactness of {2^{\mathbb Z}}, which ensures that {\nu} is already countably additive on the clopen sets), we can extend this finitely additive measure to a countably additive Borel probability measure, which we continue to denote {\nu}. Since {st(\mu)} is translation invariant, {\nu} is {T}-invariant on clopen sets, and hence on all Borel sets; thus {(2^{\mathbb Z}, {\mathcal B}, \nu, T)} is a measure-preserving system.

By construction, {\nu(E) \geq \delta > 0}. Applying Theorem 14 to the system {(2^{\mathbb Z}, {\mathcal B}, \nu, T)}, we can find {r > 0} such that {E \cap T^r E \cap \ldots \cap T^{(k-1)r} E} has positive {\nu}-measure. This clopen set corresponds to the definable set {{}^* A \cap ({}^* A + r) \cap \ldots \cap ({}^* A + (k-1)r)}, which therefore has positive standard measure and in particular is non-empty; so {{}^* A} contains an arithmetic progression of length {k}, a contradiction. \Box

Note that the above argument is nearly identical to the usual proof of the correspondence principle, which uses Prokhorov’s theorem instead of ultrafilters.

— 5. Relationship with nonstandard analysis —

Ultralimit analysis is extremely close to, but subtly different from, nonstandard analysis, because of a shift of emphasis and philosophy. The relationship can be illustrated by the following table of analogies:

Digits | Strings of digits | Numbers
Symbols | Strings of symbols | Sentences
Set theory | Finite von Neumann ordinals | Peano arithmetic
Rational numbers {{\mathbb Q}} | {\overline{{\mathbb Q}}} | Real numbers {{\mathbb R}}
Real analysis | Analysis on {\overline{{\mathbb R}}} | Complex analysis
{{\mathbb R}} | {{\mathbb R}^2} | Euclidean plane geometry
{{\mathbb R}} | Coordinate chart atlases | Manifolds
{{\mathbb R}} | Matrices | Linear transformations
Algebra | Sheaves of rings | Schemes
Deterministic theory | Measure theory | Probability theory
Probability theory | Von Neumann algebras | Noncommutative probability theory
Classical mechanics | Hilbert space mechanics | Quantum mechanics
Finitary analysis | Asymptotic analysis | Infinitary analysis
Combinatorics | Correspondence principle | Ergodic theory
Quantitative analysis | Compactness arguments | Qualitative analysis
Standard analysis | Ultralimit analysis | Nonstandard analysis

(Here {\overline{{\mathbb R}}} is the algebraic completion of the reals, but {\overline{{\mathbb Q}}} is the metric completion of the rationals.)

In the first column one has a “base” theory or concept, which implicitly carries with it a certain ontology and way of thinking, regarding what objects one really cares to study, and what objects really “exist” in some mathematical sense. In the second column one has a fancier theory than the base theory (typically a “limiting case”, a “generalisation”, or a “completion” of the base theory), but one which still shares a close relationship with the base theory, in particular largely retaining the ontological and conceptual mindset of that theory. In the third column one has a new theory, which is modeled by the theories in the middle column, but which is not tied to that model, or to the implicit ontology and viewpoint carried by that model. For instance, one can think of a complex number as an element of the algebraic completion of the reals, but one does not have to, and indeed in many parts of complex analysis or complex geometry one wants to ignore the role of the reals as much as possible. Similarly for other rows of the above table. See for instance these lecture notes of mine for further discussion of the distinction between measure theory and probability theory.

[The relationship between the second and third columns of the above table is also known as the map-territory relation.]

Returning to ultralimit analysis, this is a type of analysis which still shares close ties with its base theory, standard analysis, in that all the objects one considers are either standard objects, or ultralimits of such objects (and similarly for all the spaces one considers). But more importantly, one continues to think of nonstandard objects as being ultralimits of standard objects, rather than as having an existence largely independent of the base theory of standard analysis. This perspective is reversed in nonstandard analysis: one views the nonstandard universe as existing in its own right, and the fact that the standard universe can be embedded inside it is a secondary feature (albeit one which is absolutely essential if one is to use nonstandard analysis in any nontrivial manner to say something new about standard analysis). In nonstandard analysis, ultrafilters are viewed as one tool with which one can construct the nonstandard universe from the standard one, but their role in the subject is otherwise minimised. In contrast, the ultrafilter {\alpha_\infty} plays a prominent role in ultralimit analysis.

In my opinion, none of the three columns here are inherently “better” than the other two; but they do work together quite well. In particular, the middle column serves as a very useful bridge to carry results back and forth between the worlds of the left and right columns.

Filed under: expository, math.AG, math.LO Tagged: algebraic sets, algebraic varieties, Bezout's theorem, nonstandard analysis, ultrafilters, ultralimit analysis

Ars Mathematica: What is a Horn Clause?

Now to the actual definition of Horn clause. First, some standard logical terminology. A term is simply an expression built out of variables and function symbols. For example, x y^{-1} z is a term in the language of groups. An atomic formula is a formula that consists of a single relation symbol (possibly equality) applied to terms. So xy = yx is an example of an atomic formula in the language of groups. What makes an atomic formula atomic is that it’s not built out of smaller logical formulas.

A Horn clause is built out of atomic formulas in a particular way. Let A_1, …, A_n and B be atomic formulas. Then a Horn clause is a logical formula of the form

A_1 and … and A_n implies B.

As a degenerate special case, the left-hand side of the implication can be empty, which is the same as asserting formula B holds unconditionally.
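As a small illustration of the definition, here is a Python sketch using our own ad hoc encoding (not any standard library): group terms are nested tuples, a Horn clause is a pair (list of atomic equations, head equation), and the clause is checked for every assignment of its variables in the symmetric group S_3, a finite nonabelian group. It tests the cancellation law "xz = yz implies x = y", the clause "x y^{-1} z = z implies x = y" built from the term above, and the degenerate clause with an empty left-hand side, "xy = yx".

```python
from itertools import permutations, product

# Elements of the symmetric group S_3, as tuples p with p[i] the image of i.
S3 = list(permutations(range(3)))


def compose(p, q):
    """Group product p*q, with (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inverse(p):
    """Inverse permutation: r[p[i]] = i."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


# Terms in the language of groups: ("var", name), ("mul", s, t), ("inv", t).
def var(name):
    return ("var", name)


def mul(s, t):
    return ("mul", s, t)


def inv(t):
    return ("inv", t)


def evaluate(term, env):
    """Value of a term in S_3 under the variable assignment env."""
    op = term[0]
    if op == "var":
        return env[term[1]]
    if op == "mul":
        return compose(evaluate(term[1], env), evaluate(term[2], env))
    if op == "inv":
        return inverse(evaluate(term[1], env))
    raise ValueError(f"unknown term constructor {op!r}")


def holds(clause, variables):
    """Check 'A_1 and ... and A_n implies B' for every assignment in S_3.

    clause = (body, head); each atomic formula is a pair (lhs, rhs) read as
    the equation lhs = rhs. An empty body asserts the head unconditionally.
    """
    body, head = clause
    for values in product(S3, repeat=len(variables)):
        env = dict(zip(variables, values))
        body_true = all(evaluate(lhs, env) == evaluate(rhs, env) for lhs, rhs in body)
        if body_true and evaluate(head[0], env) != evaluate(head[1], env):
            return False  # premises hold but the conclusion fails
    return True


x, y, z = var("x"), var("y"), var("z")
cancellation = ([(mul(x, z), mul(y, z))], (x, y))    # xz = yz implies x = y
text_term = ([(mul(mul(x, inv(y)), z), z)], (x, y))  # x y^-1 z = z implies x = y
commutativity = ([], (mul(x, y), mul(y, x)))         # empty body: xy = yx

if __name__ == "__main__":
    print(holds(cancellation, ["x", "y", "z"]))  # True
    print(holds(text_term, ["x", "y", "z"]))     # True
    print(holds(commutativity, ["x", "y"]))      # False: S_3 is nonabelian
```

The first two clauses hold in every group; the third holds only in abelian groups, so a single noncommuting pair in S_3 refutes it.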

Clifford Johnson: Categorically Not! - Grand Challenges!

So yes, the Categorically Not! series was a bit thin on the ground in the last several months. I think KC was a bit busy travelling to tell people about her Frank Oppenheimer book. Well, it is back on the calendar, and I probably should have mentioned it earlier, but the next one is tomorrow, so I thought I'd remind you. Remember that the series of events is held at the Santa Monica Art Studios (with occasional exceptions). It's a series - started and run by science writer K. C. Cole - of fun and informative conversations deliberately ignoring the traditional boundaries between art, science, humanities, and other subjects. I strongly encourage you to come to them if you're in the area. Here is the website that describes past ones, and upcoming ones. See also the links at the end of the post for some announcements and descriptions (and even video) of previous events. The theme this month is Grand Challenges! Here's the description from K. C. Cole: [...]

Chad Orzel: Why Does E=mc²? by Brian Cox and Jeff Forshaw

I want to like this book more than I do.

As a general matter, this is exactly the sort of science book we need more of. As you can probably guess from the title, Why Does E=mc²? sets out to explain Einstein's theory of relativity, and does an excellent job of it. It presents a clear and concise explanation of the theory for a non-scientific audience, using no math beyond the Pythagorean Theorem.

I picked this up partly as research of a sort-- if there is ever a How to Teach Physics to Your Dog 2: Canine Boogaloo, the most obvious topic for it would be relativity, which I mention a few times, but don't discuss in any detail. I was thinking about how that would work, and picked this up to see how they went about explaining things. I don't think I've encountered a better explanation of the physics, which they explain entirely with a geometric picture of spacetime, that makes a great deal more sense than most of the mathematical approaches I've encountered in my professional education.

And yet...


Secret Blogging Seminar: When fine just ain’t enough


If you use sheaves to study differential geometry, one of the basic lemmas you’ll want is the following: Let X be a smooth manifold and let \mathcal{E} be a sheaf of modules over C^{\infty}(X). (For example, \mathcal{E} might be the sheaf of sections of a vector bundle.) Then all higher sheaf cohomology of \mathcal{E} vanishes.

The proof of this theorem is basically homological algebra plus the existence of partitions of unity. This gives rise to a slogan: “when you have partitions of unity, sheaf cohomology vanishes.” One way to make this slogan precise is through the technology of fine sheaves.

As Wikipedia says today, “[f]ine sheaves are usually only used over paracompact Hausdorff spaces”. That means they are not used when working with the Zariski topology on schemes, for example. When I started digging into this, I realized there were good reasons: The technology of fine sheaves (and the closely related technology of soft sheaves) does not include the scheme theory cases which we would want it to.

However, there are theorems of the form “when you have partitions of unity, sheaf cohomology vanishes” on schemes and on complex manifolds. I put up a question at MathOverflow asking whether there were better formulations that included these examples, but I probably didn’t formulate it well. I think spelling out all my issues would be too discursive for MathOverflow, so I’m bringing it over here.

What does it mean to vanish?

Let me start with a technical point that caused me a great deal of confusion. Let X be a topological space, U an open subset of X, and x a point in U. Let \mathcal{E} be a sheaf of abelian groups on X and f a section in \mathcal{E}(U). In what sense could we say that f vanishes at x?

In this generality, there is only one reasonable definition: That the image of f in the stalk \mathcal{E}_x is zero. Unpacking the definition of the stalk, this means that there is an open set V, with x \in V \subset U, such that f|_V=0.

Now, think about the case where \mathcal{E} is a sheaf of functions on X, with restriction meaning honest-to-God-restriction of functions. The above definition is not what we mean when we say f vanishes at x! Rather, it is the concept we would express as “f vanishes on a neighborhood of x.”

In order to get a concept which generalizes the ordinary meaning of vanishing at a point, we need to restrict to the case where X is a locally ringed space, and \mathcal{E} a sheaf of \mathcal{O}_X-modules. In that case, \mathcal{E}_x is a module over the local ring \mathcal{O}_x. And the image of f in \mathcal{E}_x \otimes_{\mathcal{O}_x} k(x) is the best analogue to “the value of f at x”, where k(x) is the residue field of \mathcal{O}_x.

Therefore, in this blog post, I make the following definitions:

With the above notation, I say that f vanishes on a neighborhood of x if the image of f in \mathcal{E}_x is zero or, equivalently, if there is an open set V \ni x such that f|_V=0.

I say that f vanishes at x if the image of f in \mathcal{E}_x \otimes_{\mathcal{O}_x} k(x) is zero.

Let K be a closed subset of U. We have the following, analogous definitions:

The function f vanishes on a neighborhood of K if either of the following equivalent conditions holds: (1) there is an open set V containing K such that f|_V is zero, or (2) for every x \in K, the function f vanishes on a neighborhood of x.

The function f vanishes on K if, for every x \in K, the function f vanishes at x.

If more books had adopted this terminology, I would have spent far less time confused about exactly what they meant when they claimed some space had partitions of unity.

Partitions of unity imply vanishing sheaf cohomology, the standard version

With these definitions out of the way, we can show that the existence of partitions of unity implies vanishing sheaf cohomology.

Theorem 1: Let (X, \mathcal{O}) be a locally ringed space, and assume that X is paracompact. (Every open cover has a locally finite open refinement.) Suppose that, for every open cover U_i of X, there are global functions f_i so that f_i vanishes on a neighborhood of X \setminus U_i and \sum f_i=1. Let \mathcal{E} be any sheaf of \mathcal{O}_X-modules. Then H^i(X, \mathcal{E})=0 for all i >0.

Proof Sketch: Our proof is by induction on i. Let \mathcal{I} be an injective sheaf with an injection \mathcal{E} \to \mathcal{I}; and let \mathcal{K} be the cokernel of \mathcal{E} \to \mathcal{I}. For i \geq 2, the long exact sequence gives H^{i}(\mathcal{E}) = H^{i-1}(\mathcal{K}), the right hand side of which is zero by induction. So we simply must establish the base case, that H^1(\mathcal{E})=0.

We know that H^0(\mathcal{I}) \to H^0(\mathcal{K}) \to H^1(\mathcal{E}) \to 0 is exact (since H^1(\mathcal{I})=0), so it is enough to show H^0(\mathcal{I}) \to H^0(\mathcal{K}) is surjective. Let k be a global section of \mathcal{K}. Since \mathcal{I} \to \mathcal{K} is surjective, there is an open cover U_i of X, and sections h_i \in \mathcal{I}(U_i) such that h_i \mapsto k|_{U_i}.

Take f_i as in the hypothesis, applied to the cover U_i. For each index i, let V_i be an open set containing X \setminus U_i such that f_i|_{V_i} vanishes. Let m_i be the section of \mathcal{I} which is f_i h_i on U_i and is 0 on V_i. Such a section exists by the gluing axiom, applied to the open cover \{ U_i, V_i \}, since f_i h_i restricts to 0 on U_i \cap V_i. Let m = \sum m_i. (By paracompactness, we may assume that the cover \{ U_i \} is locally finite, so the sum makes sense.) We claim that m \mapsto k.

It is enough to check this claim on stalks. Near any point u, we have m = \sum m_i = \sum_{U_i \ni u} m_i|_{U_i} = \sum_{U_i \ni u} f_i h_i. By construction, this maps to  \sum_{U_i \ni u} f_i k = \left( \sum f_i \right) k = k. QED
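For X = \mathbb{R} with the sheaf of smooth functions, the hypothesis of Theorem 1 is met by the classical bump-function construction. Here is a minimal numerical Python sketch, assuming the (hypothetical) cover U_j = (j-1, j+1), j \in \mathbb{Z}: each f_j is a normalised bump that is zero whenever |x - j| \geq 3/4, so it vanishes on the open set |x - j| > 3/4, which is a neighborhood of X \setminus U_j, and at each point only finitely many f_j are nonzero. The code only evaluates the f_j at sample points; smoothness is the standard fact that t \mapsto \exp(-1/(1-t^2)), extended by 0 outside (-1, 1), is C^{\infty}.

```python
import math


def bump(t, radius=0.75):
    """Smooth function: positive for |t| < radius, identically 0 for |t| >= radius."""
    if abs(t) >= radius:
        return 0.0
    s = t / radius
    return math.exp(-1.0 / (1.0 - s * s))


def partition_of_unity(x):
    """Return {j: f_j(x)} (nonzero values only) for the cover U_j = (j - 1, j + 1).

    Only integers j with |x - j| < 0.75 can contribute, so we scan a few
    indices around x; this is the local finiteness used in Theorem 1.
    """
    base = math.floor(x)
    weights = {j: bump(x - j) for j in range(base - 1, base + 3)}
    total = sum(weights.values())  # positive: the nearest integer j has |x - j| <= 1/2
    return {j: w / total for j, w in weights.items() if w > 0.0}


if __name__ == "__main__":
    for x in (-2.3, 0.0, 0.5, 1.74, 3.9):
        values = partition_of_unity(x)
        print(x, values, sum(values.values()))
```

Dividing by the total is what forces \sum f_j = 1; the supports stay strictly inside the U_j because the bump radius 3/4 is smaller than the half-width 1 of each interval.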

I would feel guilty if I never defined a fine sheaf in this post. The idea of fine sheaves is that, rather than starting with the sheaf of rings \mathcal{O}, we can start with the sheaf \mathcal{E} and define \mathcal{O}:= \mathcal{H}om(\mathcal{E}, \mathcal{E}). The sheaf \mathcal{E} is called fine if \mathcal{O}, defined in this manner, has partitions of unity in the above sense. Of course, \mathcal{O} may not be commutative and the stalks of \mathcal{O} may not be local, but it turns out that we can still prove Theorem 1 in this setting: A fine sheaf on a paracompact space has no cohomology. Unfortunately, in the examples I discuss below, the extra elements of \mathcal{H}om(\mathcal{E}, \mathcal{E}) still don’t create partitions of unity.

The Regularity Trick

The above proof asks for f_i to vanish on a neighborhood of X \setminus U_i. When X is nice enough, we can ask that f_i just vanish on X \setminus U_i.

Theorem 2 Let X be a paracompact regular topological space and (X, \mathcal{O}) be a locally ringed space. Suppose that, for any open cover V_i, there are global functions f_i such that f_i vanishes on X \setminus V_i and \sum f_i=1. Then, for any open cover U_i, there are global functions f_i so that f_i vanishes on a neighborhood of X \setminus U_i and \sum f_i=1.

Proof sketch: Take your open cover U_i. For every point u in X, choose an index i(u) with u \in U_{i(u)}. By regularity, choose disjoint open sets V_u and W_u such that u \in V_u and X \setminus U_{i(u)} \subset W_u. Applying the hypothesis to the open cover V_u, find functions g_u such that \sum g_u=1 and g_u vanishes on X \setminus V_u. Then g_u vanishes on W_u, and hence on a neighborhood of X \setminus U_{i(u)}. Grouping the g_u according to i(u), and using local finiteness of the sum, gives the required f_i. QED

Because of the above argument, mathematicians who work on metrizable spaces don’t worry very much about the distinction between vanishing on a closed set and on a neighborhood of a closed set. But the Zariski topology is not metrizable…

The Zariski world: Cause for caution!

Let us begin, right away, by pointing out that there are affine schemes and sheaves of modules on them which have nontrivial cohomology.

Let X be the scheme \mathrm{Spec} k[x]. Let \mathcal{E} be the following sheaf: If 0 \in U then \mathcal{E}(U) is the local ring \mathcal{O}_0, otherwise \mathcal{E}(U)=0. The obvious map \mathcal{O}_X \to \mathcal{E} is a surjection of sheaves (exercise!), yet the map on global sections is not surjective. So, if \mathcal{F} is the kernel of \mathcal{O}_X \to \mathcal{E}, then H^1(X, \mathcal{F}) \neq 0.

So any theorem about sheaf cohomology vanishing on affine spaces needs to be phrased carefully.

The Zariski world: Cause for hope!

Affine schemes, with the Zariski topology, do not have partitions of unity in the sense of Theorem 1. Indeed, there do not exist two polynomials on the affine line adding to 1, the first of which is zero on a neighborhood of 0 and the other on a neighborhood of 1. (Since any polynomial which is zero in a neighborhood of a point must be identically zero.)

Nonetheless, we have the following theorem, originally due to Serre:

Theorem 3: (Hartshorne III.3.5, EGA III.1.3.1) Coherent sheaves on an affine scheme have no cohomology.*

Proof Sketch: As before, we reduce to the case of showing that, if \mathcal{I} \to \mathcal{K} is a surjective map of coherent sheaves, then H^0(\mathcal{I}) \to H^0(\mathcal{K}) is surjective.

Let k be a global section of \mathcal{K}. Let U_i be a basic open cover of X such that each k|_{U_i} lifts to \mathcal{I}(U_i). The adjective basic means that U_i = \{ x : f_i(x) \neq 0 \} for some global function f_i. Let h_i \in \mathcal{I}(U_i) be a preimage of k|_{U_i}. There is some n_i such that f_i^{n_i} h_i extends to a section of \mathcal{I}. (Exercise! This is the point where the coherence hypothesis is used.)

Since the U_i are a cover, the functions f_i have no common zero, and neither do the functions f_i^{n_i}. So, by the Nullstellensatz, there are global functions g_i such that \sum f_i^{n_i} g_i =1. So we can find global sections m_i extending f_i^{n_i} g_i h_i and, as in the proof of Theorem 1, \sum m_i is a preimage of k. QED
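On the affine line the step “there are global functions g_i such that \sum f_i^{n_i} g_i = 1” can be made completely explicit, because \mathbb{Q}[x] is a principal ideal domain: for two coprime polynomials, the extended Euclidean algorithm produces the g_i. Below is a minimal Python sketch with hand-rolled polynomial arithmetic (coefficient lists, lowest degree first). The data are hypothetical choices for illustration: f_1 = x and f_2 = x - 1, whose basic opens cover \mathrm{Spec} \mathbb{Q}[x], with n_1 = n_2 = 2.

```python
from fractions import Fraction

# Polynomials over Q: lists of coefficients, lowest degree first; [] is 0.


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def scale(p, c):
    return trim([c * a for a in p])


def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def divmod_poly(a, b):
    """Long division a = quot*b + rem with deg rem < deg b (b nonzero)."""
    quot = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    rem = trim(a)
    while len(rem) >= len(b):
        c = Fraction(rem[-1]) / b[-1]
        shift = len(rem) - len(b)
        quot[shift] = c
        rem = add(rem, scale([0] * shift + list(b), -c))  # cancels the leading term
    return trim(quot), rem


def bezout(f, g):
    """Extended Euclid: return (s, t) with s*f + t*g = 1 when gcd(f, g) is constant."""
    r0, r1 = trim(f), trim(g)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:  # invariant: s_i*f + t_i*g = r_i
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(mul(q, s1), -1))
        t0, t1 = t1, add(t0, scale(mul(q, t1), -1))
    if len(r0) != 1:
        raise ValueError("f and g have a common root, so no such s, t exist")
    inv_c = 1 / Fraction(r0[0])
    return scale(s0, inv_c), scale(t0, inv_c)


if __name__ == "__main__":
    f1_sq = [0, 0, 1]   # f_1^2 = x^2
    f2_sq = [1, -2, 1]  # f_2^2 = (x - 1)^2
    g1, g2 = bezout(f1_sq, f2_sq)
    print(g1, g2)  # coefficients of g_1 = 3 - 2x and g_2 = 1 + 2x
    print(add(mul(g1, f1_sq), mul(g2, f2_sq)))  # the constant polynomial 1
```

So x^2 (3 - 2x) + (x-1)^2 (1 + 2x) = 1. If the f_i share a zero (for example x and x^2), the final remainder is non-constant and the sketch raises an error, matching the fact that the corresponding basic opens do not cover.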

This proof used two important facts. In order to avoid the language of basic opens, I’ll phrase them in terms of ideal sheaves; the reader might enjoy rephrasing the above proof in this language.

Fact 1: Let U be an open set in X, let \mathcal{E} be a coherent sheaf and h a section in \mathcal{E}(U). Let I \subset \mathcal{O}(X) be the ideal of f such that fh extends to X. Then Z(I) \subset X \setminus U.

Fact 2: If \mathcal{I}_1, \mathcal{I}_2, …, \mathcal{I}_r is a collection of coherent ideal sheaves on X such that \bigcap Z(\mathcal{I}_j)=\emptyset then \sum \mathcal{I}_j = \mathcal{O}.

Fact 2, to my mind, is a good generalization of the existence of partitions of unity. It is weaker than asking for partitions of unity in the sense of vanishing on neighborhoods of closed sets, but stronger than just asking for partitions of unity in the sense of vanishing on closed sets. I had hoped that the correct generalization of “partitions of unity implies sheaf cohomology vanishing” would be “Facts 1 and 2 imply vanishing of cohomology for coherent sheaves”. But, when I started reading about complex manifolds, I realized this was not the way to go.

The Stein World: Cause for puzzlement!

A Stein space is a closed complex submanifold of \mathbb{C}^n. They are the analogues of (smooth) affine schemes for complex analysis. We can talk about the sheaf \mathcal{O} of holomorphic functions on any Stein space.

Stein spaces have Fact 2; this is a consequence of Rückert’s Nullstellensatz. I am willing to consider this a good generalization of the existence of partitions of unity. Of course, Stein spaces don’t have partitions of unity in the sense of Theorem 1 for the same reason polynomials don’t: an analytic function that vanishes on a neighborhood of a point must be identically 0.

Stein spaces also have Theorem 3. This is called Cartan’s Theorem B.

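But Stein spaces don’t have Fact 1! Consider X = \mathbb{C}, let U = \mathbb{C} \setminus \{ 0 \}. Consider the section h=e^{1/z} of \mathcal{O}(U). There is no nonzero holomorphic function f on \mathbb{C} such that fh extends to a holomorphic function on \mathbb{C}! So the ideal I of Fact 1 is zero, and Z(I)=X is not contained in X \setminus U. So we can’t use Fact 2 to prove Theorem 3.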

I assumed that there was some minor trick which was used to get around this. But I just read through the proof of Cartan’s Theorem B in Grauert and Remmert’s Theory of Stein Spaces and it looks nothing like the proofs of Theorems 1 and 3.

This is where I run out of ideas. But I know we have readers who think about sheaves and homological methods on a much deeper level than I do. So, what is the version of “Partitions of unity imply cohomology vanishing” which works for Stein spaces?

* Two footnotes for experts: First, yes, this also holds for quasi-coherent sheaves. I stated the weaker version because I want to make the analogy to Stein spaces, and I’m not sure if the corresponding result is true for quasi-coherent sheaves on Stein spaces. Second, I am implicitly assuming noetherianness, in order to make sure my sums are finite. But the theorem is true without this.

Chad Orzel: Upcoming Appearances: How to Teach Physics to Your Dog Live

A couple of things happening in the next week, for those who would like some How to Teach Physics to Your Dog.

On the radio side, I am scheduled for an interview at 6:30 this Tuesday, Feb. 9, on KSOO's Viewpoint University. If you don't happen to be in the Sioux Falls, SD area, they do have a "Listen Live" button on their web page.

On the live-action side of things, I will be at Boskone next weekend, and am scheduled to sign books at 1pm Saturday, and to do a reading at 9:30 am Sunday. I realize that's sort of early in con world, so to make it worth your while to get up that early, I plan to read one previously unreleased dog conversation (from a chapter that got cut for length). It's even got a joke that is appropriate to the con's special guest.

I'm also on a couple of non-book-related panels, but I'll do a separate post about those.

On an unrelated note, the Albany Times Union article from a couple of weeks ago has been posted on the web, at least for the moment. Look quickly, I have no idea how long it will be freely available.


Jonathan Shock: one step behind

A busy week filled with the likes of Mozart and Mahler, Lin, Lunin and Maldacena, a touch of hanzi, some couchsurfing, plenty of Mathematica, pondering over entropy and density matrices, causality and Green's functions, a little genetics from MIT (an incredible online course that I would recommend to anybody who wants to know the fundamentals of modern genetics and more besides - Eric Lander is an inspiring lecturer!), organising conferences, postdoctoral seminar groups and journal clubs, and a little more besides. Consequently, as I continue with an admixture of the above today, I shall leave you with a link to an article I wrote for a Taiwanese online magazine after a former couchsurfer asked if I could contribute something related to new results on planetary formation models. Given that I'm no expert on this subject I figured it best to discuss only the basics of the result and to focus on what generalities can be drawn about the scientific approach as a whole. The article can be found here.

P.S. A public acknowledgment of a linguistic weakness: I'm finally up to 1000 individual Chinese characters (that's taken me around a year's serious study, after three previous years studying passively). This so far adds up to around 2000-3000 composite words, but there are still a few similar characters that in simplified Chinese are so close I keep tripping up on these basics. Today I've been struggling with 成, 或 and 咸 (cheng (completed), huo (maybe), xian (salty)). These three along with 同,间,向,何,问,珂 are my current personal Chinese demons.

Quantum Diaries: Nishina center.

The theoretical physics laboratory of Riken, to which I belong, is a part of the Nishina accelerator center. The center has a huge accelerator which is specialized for “RIBF”, the RI beam factory, with the world's “strongest” superconducting ring cyclotron. You can find movies of how isotopes are accelerated in the beam lines on the webpage of the Nishina center. My research is on superstring theory, but I am currently applying string theory techniques to nuclear physics, so the Nishina center is a perfect place for me to get in touch with real nuclear experiments.

Last week, hosted by the director of the Nishina center, we had a big party, with alcoholic drinks and cakes. This was a get-together party, at which the director aimed to have all of the center members get to know each other. I ended up enjoying this party since, as the director intended, I met a visiting experimentalist working in Italy. Her experiments sound very interesting to me, and are in fact closely related to my recent work on strange physics. We talked at the party, and we promised that we would get together sometime soon. However, to tell you the truth, I hadn't expected much of this promise, as this was at a party, we had just met for the first time, and I am just a string theorist who must seem rather “different” from nuclear experimentalists. However, the next morning, she came to my room! We had a good discussion. It was amazing to me that, at this get-together party, I happened to meet an interesting person and could really talk about my project, even though she came from the other side of the globe. I thank the Nishina center and its director, En’yo.

I hope to report here on the progress of my research on the application of superstring theory to nuclear physics. As for my current project, my Mathematica says “I need more memory”... Well, I’ll try to write new, beautiful code that, hopefully, uses less memory.

John Baez: This Week's Finds in Mathematical Physics (Week 293)

John Baez

This week I want to list a bunch of recent papers and books on n-categories. Then I'll tell you about a conference on the math of environmental sustainability and green technology. And then I'll continue my story about electrical circuits. But first...

This column started with some vague dreams about n-categories and physics. Thanks to a lot of smart youngsters - and a few smart oldsters - these dreams are now well on their way to becoming reality. They don't need my help anymore! I need to find some new dreams. So, "week300" will be the last issue of This Week's Finds in Mathematical Physics.

I still like learning things by explaining them. When I start work at the Centre for Quantum Technologies this summer, I'll want to tell you about that. And I've realized that our little planet needs my help a lot more than the abstract structure of the universe does! The deep secrets of math and physics are endlessly engrossing - but they can wait, and other things can't. So, I'm trying to learn more about ecology, economics, and technology. And I'd like to talk more about those.

So, I plan to start a new column. Not completely new, just a bit different from this. I'll call it This Week's Finds, and drop the "in Mathematical Physics". That should be sufficiently vague that I can talk about whatever I want.

I'll make some changes in format, too. For example, I won't keep writing each issue in ASCII and putting it on the usenet newsgroups. Sorry, but that's too much work.

I also want to start a new blog, since the n-Category Cafe is not the optimal place for talking about things like the melting of Arctic ice. But I don't know what to call this new blog - or where it should reside. Any suggestions?

I may still talk about fancy math and physics now and then. Or even a lot. We'll see. But if you want to learn about n-categories, you don't need me. There's a lot to read these days. I mentioned Carlos Simpson's book in "week291" - that's one good place to start. Here's another introduction:

1) John Baez and Peter May, Towards Higher Categories, Springer, 2009. Also available at http://ncatlab.org/johnbaez/show/Towards+Higher+Categories

This has a bunch of papers in it, namely:

  • John Baez and Michael Shulman, Lectures on n-categories and cohomology.

  • Julia Bergner, A survey of (∞,1)-categories.

  • Simona Paoli, Internal categorical structures in homotopical algebra.

  • Stephen Lack, A 2-categories companion.

  • Lawrence Breen, Notes on 1- and 2-gerbes.

  • Ross Street, An Australian conspectus of higher categories.

After browsing these, you should probably start studying (∞,1)-categories, which are ∞-categories where all the n-morphisms for n > 1 are invertible. There are a few different approaches, but luckily they're nicely connected by some results described in Julia Bergner's paper. Two of the most important approaches are "Segal spaces" and "quasicategories". For the latter, start here:

2) Andre Joyal, The Theory of Quasicategories and Its Applications, http://www.crm.cat/HigherCategories/hc2.pdf

and then go here:

3) Jacob Lurie, Higher Topos Theory, Princeton U. Press, 2009. Also available at http://www.math.harvard.edu/~lurie/papers/highertopoi.pdf

This book is 925 pages long! Luckily, Lurie writes well. After setting up the machinery, he went on to use (∞,1)-categories to revolutionize algebraic geometry:

4) Jacob Lurie, Derived algebraic geometry I: stable infinity-categories, available as arXiv:math/0608228.
Derived algebraic geometry II: noncommutative algebra, available as arXiv:math/0702299.
Derived algebraic geometry III: commutative algebra, available as arXiv:math/0703204.
Derived algebraic geometry IV: deformation theory, available as arXiv:0709.3091.
Derived algebraic geometry V: structured spaces, available as arXiv:0905.0459.
Derived algebraic geometry VI: E_k algebras, available as arXiv:0911.0018.

For related work, try these:

5) David Ben-Zvi, John Francis and David Nadler, Integral transforms and Drinfeld centers in derived algebraic geometry, available as arXiv:0805.0157.

6) David Ben-Zvi and David Nadler, The character theory of a complex group, available as arXiv:0904.1247.

Lurie is now using (∞,n)-categories to study topological quantum field theory. He's making precise and proving some old conjectures that James Dolan and I made:

7) Jacob Lurie, On the classification of topological field theories, available as arXiv:0905.0465.

Jonathan Woolf is doing it in a somewhat different way, which I hope will be unified with Lurie's work eventually:

8) Jonathan Woolf, Transversal homotopy theory, available as arXiv:0910.3322.

All this stuff is starting to transform math in amazing ways. And I hope physics, too - though so far, it's mainly helping us understand the physics we already have.

Meanwhile, I've been trying to figure out something else to do. Like a lot of academics who think about beautiful abstractions and soar happily from one conference to another, I'm always feeling a bit guilty, wondering what I could do to help "save the planet". Yes, we recycle and turn off the lights when we're not in the room. If we all do just a little bit... a little will get done. But surely mathematicians have the skills to do more!

But what?

I'm sure lots of you have had such thoughts. That's probably why Rachel Levy ran this conference last weekend:

9) Conference on the Mathematics of Environmental Sustainability and Green Technology, Harvey Mudd College, Claremont, California, Friday-Saturday, January 29-30, 2010. Organized by Rachel Levy.

Here's a quick brain dump of what I learned.

First, Harry Atwater of Caltech gave a talk on photovoltaic solar power:

10) Atwater Research Group, http://daedalus.caltech.edu/

The efficiency of silicon crystal solar cells peaked out at 24% in 2000. Fancy "multijunctions" get up to 40% and are still improving. But they use fancy materials like gallium arsenide, gallium indium phosphide, and so on. The world currently uses 13 terawatts of power. The US uses 3. But building just 1 terawatt of these fancy photovoltaics would use up more rare substances than we can get our hands on:

11) Gordon B. Haxel, James B. Hedrick, and Greta J. Orris, Rare earth elements - critical resources for high technology, US Geological Survey Fact Sheet 087-02, available at http://pubs.usgs.gov/fs/2002/fs087-02/

So, if we want solar power, we need to keep thinking about silicon and use as many tricks as possible to boost its efficiency.

There are some limits. In 1961, Shockley and Queisser wrote a paper on the limiting efficiency of a solar cell. It's limited for thermodynamic reasons! Since anything that can absorb energy can also emit it, any solar cell also acts as a light-emitting diode, turning electric power back into light:

12) W. Shockley and H. J. Queisser, Detailed balance limit of efficiency of p-n junction solar cells, J. Appl. Phys. 32 (1961) 510-519.

13) Wikipedia, Shockley-Queisser limit, http://en.wikipedia.org/wiki/Shockley%E2%80%93Queisser_limit

What are the tricks used to approach this theoretical efficiency? Multijunctions use layers of different materials to catch photons of different frequencies. The materials are expensive, so people use a lens to focus more sunlight on the photovoltaic cell. The same is true even for silicon - see the Umuwa Solar Power Station in Australia. But then the cells get hot and need to be cooled.

Roughening the surface of a solar cell promotes light trapping, by large factors! Light bounces around ergodically and has more chances to get absorbed and turned into useful power. There are theoretical limits on how well this trick works. But those limits were derived using ray optics, where we assume light moves in straight lines. So, we can beat those limits by leaving the regime where the ray-optics approximation holds good. In other words, make the surface complicated at length scales comparable to the wavelength of light.

For example: we can grow silicon wires from vapor! They can form densely packed structures that absorb more light:

14) B. M. Kayes, H. A. Atwater, and N. S. Lewis, Comparison of the device physics principles of planar and radial p-n junction nanorod solar cells, J. Appl. Phys. 97 (2005), 114302.

Also, with such structures the charge carriers don't need to travel so far to get from the n-type material to the p-type material. This also boosts efficiency.

There are other tricks, still just under development. Using quasiparticles called "surface plasmons" we can adjust the dispersion relations to create materials with really low group velocity. Slow light has more time to get absorbed! We can also create "meta-materials" whose refractive index is really wacky - like n = -5!

I should explain this a bit, in case you don't understand. Remember, the refractive index of a substance is the inverse of the speed of light in that substance - in units where the speed of light in vacuum equals 1. When light passes from material 1 to material 2, it takes the path of least time - at least in the ray-optics approximation. Using this you can show Snell's law:

sin(theta1)/sin(theta2) = n2/n1

where n_i is the index of refraction in the ith material and theta_i is the angle between the light's path and the line normal to the interface between materials.

Air has an index of refraction close to 1. Glass has an index of refraction greater than 1. So, when light passes from air to glass, it "straightens out": its path becomes closer to perpendicular to the air-glass interface. When light passes from glass to air, the reverse happens: the light bends away from the perpendicular. But the sine of an angle can never exceed 1 - so sometimes Snell's law has no solution. Then the light gets stuck! More precisely, it's forced to bounce back into the glass. This is called "total internal reflection", and the easiest way to see it is not with glass, but water. Dive into a swimming pool and look up from below. You'll only see the sky in a limited disk. Outside that, you'll see total internal reflection.
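A minimal Python sketch of the total internal reflection argument: solve Snell's law for the refracted angle, and report when no solution exists. The refractive index of water used here (1.33) is an assumed round value, not a figure from the text.

```python
import math

def refracted_angle(theta1_deg, n1, n2):
    """Refraction angle in degrees from Snell's law n1 sin(theta1) = n2 sin(theta2).

    Returns None when the required sine would exceed 1, i.e. when there is no
    refracted ray and the light undergoes total internal reflection."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))

def critical_angle(n_inside, n_outside=1.0):
    """Largest angle of incidence (degrees) at which light can still escape."""
    return math.degrees(math.asin(n_outside / n_inside))

N_WATER = 1.33  # assumed round value for the refractive index of water

# Seen from under water, the whole sky is squeezed into a cone whose
# half-angle is the critical angle (roughly 48.8 degrees for water).
print(critical_angle(N_WATER))
print(refracted_angle(30.0, N_WATER, 1.0))  # escapes, bent away from the normal
print(refracted_angle(60.0, N_WATER, 1.0))  # None: reflected back into the water
```

The "limited disk" seen from the bottom of the pool is exactly this cone: rays from anywhere in the sky arrive under water at angles no larger than the critical angle.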

Okay, that's stuff everyone learns in optics. But negative indices of refraction are much weirder! The light entering such a material will bend backwards.
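Snell's law n1 sin(theta1) = n2 sin(theta2) still makes sense when n2 is negative: the refracted angle simply comes out negative, meaning the ray emerges on the same side of the normal it came in on. A small Python check, using the n = -5 value mentioned above purely as an illustration and an assumed incidence angle of 40 degrees:

```python
import math

def refracted_angle(theta1_deg, n1, n2):
    """Signed refraction angle in degrees from n1 sin(theta1) = n2 sin(theta2).

    Positive: ordinary refraction, on the far side of the normal.
    Negative: the ray bends 'backwards', as in a negative-index material.
    None: no solution (total internal reflection)."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1:
        return None
    return math.degrees(math.asin(s))

for n2 in (1.5, -5.0):  # a glass-like medium vs. the n = -5 metamaterial
    print(n2, refracted_angle(40.0, 1.0, n2))
```

The negative-index result is the mirror image of what an ordinary medium with n = +5 would give.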

Materials with a negative index of refraction also exhibit a reversed version of the ordinary Goos-Hänchen effect. In the ordinary version, light "slips" a little before reflecting during total internal reflection. The "slip" is actually a slight displacement of the light's wave crests from their expected location - a "phase slip". But for a material of negative refractive index, the light slips backwards. This allows for resonant states where light gets trapped in thin films. Maybe this can be used to make better solar cells.

Next, Kenneth Golden gave a talk on sea ice, which covers 7-10% of the ocean's surface and is a great detector of global warming. He's a mathematician at the University of Utah who also does measurements in the Arctic and Antarctic. If you want to go to math grad school without becoming a nerd - if you want to brave 70-foot swells, dig trenches in the snow and see Emperor penguins - you want Golden as your advisor:

15) Ken Golden's website, http://www.math.utah.edu/~golden/

Salt gets incorporated into sea ice via millimeter-scale brine inclusions between ice platelets, forming a "dendritic platelet structure". Melting sea ice forms fresh water in melt ponds atop the ice, while the brine sinks down to form "bottom water" driving the global thermohaline conveyor belt. You've heard of the Gulf Stream, right? Well, that's just part of this story.

When it gets hotter, the Earth's poles get less white, so they absorb more light, making it hotter: this is "ice albedo feedback". Ice albedo feedback is largely controlled by melt ponds. So if you're interested in climate change, questions like the following become important: when do melt ponds get larger, and when do they drain out?

Sea ice is diminishing rapidly in the Arctic - much faster than all the existing climate models had predicted. There's a lot less sea ice in the Antarctic, mainly in the Weddell Sea, and there it seems to be growing, maybe due to increased precipitation. In the Arctic, winter sea ice diminished in area by about 10% from 1978 to 2008. But summer sea ice diminished by about 40%! It took a huge plunge in 2007, leading to a big increase in solar heat input due to the ice albedo effect.


Time series of the percent difference in ice extent in March (the month of ice extent maximum) and September (the month of ice extent minimum) relative to the mean values for the period 1979-2000. Based on a least squares linear regression for the period 1979-2009, the rate of decrease for the March and September ice extents is -2.5% and -8.9% per decade, respectively. Figure from Perovich et al.

16) Donald K. Perovich, Jacqueline A. Richter-Menge, Kathleen F. Jones, and Bonnie Light, Sunlight, water, and ice: Extreme Arctic sea ice melt during the summer of 2007, Geophysical Research Letters, 35 (2008), L11501. Also available at http://www.crrel.usace.army.mil/sid/personnel/perovichweb/index1.htm
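The trend figures quoted in the caption above come from an ordinary least-squares line fitted to yearly anomalies. A minimal Python sketch of that computation; it assumes the anomalies are already expressed as percent differences from the 1979-2000 mean, and the series in the example is synthetic, not the observed ice extent.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, scaled to per decade."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 10.0 * sxy / sxx  # slope per year, times ten years

# Synthetic series: falls by exactly 0.5 percentage points per year.
years = list(range(1979, 2010))
values = [3.0 - 0.5 * (y - 1979) for y in years]
print(trend_per_decade(years, values))  # about -5 per decade
```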

There's a lot of interesting math involved in understanding the dynamics of sea ice. The ice thickness distribution equation was worked out by Thorndike et al. in 1975. The heat equation for ice and snow was worked out by Maykut and Untersteiner in 1971. Sea ice dynamics was studied by Hibler.

Ice floes have two fractal regimes, one from 1 to 20 meters, another from 100 to 1500 meters. Brine channels have a fractal character well modeled by "diffusion limited aggregation". Brine starts flowing when there's about 5% of brine in the ice - a kind of percolation problem familiar in statistical mechanics. Here's what it looks like when there's 5.7% brine:

17) Kenneth Golden, Brine inclusions in a crystal of lab-grown sea ice, http://www.math.utah.edu/~golden/7.html
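Diffusion limited aggregation is easy to simulate: random walkers wander in from far away and freeze as soon as they touch the growing cluster. The Python sketch below is a generic lattice version, with an assumed square grid, launch radius and kill radius. It is not Golden's brine model, only an illustration of the growth rule that produces the branching, fractal shapes.

```python
import math
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def dla_cluster(n_particles, seed=0):
    """Grow an on-lattice DLA cluster of n_particles around a seed at the origin.

    Each walker starts on a circle just outside the cluster, does a simple
    random walk on the square lattice, sticks as soon as it lands next to an
    occupied site, and is abandoned (then relaunched) if it strays too far."""
    rng = random.Random(seed)
    cluster = {(0, 0)}
    radius = 0.0  # largest distance of any cluster site from the origin
    while len(cluster) < n_particles + 1:
        launch = radius + 5
        angle = rng.uniform(0, 2 * math.pi)
        x = int(round(launch * math.cos(angle)))
        y = int(round(launch * math.sin(angle)))
        while True:
            dx, dy = rng.choice(STEPS)
            x, y = x + dx, y + dy
            if x * x + y * y > (2 * launch) ** 2:
                break  # wandered off: launch a fresh walker
            if any((x + ax, y + ay) in cluster for ax, ay in STEPS):
                cluster.add((x, y))  # touched the cluster: freeze here
                radius = max(radius, math.hypot(x, y))
                break
    return cluster

sites = dla_cluster(150, seed=1)
print(len(sites))
```

Plotting the returned lattice sites shows the characteristic branching, with new particles mostly caught by the outer tips.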

Nobody knows why polycrystalline metals have a log-normal distribution of crystal sizes. Similar behavior, also unexplained, is seen in sea ice.

A "polynya" is an area of open water surrounded by sea ice. Polynyas occupy just 0.001% of the overall area in Antarctic sea ice, but create 1% of the ice. Icy cold katabatic winds blow off the mainland, pushing away ice and creating patches of open water which then refreeze.

There was anomalous export of sea ice through Fram Strait in the 1990s, which may have been one of the preconditions for high ice albedo feedback.

20-40% of sea ice is formed by surface flooding followed by refreezing. This was not included in the sea ice models that gave such inaccurate predictions.

The food chain is founded on diatoms. These form "extracellular polymeric substances" - goopy mucus-like stuff made of polysaccharides that protects them and serves as antifreeze. There's a lot of this stuff; the ice gets visibly stained by it.

For more, see:

18) Kenneth M. Golden, Climate change and the mathematics of transport in sea ice, AMS Notices, May 2009. Also available at http://www.ams.org/notices/200905/

19) Mathematics Awareness Month, April 2009: Mathematics and Climate, http://www.mathaware.org/mam/09/

Next, Julie Lundquist, who just moved from Lawrence Livermore Labs to the University of Colorado, spoke about wind power:

20) Julie Lundquist, Department of Atmospheric and Oceanic Sciences, University of Colorado, http://paos.colorado.edu/people/lundquist.php

With increased reliance on wind, the power grid will need to be redesigned to handle fluctuating power sources. In the US, currently, companies aren't paid for power they generate in excess of the amount they promised to make. So, accurate prediction is a hugely important game. Being off by 1% can cost millions of dollars! Europe has different laws, which encourage firms to maximize the amount of wind power they generate.

If you had your choice about where to build a wind turbine, you'd build it on the ocean or a very flat plain, where the air flows rather smoothly. Hilly terrain leads to annoying turbulence - but sometimes that's your only choice. Then you need to find the best spots, where the turbulence is least bad. Complete simulation of the Navier-Stokes equations is too computationally intensive, so people use fancier tricks. There's a lot of math and physics here.

For weather reports people use "mesoscale simulation" which cleverly treats smaller-scale features in an averaged way - but we need more fine-grained simulations to see how much wind a turbine will get. This is where "large eddy simulation" comes in.

A famous Brookhaven study suggested that the power spectrum of wind has peaks at 4 days, 1/2 day, and 1 minute. This perhaps justifies an approach where different time scales, and thus length scales, are treated separately and the results then combined somehow. The study is actually a bit controversial. But anyway, this is the approach people are taking, and it seems to work.
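Picking out dominant time scales like these is a job for a power spectrum. Here is a hedged NumPy sketch: it builds a synthetic wind-speed-like signal with components at the periods quoted above (4 days, 12 hours, 1 minute; the amplitudes and the 10-second sampling interval are arbitrary choices), then reads the strongest periods back off the FFT. Real anemometer data would be noisy and would need windowing or averaging.

```python
import numpy as np

def dominant_periods(signal, dt, k=3):
    """Return the k periods (same time unit as dt) carrying the most spectral power."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                        # remove the mean wind speed
    power = np.abs(np.fft.rfft(x)) ** 2     # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=dt)   # cycles per time unit
    top = np.argsort(power[1:])[::-1][:k] + 1  # skip the zero-frequency bin
    return sorted(1.0 / freqs[top])

dt = 10.0                                   # seconds between samples (assumed)
t = np.arange(0.0, 16 * 86400.0, dt)        # 16 days of synthetic data
speed = (8.0
         + 3.0 * np.sin(2 * np.pi * t / (4 * 86400))  # 4-day component
         + 2.0 * np.sin(2 * np.pi * t / 43200)        # half-day component
         + 1.0 * np.sin(2 * np.pi * t / 60))          # 1-minute component
print(dominant_periods(speed, dt))  # periods in seconds
```

Because each period fits a whole number of times into the 16-day record, each component lands in a single FFT bin, which keeps the toy example clean.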

Night air is stable - but day air is often not, since the ground is hot, and hot air rises. So when a parcel of air moving along hits a hill, it can just shoot upwards, and not come back down! This means lots of turbulence.

Eddy diffusivity is modeled by Monin-Obukhov similarity theory:

21) American Meteorological Society Glossary, Monin-Obukhov similarity theory, http://amsglossary.allenpress.com/glossary/search?id=monin-obukhov-similarity-theory1
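The central quantity in Monin-Obukhov similarity theory is the Obukhov length L = -u*^3 theta_v / (kappa g w'theta_v'), built from the friction velocity u*, the mean virtual potential temperature theta_v, the von Karman constant kappa (about 0.4) and the surface kinematic heat flux w'theta_v'. Its sign matches the day/night picture above: upward heat flux from hot ground gives L < 0 (unstable), while a cooling surface at night gives L > 0 (stable). A hedged Python sketch; the input numbers and the 80 m height are illustrative assumptions, not measurements.

```python
KAPPA = 0.4   # von Karman constant (commonly quoted value)
G = 9.81      # gravitational acceleration, m/s^2

def obukhov_length(u_star, theta_v, heat_flux):
    """Obukhov length L in metres.

    u_star    -- friction velocity (m/s)
    theta_v   -- mean virtual potential temperature (K)
    heat_flux -- surface kinematic virtual heat flux w'theta_v' (K m/s);
                 positive means the ground is heating the air."""
    if heat_flux == 0:
        return float("inf")  # neutral limit: buoyancy plays no role
    return -u_star ** 3 * theta_v / (KAPPA * G * heat_flux)

def stability(z, L):
    """Classify the surface layer at height z (m) by the sign of zeta = z / L."""
    zeta = z / L
    if zeta < 0:
        return "unstable"
    if zeta > 0:
        return "stable"
    return "neutral"

day = obukhov_length(u_star=0.3, theta_v=300.0, heat_flux=0.2)     # heated ground
night = obukhov_length(u_star=0.2, theta_v=285.0, heat_flux=-0.02)  # cooling ground
print(stability(80.0, day), stability(80.0, night))  # 80 m: assumed hub height
```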

The wind turbines at Altamont Pass in California kill more raptors than all other wind farms in the world combined! Old-fashioned wind turbines look like nice places to perch, spelling death to birds. Cracks in concrete attract rodents, which attract raptors, who get killed. The new ones are far better.

For more:

22) National Renewable Energy Laboratory, Research needs for wind resource characterization, available at http://www.nrel.gov/docs/fy08osti/43521.pdf

Finally, there was a talk by Ron Lloyd of Fat Spaniel Technologies. This is a company that makes software for solar plants and other sustainable energy companies:

23) Fat Spaniel Technologies, http://www.fatspaniel.com/products/

His talk was less technical so I didn't take detailed notes. One big point I took away was this: we need better tools for modelling! This is especially true with the coming of the "smart grid". In its simplest form, this is a power grid that uses lots of data - for example, data about power generation and consumption - to regulate itself and increase efficiency. Surely there will be a lot of math here. Maybe even the topic I've been talking about lately: bond graphs!

But now I want to talk about some very simple aspects of electrical circuits. Last week I listed various kinds of circuits. Now let's go into a bit more detail - starting with the simplest kind: circuits made of just wires and linear resistors, where the currents and voltages are independent of time.

Mathematically, such a circuit is a graph equipped with some extra data. First, each edge has a number associated to it - the "resistance". For example:

 o----1----o----3----o
 |         |         |
 |         |         |
 2         3         2
 |         |         |
 |         |         |
 o----3----o----1----o

Second, we have current flowing through this circuit. To describe this, we first arbitrarily pick an orientation on each edge:

 o---->----o---->----o
 |         |         |
 |         |         |
 V         V         V
 |         |         |
 |         |         |
 o----<----o---->----o

Then we label each edge with a number saying how much "current" is flowing through that edge, in the direction of the arrow:

      2         3
 o---->----o---->----o
 |         |         |
 |         |         |
3V        V1        V 3
 |         |         |
 |         |         |
 o----<----o---->----o
      2        -3

Electrical engineers call the current I. Mathematically it's good to think of I as a "1-chain": a linear combination of oriented edges of our graph, with the coefficients of the linear combination being the numbers shown above.

If we know the current, we can work out a number for each vertex of our graph, saying how much current is flowing out of that vertex, minus how much is flowing in:

 5         2         0
 o---->----o---->----o
 |         |         |
 |         |         |
 V         V         V
 |         |         |
 |         |         |
 o----<----o---->----o
-5        -2         0

Mathematically we can think of this as a "0-chain": a formal linear combination of the vertices of our graph, with the numbers shown above as coefficients. We call this 0-chain the "boundary" of the 1-chain we started with. Since our current was called I, we call its boundary δI.
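The boundary can be computed mechanically. A minimal Python sketch, storing a 1-chain as a dictionary from oriented edges to coefficients; the vertex names TL, TM, TR (top row) and BL, BM, BR (bottom row) are hypothetical labels, and the edge data are read off the current diagram. Each edge contributes +a at its tail and -a at its head, so the coefficients of any boundary sum to zero.

```python
from collections import defaultdict

# Oriented edges (tail, head) of the example, with the current along each arrow.
CURRENT = {
    ("TL", "TM"): 2,
    ("TM", "TR"): 3,
    ("TL", "BL"): 3,
    ("TM", "BM"): 1,
    ("TR", "BR"): 3,
    ("BM", "BL"): 2,   # the bottom-left edge points leftwards
    ("BM", "BR"): -3,
}

def boundary(one_chain):
    """Boundary of a 1-chain: current flowing out of each vertex minus current flowing in."""
    result = defaultdict(int)
    for (tail, head), amount in one_chain.items():
        result[tail] += amount   # current leaving through the tail
        result[head] -= amount   # current arriving at the head
    return dict(result)

print(boundary(CURRENT))
```

For this current the boundary is 5, 2, 0 along the top row and -5, -2, 0 along the bottom row, so this particular current is not closed.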

Kirchhoff's current law says that

δI = 0

When this holds, let's say our circuit is "closed". Physically this follows from the law of conservation of electrical charge, together with a reasonable assumption. Current is the flow of charge. If the total current flowing into a vertex wasn't equal to the amount flowing out, charge - positive or negative - would be building up there. But for a closed circuit, we assume it's not.

If a circuit is not closed, let's call it "open". These are interesting too. For example, we might have a circuit like this:

 x
 |
 |
 V
 |
 |
 o---->----o
 |         |
 |         |
 V         V
 |         |
 |         |
 x         x

where we have current flowing in the wire on top and flowing out the two wires at bottom. We allow δI to be nonzero at the ends of these wires - the 3 vertices labelled x. This circuit is an "open system" in the sense of "week290", because it has these wires dangling out of it. It's not self-contained; we can use it as part of some bigger circuit. We should really formalize this more, but I won't now. Derek Wise did it more generally here:

24) Derek Wise, Lattice p-form electromagnetism and chain field theory, available as arXiv:gr-qc/0510033.

The idea here was to get a category whose morphisms are chain complexes. In our situation, composing morphisms amounts to gluing the output wires of one circuit into the input wires of another. This is an example of the general philosophy I'm trying to pursue, where open systems are treated as morphisms.

We've talked about 1-chains and 0-chains... but we can also back up and talk about 2-chains! Let's suppose our graph is connected - it is in our example - and let's fill it in with enough 2-dimensional "faces" to get something contractible. We can do this in a god-given way if our graph is drawn on the plane: just fill in all the holes!

 o---------o---------o
 |/////////|/////////|
 |/////////|/////////|
 |//FACE///|///FACE//|
 |/////////|/////////|
 |/////////|/////////|
 o---------o---------o

In electrical engineering these faces are often called "meshes".

This gives us a chain complex

         δ             δ
 C_0 <-------- C_1 <-------- C_2

and a cochain complex:

         d             d
 C^0 --------> C^1 ---------> C^2

As I've already said, it's good to think of the current I as a 1-chain, since then

δI = 0

is Kirchhoff's current law. Since our little space is contractible the above equation implies that

I = δJ

for some 2-chain J called the "mesh current". This assigns to each face or "mesh" the current flowing around that face.
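Both facts can be checked on the example. A NumPy sketch, assuming a hypothetical labelling of the six vertices and seven edges (arrows as in the orientation diagram) and choosing, arbitrarily, a clockwise orientation for each of the two faces. It builds the matrices of δ: C_2 → C_1 and δ: C_1 → C_0, checks that their composite is zero, turns a made-up mesh current J into a closed current I = δJ, and recovers J from I.

```python
import numpy as np

VERTICES = ["TL", "TM", "TR", "BL", "BM", "BR"]
EDGES = [("TL", "TM"), ("TM", "TR"), ("TL", "BL"), ("TM", "BM"),
         ("TR", "BR"), ("BM", "BL"), ("BM", "BR")]   # (tail, head)
# Each face (mesh) as a signed sum of edge indices, traversed clockwise.
FACES = [
    {0: 1, 3: 1, 5: 1, 2: -1},   # left mesh:  TL->TM->BM->BL->TL
    {1: 1, 4: 1, 6: -1, 3: -1},  # right mesh: TM->TR->BR->BM->TM
]

def delta_1():
    """Matrix of the boundary map C_1 -> C_0 ('out minus in' at each vertex)."""
    d = np.zeros((len(VERTICES), len(EDGES)))
    for j, (tail, head) in enumerate(EDGES):
        d[VERTICES.index(tail), j] += 1
        d[VERTICES.index(head), j] -= 1
    return d

def delta_2():
    """Matrix of the boundary map C_2 -> C_1."""
    d = np.zeros((len(EDGES), len(FACES)))
    for k, face in enumerate(FACES):
        for j, sign in face.items():
            d[j, k] = sign
    return d

d1, d2 = delta_1(), delta_2()

J = np.array([2.0, -1.0])   # hypothetical mesh currents, one per face
I = d2 @ J                  # the branch currents they produce
# Recover the mesh currents from I by least squares
# (exact here, since delta_2 is injective for this example).
J_recovered, *_ = np.linalg.lstsq(d2, I, rcond=None)
print(I, d1 @ I, J_recovered)
```

The identity δδ = 0 is why every current of the form δJ is closed; contractibility is what guarantees the converse.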

An electrical circuit also comes with a third piece of data, which I haven't mentioned yet. Each oriented edge should be labelled by a number called the "voltage" across that edge. Electrical engineers call the voltage V. It's good to think of V as a 1-cochain, which assigns to each edge the voltage across that edge.

Why a 1-cochain instead of a 1-chain? Because then

dV = 0

is the other basic law of electrical circuits - Kirchhoff's voltage law! This law says that the sum of these voltages around a mesh is zero. Since our little space is contractible the above equation implies that

V = dφ

for some 0-cochain φ called the "electrostatic potential". In electrostatics, this potential is a function on space. Here it assigns a number to each vertex of our graph.
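On the cochain side the same example satisfies Kirchhoff's voltage law automatically. In this sketch, as an assumption about conventions, d is taken to be the transpose of the "out minus in" boundary map used for currents, so (dφ)(e) = φ(tail) - φ(head), the drop in potential along the arrow; other texts use the opposite sign. The vertex labels and potentials are hypothetical. Whatever values φ takes, the voltages V = dφ sum to zero around each mesh.

```python
import numpy as np

VERTICES = ["TL", "TM", "TR", "BL", "BM", "BR"]
EDGES = [("TL", "TM"), ("TM", "TR"), ("TL", "BL"), ("TM", "BM"),
         ("TR", "BR"), ("BM", "BL"), ("BM", "BR")]   # (tail, head)
FACES = [
    {0: 1, 3: 1, 5: 1, 2: -1},   # left mesh, clockwise
    {1: 1, 4: 1, 6: -1, 3: -1},  # right mesh, clockwise
]

def coboundary_0(phi):
    """d on 0-cochains: potential at the tail minus potential at the head of each edge."""
    return np.array([phi[t] - phi[h] for t, h in EDGES])

def coboundary_1(V):
    """d on 1-cochains: signed sum of the voltages around each face."""
    return np.array([sum(sign * V[j] for j, sign in face.items()) for face in FACES])

phi = {"TL": 5.0, "TM": 3.0, "TR": 0.0, "BL": 1.0, "BM": 2.0, "BR": -1.0}
V = coboundary_0(phi)
print(V, coboundary_1(V))  # Kirchhoff's voltage law: zero on every mesh
```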

Since the space of 1-cochains is the dual of the space of 1-chains, we can take the voltage V and the current I, glom them together, and get a number:

V(I)

This is the "power": that is, the rate at which our network soaks up energy and dissipates it into heat. Note that this is just a fancy version of the formula for power that I explained in "week290" - power is effort times flow.

I've given you three basic pieces of data labelling our circuit: the resistance R, the current I, and the voltage V. But these aren't independent! Ohm's law says that the voltage across any edge is the current through that edge times the resistance of that edge. But remember: voltage is a 1-cochain while current is a 1-chain. So "resistance" can be thought of as a map from 1-chains to 1-cochains:

R: C_1 → C^1

This lets us write Ohm's law like this:

V = RI

This, in turn, means the power of our circuit is

V(I) = (RI)(I)

For physical reasons, this power is always nonnegative. In fact, let's assume it's positive unless I = 0. This is just another way of saying that the resistance labelling each edge is positive. It can be very interesting to think about circuits with perfectly conducting wires. These would give edges whose resistance is zero. But that's a bit of an idealization, and right now I'd rather allow only *positive* resistances.

Why? Because then we can think of the above formula as the inner product of I with itself! In other words, then there's a unique inner product on 1-chains with

(RI)(I) = <I,I>

In this situation

R: C_1 → C^1

is the usual isomorphism that we get between a finite-dimensional inner product space and its dual. (For this statement to be true, we'd better assume our graph has finitely many vertices and edges.)
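A NumPy sketch of Ohm's law, the power pairing and the induced inner product for the example circuit. Resistances are read off the first diagram and currents off the current diagram, listed in a hypothetical fixed edge order. Here R is diagonal, acting edge by edge, so (RI)(I) is the sum of R_e I_e^2 over edges, which is positive for every nonzero I exactly when every R_e is positive. The example current is not closed, but Ohm's law and the power formula do not care. For these numbers the power comes out to 91, in whatever units R and I are measured.

```python
import numpy as np

# Edge order (hypothetical): TL->TM, TM->TR, TL->BL, TM->BM, TR->BR, BM->BL, BM->BR
RESISTANCE = np.array([1.0, 3.0, 2.0, 3.0, 2.0, 3.0, 1.0])  # from the first diagram
CURRENT = np.array([2.0, 3.0, 3.0, 1.0, 3.0, 2.0, -3.0])    # from the current diagram

def ohm(R, I):
    """Ohm's law V = RI, with R acting edge by edge as a map from 1-chains to 1-cochains."""
    return R * I

def power(V, I):
    """Pair a 1-cochain with a 1-chain: V(I) = sum over edges of V_e * I_e."""
    return float(np.dot(V, I))

def inner(R, I1, I2):
    """Inner product on 1-chains induced by positive resistances: <I1, I2> = (R I1)(I2)."""
    if np.any(R <= 0):
        raise ValueError("resistances must be positive to get an inner product")
    return power(ohm(R, I1), I2)

V = ohm(RESISTANCE, CURRENT)
print(power(V, CURRENT))  # total power dissipated: sum of R_e * I_e**2
```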

Now, if you've studied de Rham cohomology, all this should start reminding you of Hodge theory. And indeed, it's a baby version of that! So, we're getting a little bit of Hodge theory, but in a setting where our chain complexes are really morphisms in a category. Or more generally, n-morphisms in an n-category.

There's a lot more to say, but that's enough for now.


So many young people are forced to specialize in one line or another that a young person can't afford to try and cover this waterfront - only an old fogy who can afford to make a fool of himself. If I don't, who will? - John Wheeler


© 2010 John Baez
baez@math.removethis.ucr.andthis.edu



n-Category Café This Week's Finds in Mathematical Physics (Week 293)

In week293 of This Week’s Finds, catch up on recent papers and books about n-categories. Hear about last weekend’s Conference on the Mathematics of Environmental Sustainability and Green Technology at Harvey Mudd College. And learn how to think of networks of resistors as chain complexes which are also morphisms in a category.

If you do a math Ph.D. with Kenneth Golden as your advisor, you can do your thesis work here:

February 05, 2010

David Hoggfast data analysis

In the afternoon, I discussed with Itay Yavin and Kyle Cranmer fast methods for fitting exoplanet orbits to stellar radial velocity data using Fourier or periodogram approaches. We were inspired by Bretthorst's book on Bayesian spectral analysis. In the morning, I discussed with Blanton and Demitri Muna (NYU) the detection in real time of supernovae (or other anomalies) in the SDSS-III BOSS spectroscopic data stream.

Steinn SigurðssonAgent Fresco

Newish super trendy band out of Iceland.

Jazzy math music with touch of heavy rock.
Or so I was told.


n-Category Café Sheaves Do Not Belong to Algebraic Geometry

…and here’s a proof.

They are, of course, very useful in algebraic geometry (as is the equals sign). Also, human beings discovered them while developing algebraic geometry, which is why many of them still make the association.

But as we’ll see, sheaves are an inevitable consequence of general ideas that have nothing to do with algebraic geometry. In fact, sheaves (and various related notions) arise automatically from two completely general categorical constructions, together with one almost imperceptibly small topological observation.

Before I give you the proof, let me make clear that it isn’t due to me. I don’t know who it is due to — I’ve never seen it in print — but I suspect it was known before I was even born. (Update: see Joachim Kock’s comment for a reference.) People who I’ve told this argument to seem to like it, so I wrote it up in a little note a few years ago; then a recent conversation reminded me of it, so I thought I’d air it here.

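First categorical construction  Let $\mathbf{A}$ be a small category, $\mathbf{E}$ a category with small colimits, and $J\colon \mathbf{A} \to \mathbf{E}$ any functor. Then there is an induced adjunction $- \otimes J \colon \mathbf{Set}^{\mathbf{A}^{\mathrm{op}}} \rightleftarrows \mathbf{E} \colon \mathrm{Hom}(J, -)$. The right adjoint $\mathrm{Hom}(J, -)$ is defined by $(\mathrm{Hom}(J, E))(A) = \mathrm{Hom}(J(A), E)$ ($E \in \mathbf{E}$, $A \in \mathbf{A}$). The left adjoint $- \otimes J$ is defined by the adjointness, and can be described as a certain coend or colimit.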

Example: if $J\colon \Delta \to \mathbf{Top}$ is the standard simplex functor then $\mathrm{Hom}(J, -)$ is the singular simplicial set functor and $- \otimes J$ is geometric realization.

Second categorical construction  Any adjunction restricts canonically to an equivalence between full subcategories.

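Precisely, let $F\colon \mathbf{C} \rightleftarrows \mathbf{D} \colon G$ be an adjunction ($F$ left adjoint to $G$), with unit $\eta\colon 1 \to GF$ and counit $\varepsilon\colon FG \to 1$. Let $\bar{\mathbf{C}}$ be the full subcategory of $\mathbf{C}$ consisting of those objects $C$ for which $\eta_C\colon C \to GF(C)$ is an isomorphism, and dually $\bar{\mathbf{D}}$. Then the adjunction $(F, G, \eta, \varepsilon)$ restricts to an equivalence between $\bar{\mathbf{C}}$ and $\bar{\mathbf{D}}$.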
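For posets the second construction can be checked by brute force: an adjunction between posets is a Galois connection, the unit at A is an isomorphism exactly when A = GF(A), and dually for the counit. The Python sketch below assumes a small made-up relation R between finite sets X and Y (all names are hypothetical). F sends a subset of X to its image under R; G sends a subset B of Y to the points of X whose related points all lie in B. Then F(A) ⊆ B if and only if A ⊆ G(B), so F is left adjoint to G, and the code confirms that F and G restrict to mutually inverse bijections between the "fixed" subsets on each side.

```python
from itertools import chain, combinations

# A hypothetical relation R between X and Y; point 3 is related to nothing.
X = [0, 1, 2, 3]
Y = ["a", "b", "c"]
R = {(0, "a"), (1, "a"), (1, "b"), (2, "c")}

def subsets(s):
    """All subsets of s, as frozensets."""
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def F(A):
    """Left adjoint: the image of A under the relation."""
    return frozenset(y for (x, y) in R if x in A)

def G(B):
    """Right adjoint: points of X all of whose related points lie in B."""
    return frozenset(x for x in X if all(y in B for (x2, y) in R if x2 == x))

# The adjunction F -| G: F(A) <= B iff A <= G(B).
assert all((F(A) <= B) == (A <= G(B)) for A in subsets(X) for B in subsets(Y))

C_bar = [A for A in subsets(X) if G(F(A)) == A]   # unit is an isomorphism
D_bar = [B for B in subsets(Y) if F(G(B)) == B]   # counit is an isomorphism

# F and G restrict to mutually inverse bijections between C_bar and D_bar.
assert {F(A) for A in C_bar} == set(D_bar)
assert {G(B) for B in D_bar} == set(C_bar)
print(len(C_bar), len(D_bar))
```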

Almost imperceptibly small topological observation  Any open subset of a topological space can be regarded as a space in its own right, and when one open set is contained in another, there is an induced inclusion of spaces.

Precisely, let $S$ be a topological space. Write $O(S)$ for the poset of open subsets of $S$, regarded as a category (in which each hom-set has at most one element). Write $\mathbf{Top}/S$ for the category of spaces over $S$: objects are continuous maps into $S$, and maps are commutative triangles. Then there is a canonical functor $J\colon O(S) \to \mathbf{Top}/S$, sending an open set $U$ to the inclusion $U \hookrightarrow S$.

Punchline  Fix a topological space S. The category Top/S has small colimits, since Top does.

Applying the first categorical construction to the functor $J$ just defined produces an adjunction $(\text{presheaves on } S) = \mathbf{Set}^{O(S)^{\mathrm{op}}} \rightleftarrows \mathbf{Top}/S = (\text{spaces over } S)$. The two functors here are the ones you’d guess.

Applying the second construction now gives an equivalence of categories $(\text{sheaves on } S) = \mathrm{Sh}(S) \simeq \mathrm{Et}(S) = (\text{étale spaces over } S)$. This can be interpreted as the definition of sheaf, étale space, etc., or as a theorem, according to taste.

Going right and then left in the adjunction gives the associated sheaf, or sheafification, of a presheaf. Going left and then right gives the ‘étalification’ of a space over S.

Tommaso Dorigo2000 Years Ago Cicero Knew It, Do You ?

"Quidquid oritur, qualecumque est, causam habet a natura. Cum autem res nova et admirabilis fieri videtur, causam invetigato, si poteris, ratione confisus. Si nullam causam reperis, illud tamen certum habeto, nihil fieri potuisse sine causa naturali. Repelle igitur terrorem quem
res nova tibi attulit et semper verbis sapientium confidere aude:
sapiens enim facta, quae prodigiosa videntur, numquam fortuito
evenisse dicet, quod nihil fieri sine causa potest, nec quicquam fit
quod fieri non potest: nulla igitur portenta sunt. Nam si portentum
putare debemus id quod raro fit, sapientem esse portentum est: facilius
esse enim mulam parere arbitror quam sapientem esse."

Marcus Tullius Cicero

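Quick and dirty translation:

"Whatever comes into being, of whatever kind, has its cause in nature. When something new and wondrous seems to happen, seek out its cause, if you can, trusting in reason. If you find no cause, hold this as certain nonetheless: nothing could have happened without a natural cause. So drive away the fear that the novelty has brought you, and always dare to trust the words of the wise: for the wise man will say that events which seem prodigious never came about by chance, because nothing can happen without a cause, and nothing happens that cannot happen. Therefore there are no portents. For if we must call a portent whatever happens rarely, then a wise man is a portent: indeed, I think it is easier for a mule to give birth than for a man to be wise."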

David Hoggmeta-up

Fengji Hou (my new student, will be Hou from now on in this diary), his co-advisor Jonathan Goodman (NYU Courant), and I discussed Fengji's start on exoplanet radial velocity fitting using advanced sampling tools. We spent a long time talking about code, but once we were done, Goodman and I spent some time talking about medium-term projects that would be non-trivial and interesting. We discussed the idea that if you are a Bayesian (not always advisable), you don't really want to detect planets per se, you want to pass forward probabilistic information about their existence and properties, and then perform your analysis on those probabilistic outputs. In this world, you might be able to discover and say things about classes of planets that are not detected clearly in any individual stellar radial velocity time series. Approaches like this could greatly increase the number of known exoplanets for some kinds of statistical studies.

David Hoggunit tests failed

Argh. Lang came into town and we added another JPL-ephemeris-based unit test to our code and it failed. It is a coordinate system problem we weren't able to diagnose before we ran out of day. But we started playing with the Canon Digital Rebels that Sam bought to put on the telescopes we have on the roof of 715 Broadway.

David Hoggstars, pulsars, dark matter

I can't say I did much research today but I saw two beautiful talks, and going to talks does count as research.

At lunch time Dmitry Malyshev (NYU) gave a beautiful talk on millisecond-pulsar and dark-matter contributions to the observed haze at the Galactic Center from Fermi and WMAP. He showed specific pulsar-plus-DM models that explain the spectral properties of the haze beautifully, many of which are natural for both pulsars and the DM. In some, he had to make the electron–positron emission from pulsars very high, but it really is an unknown. He mentioned that 47 Tuc (globular cluster) is a key observable Fermi source for distinguishing these ideas. Malyshev was very cautious and made no strong claims, but my excitement about the possibility that dark matter annihilation is being observed grew during the presentation.

In the afternoon, Nathan Smith (Berkeley) gave an outstanding talk about extremely massive stars as observed in our own Galaxy and nearby galaxies, including their dramatic explosions and mass-loss episodes. These are incredibly rich in their kinematic and chemical properties and have implications for chemical abundance propagation, star formation, supernova prediction, and the evolution of the young universe. He made a comment at the end about extrapolating theories we don't understand into regimes where we have no data, which made the astronomers laugh and the particle theorists ask "And?".