Another couple of weeks of sciencey blogging at Forbes:
— Football Physics: Deflategate Illustrates Key Concepts: In which I use the ever-popular silly scandal over deflated footballs as an excuse to talk about three-body recombination.
— The Annoying Physics Of Air Resistance: Air resistance is an annoyance to be abstracted out in intro physics classes, but looking for its influence with video analysis is kind of fun.
— How NASA’s Viking Mars Probes Helped Prove Einstein Right: We think of missions to Mars as primarily about searching for life, but they have also helped test fundamental physics, specifically via a 1976 experiment to test general relativity.
— Predicting The Nobel Prize In Physics: I continue to suck at guessing who will be awarded a big pile of kroner.
— Football Physics: Why Throw A Spiral?: Like so many other things in physics, it’s really all about angular momentum.
— The Certainty of Uncertainty: Scientists Know Exactly How Well We Don’t Know Things: When physicists and other scientists talk about uncertainty in their results, they’re not admitting ignorance or covering for “human error.” They’re quantifying what’s left after all the “human errors” have been corrected, and expressing confidence in their result.
Far and away the most popular of these was the Nobel prediction post, which is no surprise. The uncertainty one kicked off a long discussion between a bunch of people I don’t know in my Twitter mentions, which was kind of odd. And the Mars thing just totally sank without a trace, which really surprised me. Oh, well, such is blogging.
I recently learned about a curious operation on square matrices known as sweeping, which is used in numerical linear algebra (particularly in applications to statistics), as a useful and more robust variant of the usual Gaussian elimination operations seen in undergraduate linear algebra courses. Given an $n \times n$ matrix $A$ (with, say, complex entries) and an index $k$ with $1 \leq k \leq n$, with the entry $a_{kk}$ nonzero, the sweep $\mathrm{Sweep}_k[A]$ of $A$ at $k$ is the matrix $B$ given by the formulae

$$b_{kk} = -\frac{1}{a_{kk}}, \qquad b_{ik} = \frac{a_{ik}}{a_{kk}}, \qquad b_{kj} = \frac{a_{kj}}{a_{kk}}, \qquad b_{ij} = a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}$$

for all $i, j \in \{1,\dots,n\} \backslash \{k\}$. Thus for instance if $k = 1$, and $A$ is written in block form as

$$A = \begin{pmatrix} a_{11} & X \\ Y & B \end{pmatrix} \qquad (1)$$

for some row vector $X$, column vector $Y$, and $(n-1) \times (n-1)$ minor $B$, one has

$$\mathrm{Sweep}_1[A] = \begin{pmatrix} -1/a_{11} & X/a_{11} \\ Y/a_{11} & B - YX/a_{11} \end{pmatrix}. \qquad (2)$$
The inverse sweep operation $\mathrm{Sweep}_k^{-1}[A]$ is given by a nearly identical set of formulae:

$$b_{kk} = -\frac{1}{a_{kk}}, \qquad b_{ik} = -\frac{a_{ik}}{a_{kk}}, \qquad b_{kj} = -\frac{a_{kj}}{a_{kk}}, \qquad b_{ij} = a_{ij} - \frac{a_{ik} a_{kj}}{a_{kk}}$$

for all $i, j \in \{1,\dots,n\} \backslash \{k\}$. One can check that these operations invert each other. Actually, each sweep turns out to have order $4$, so that $\mathrm{Sweep}_k^{-1} = \mathrm{Sweep}_k^3$: an inverse sweep performs the same operation as three forward sweeps. Sweeps also preserve the space of symmetric matrices (allowing one to cut down computational run time in that case by a factor of two), and behave well with respect to principal minors: a sweep of a principal minor is a principal minor of a sweep, after adjusting indices appropriately.
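As a concrete check on the sweep operation defined above, here is a minimal pure-Python sketch (the function name and the use of exact `Fraction` arithmetic are my illustrative choices, not from the post); it verifies the order-4 property on a small symmetric matrix:

```python
from fractions import Fraction

def sweep(A, k):
    """Sweep the square matrix A (a list of lists) at index k (0-based):
    b_kk = -1/a_kk; b_ik = a_ik/a_kk and b_kj = a_kj/a_kk for i, j != k;
    b_ij = a_ij - a_ik * a_kj / a_kk otherwise."""
    n = len(A)
    p = A[k][k]
    if p == 0:
        raise ZeroDivisionError("the pivot entry a_kk must be nonzero")
    B = [row[:] for row in A]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                B[i][j] = -1 / p
            elif i == k or j == k:
                B[i][j] = A[i][j] / p
            else:
                B[i][j] = A[i][j] - A[i][k] * A[k][j] / p
    return B

# Exact rational arithmetic makes the algebraic identities exact checks.
A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
B = A
for _ in range(4):
    B = sweep(B, 0)
assert B == A  # each sweep has order 4: four sweeps at one index is the identity
```

Two sweeps at the same index negate the $k$th row and column away from the diagonal while fixing everything else, which is why the order is $4$ rather than $2$.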
Remarkably, the sweep operators all commute with each other: $\mathrm{Sweep}_k \mathrm{Sweep}_l = \mathrm{Sweep}_l \mathrm{Sweep}_k$. If $1 \leq k \leq n$ and we perform the first $k$ sweeps (in any order) to a matrix

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

with $A_{11}$ a $k \times k$ minor, $A_{12}$ a $k \times (n-k)$ matrix, $A_{21}$ an $(n-k) \times k$ matrix, and $A_{22}$ an $(n-k) \times (n-k)$ matrix, one obtains the new matrix

$$\mathrm{Sweep}_1 \cdots \mathrm{Sweep}_k[A] = \begin{pmatrix} -A_{11}^{-1} & A_{11}^{-1} A_{12} \\ A_{21} A_{11}^{-1} & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}.$$

Note the appearance of the Schur complement $A_{22} - A_{21} A_{11}^{-1} A_{12}$ in the bottom right block. Thus, for instance, one can essentially invert a matrix $A$ by performing all $n$ sweeps:

$$\mathrm{Sweep}_1 \cdots \mathrm{Sweep}_n[A] = -A^{-1}.$$
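The inversion identity is easy to test numerically. The following sketch (again pure Python with exact fractions; my own illustrative code, not from the post) sweeps every index of a $3 \times 3$ matrix and checks that the result is $-A^{-1}$:

```python
from fractions import Fraction

def sweep(A, k):
    # Entrywise sweep at index k (0-based), as defined earlier in the post.
    n, p = len(A), A[k][k]
    return [[-1 / p if (i, j) == (k, k)
             else A[i][j] / p if k in (i, j)
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
S = A
for k in range(3):
    S = sweep(S, k)          # sweep every index in turn

# S should equal -A^{-1}, so A * (-S) should be the identity matrix.
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
assert mat_mul(A, [[-v for v in row] for row in S]) == identity
```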
If a matrix $A$ has the form

$$A = \begin{pmatrix} B & Y \\ X & a \end{pmatrix}$$

for an $(n-1) \times (n-1)$ minor $B$, column vector $Y$, row vector $X$, and scalar $a$, then performing the first $n-1$ sweeps gives

$$\mathrm{Sweep}_1 \cdots \mathrm{Sweep}_{n-1}[A] = \begin{pmatrix} -B^{-1} & B^{-1} Y \\ X B^{-1} & a - X B^{-1} Y \end{pmatrix}$$

and all the components of this matrix are usable for various numerical linear algebra applications in statistics (e.g. in least squares regression). Given that sweeps behave well with inverses, it is perhaps not surprising that sweeps also behave well under determinants: the determinant of $A$ can be factored as the product of the entry $a_{kk}$ and the determinant of the $(n-1) \times (n-1)$ matrix formed from $\mathrm{Sweep}_k[A]$ by removing the $k$th row and column. As a consequence, one can compute the determinant of $A$ fairly efficiently (so long as the sweep operations don't come close to dividing by zero) by sweeping the matrix at $k = 1, \dots, n$ in turn, and multiplying together the $(k,k)$ entry of the matrix just before the sweep at $k$ to obtain the determinant.
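That determinant recipe fits in a few lines (illustrative code, not from the post): multiply the $(k,k)$ pivots seen just before each sweep.

```python
from fractions import Fraction

def sweep(A, k):
    # Entrywise sweep at index k (0-based), as defined earlier in the post.
    n, p = len(A), A[k][k]
    return [[-1 / p if (i, j) == (k, k)
             else A[i][j] / p if k in (i, j)
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]

def det_by_sweeping(A):
    """det(A) as the product of the (k,k) entries just before sweeping at k."""
    det = Fraction(1)
    for k in range(len(A)):
        det *= A[k][k]      # pivot just before the sweep at index k
        A = sweep(A, k)
    return det

A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
assert det_by_sweeping(A) == 18   # agrees with a cofactor expansion by hand
```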
It turns out that there is a simple geometric explanation for these seemingly magical properties of the sweep operation. Any $n \times n$ matrix $A$ creates a graph $\mathrm{Graph}[A] := \{ (x, Ax) : x \in \mathbb{C}^n \}$ (where we think of $\mathbb{C}^n$ as the space of column vectors). This graph is an $n$-dimensional subspace of $\mathbb{C}^n \times \mathbb{C}^n$. Conversely, most $n$-dimensional subspaces of $\mathbb{C}^n \times \mathbb{C}^n$ arise as graphs; there are some that fail the vertical line test, but these form a positive codimension set of counterexamples.

We use $e_1, \dots, e_n, f_1, \dots, f_n$ to denote the standard basis of $\mathbb{C}^n \times \mathbb{C}^n$, with $e_1, \dots, e_n$ the standard basis for the first factor of $\mathbb{C}^n$ and $f_1, \dots, f_n$ the standard basis for the second factor. The operation of sweeping the $k$th entry then corresponds to a ninety degree rotation $R_k$ in the $e_k, f_k$ plane, that sends $f_k$ to $e_k$ (and $e_k$ to $-f_k$), keeping all other basis vectors fixed: thus we have

$$\mathrm{Graph}[\mathrm{Sweep}_k[A]] = R_k\, \mathrm{Graph}[A]$$

for generic $A$ (more precisely, those with nonvanishing entry $a_{kk}$). For instance, if $k = 1$ and $A$ is of the form (1), then $\mathrm{Graph}[A]$ is the set of tuples $(x_1, x', y_1, y')$ obeying the equations

$$y_1 = a_{11} x_1 + X x', \qquad y' = Y x_1 + B x'.$$

The image of $(x_1, x', y_1, y')$ under $R_1$ is $(y_1, x', -x_1, y')$. Since we can write the above system of equations (expressing $-x_1$ and $y'$ in terms of $y_1$ and $x'$) as

$$-x_1 = -\frac{1}{a_{11}} y_1 + \frac{X}{a_{11}} x', \qquad y' = \frac{Y}{a_{11}} y_1 + \left( B - \frac{YX}{a_{11}} \right) x',$$

we see from (2) that $R_1\, \mathrm{Graph}[A]$ is the graph of $\mathrm{Sweep}_1[A]$. Thus the sweep operation is a multidimensional generalisation of the high school geometry fact that the line $y = mx$ in the plane becomes the line $y = -\frac{1}{m} x$ after applying a ninety degree rotation.
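The rotation fact is easy to test on a concrete example. The sketch below is my own illustrative code, using the conventions that the graph of $A$ consists of points $(x, Ax)$ and that the rotation sends $f_1$ to $e_1$ and $e_1$ to $-f_1$; it rotates one point on the graph of a $2 \times 2$ matrix and checks that it lands on the graph of the sweep:

```python
from fractions import Fraction

def sweep(A, k):
    # Entrywise sweep at index k (0-based), as defined earlier in the post.
    n, p = len(A), A[k][k]
    return [[-1 / p if (i, j) == (k, k)
             else A[i][j] / p if k in (i, j)
             else A[i][j] - A[i][k] * A[k][j] / p
             for j in range(n)] for i in range(n)]

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

# A point (x, Ax) on the graph of a 2x2 matrix:
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
x = [Fraction(5), Fraction(-7)]
y = mat_vec(A, x)

# Rotate ninety degrees in the (e1, f1) plane (f1 -> e1, e1 -> -f1):
# the tuple (x1, x2, y1, y2) maps to (y1, x2, -x1, y2).
x_new = [y[0], x[1]]
y_new = [-x[0], y[1]]

# The rotated point lies on the graph of the swept matrix:
assert mat_vec(sweep(A, 0), x_new) == y_new
```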
It is then an instructive exercise to use this geometric interpretation of the sweep operator to recover all the remarkable properties about these operations listed above. It is also useful to compare the geometric interpretation of sweeping as rotation of the graph to that of Gaussian elimination, which instead shears and reflects the graph by various elementary transformations (this is what is going on geometrically when one performs Gaussian elimination on an augmented matrix). Rotations are less distorting than shears, so one can see geometrically why sweeping can produce fewer numerical artefacts than Gaussian elimination.
Filed under: expository, math.NA, math.ST Tagged: sweeping a matrix
As promised, the Screen Junkies episode we made is out. It is about The Martian! JPL's Christina Heinlein (a planetary science expert) also took part, and I hope you find it interesting and thought-provoking. Maybe even funny too! As usual, there's a lot that was said that was inevitably left on the (virtual) cutting-room floor, but a lot of good stuff made the cut. All in all, I'd say that this film (which I enjoyed a lot!) had a refreshing take on science and engineering for a big studio film, on several scores. (Remaining sentences are spoiler-free.) First, rather than hiding the slow machinations involved in problem-solving, it has a lot of it up front! It's an actual celebration of problem-solving, part of the heart and soul of science and engineering. Second, rather than have the standard nerd stereotype [...]
The post Screen Junkies – The Martian, Science, and Problem-Solving! appeared first on Asymptotia.
The day started with Josh Tucker (NYU) talking about the SMaPP lab at NYU, where they are doing observational work in Politics and Economics using data-science methods and in a lab-like structure. The science is new, but so is the detailed structure of the lab, which is not a standard way of doing Political Science! He pointed out that some PIs in the audience have larger budgets for their individual labs than the entire NSF budget for Political Science! He showed very nice results of turning political-science theories into hypotheses about Twitter data, and then performing statistical falsifications. Beautiful stuff, and radical. He showed that different players (protesters, opposition, and oppressive regimes) are all using diverse strategies on social media; the simple stories about Twitter being democratizing are not (or no longer) correct.
In the afternoon, we returned from Cle Elum to UW, where I discussed problems of inference in exoplanet science with Foreman-Mackey, Elaine Angelino (UCB), and Eric Agol (UW). After we discussed some likelihood-free inference (ABC) ideas, Angelino pointed us to the literature on probabilistic programs, which seems highly relevant. In that same conversation, Foreman-Mackey pointed out the retrospectively obvious point that you can parameterize a positive-definite matrix using its LU decomposition and then never have to put checks on the eigenvalues. Duh! And awesome.
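That parameterization trick can be sketched as follows. For a symmetric positive-definite matrix the triangular factorization in question is the Cholesky form $\Sigma = L L^T$; the function name, the $2 \times 2$ size, and the exp() trick for keeping the diagonal positive are my own illustrative choices, not from the conversation:

```python
import math

def spd_from_params(t11, t21, t22):
    """Build a symmetric positive-definite 2x2 matrix from unconstrained reals.

    Sigma = L L^T with L lower-triangular; exponentiating the diagonal keeps
    it positive, so Sigma is positive definite with no eigenvalue checks."""
    l11, l21, l22 = math.exp(t11), t21, math.exp(t22)
    return [[l11 * l11, l11 * l21],
            [l21 * l11, l21 * l21 + l22 * l22]]

Sigma = spd_from_params(-1.3, 0.7, 2.0)
# A symmetric 2x2 matrix is positive definite iff its (1,1) entry and
# its determinant are both positive:
assert Sigma[0][1] == Sigma[1][0]
assert Sigma[0][0] > 0
assert Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0] > 0
```

An optimizer can then wander freely over the three real parameters without ever leaving the positive-definite cone.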
In the morning, Katy Huff (UCB) gave an energizing talk about The Hacker Within, which is a program to have peers teach peers about their data-science (or programming or engineering) skills to improve everyone's scientific capabilities. The model is completely ground-up and self-organized, and she is trying to make it easy for other institutions to “get infected” by the virus. She had some case studies and insights about the conditions under which a self-organized peer-educational activity can be born and flourish. UW and NYU are now both going to launch something; I was very much reminded of #AstroHackNY, which is currently dormant.
Karthik Ram (UCB) talked about a really deep project on reproducibility: They have interviewed about a dozen scientists in great detail about their “full stack” workflow, from raw data to scientific results, identifying how reproducibility and openness are or could be introduced and maintained. But the coolest thing is that they are writing up the case studies in a book. This will be a great read: both a comparative look at different disciplines and a snapshot of science in 2015, and a gift to people thinking about making their stack open and reproducible.
I had a great conversation with Stefan Karpinski (NYU) and Fernando Perez (UCB) about file formats (of all things). They want to destroy CSV once and for all (or not, if that doesn't turn out to be a good idea). Karpinski explained to me the magic of UTF-8 encoding for text. My god is it awesome. Perez asked me to comment on the new STScI-supported ASDF format to replace FITS, and compare it to HDF5. I am torn. I think ASDF might be slightly better suited to astronomers than HDF5, but HDF5 is a standard for a very wide community, who maintain and support it. This might be a case of “the better is the enemy of the good” (a phrase I learned from my mentor Gerry Neugebauer, who died this year). Must do more analysis and thinking.
In the afternoon, in the unconference, I participated in a discussion of imaging and image processing as a cross-cutting data-science methodology and toolkit. Lei Tian (UCB) described forward-modeling for super-resolution microscopy, and mentioned a whole bunch of astronomy-like issues, such as a spatially variable point-spread function, image priors, and the likelihood function. It is very clear that we have to get the microscopists and astronomers into the same room for a couple of days; I am certain we have things to learn from one another. If you are reading this and would be interested, drop me a line.
Today was the first day of the Moore-Sloan Data Science Environments annual summit, held this year in Cle Elum, Washington. We had talks about activities going on at UW; many of the most interesting to me were around reproducibility and open science. For example, there were discussions of reproducibility badges, where projects can be rated on a range of criteria and given a score. The idea is to make reproducibility a competitive challenge among researchers. A theme of this is that it isn't cheap to run fully reproducibly. That said, there are also huge advantages, not just to science, but also to the individual, as I have commented in this space before. It is easy to forget that when CampHogg first went fully open, we did so because it made it easier for us to find our own code. That sounds stupid, but it's really true that it is much easier to find your three-year-old code on the web than on your legacy computer.
Ethics came up multiple times at the meeting. Ethical training and a foregrounding of ethical issues in data science is a shared goal in this group. I wonder, however, if we got really specific and technical, whether we would agree on what it means to be ethical with data. Sometimes the most informative and efficient data-science methods to (say) improve the fairness in distribution of services could easily conflict with concerns about privacy, for example. That said, this is all the more reason that we should encourage ethical discussions in the data science community, and also encourage those discussions to be specific and technical.
When signing up for physics grad school, I didn’t expect to be interviewed by a comedienne on a spoof science show about women in STEM.
Last May, I received an email entitled “Amy Poehler’s Smart Girls.” The actress, I read, had co-founded the Smart Girls organization to promote confidence and creativity in preteens and teens. Smart Girls was creating a web series hosted by Megan Amram, author of Science…for Her! The book parodies women’s magazines and ridicules stereotypes of women as unsuited for science.
Megan would host the web series, “Experimenting with Megan,” in character as an airhead. She planned to interview “kick-ass lady scientists/professors/doctors” in a parody of a talk show. Would I, the email asked, participate?
I’m such a straitlaced fogey, I never say “kick-ass.” I’m such a workaholic, I don’t watch web shows. I’ve not seen Parks and Recreation, the TV series that starred Amy Poehler and for which Megan wrote. The Hollywood bug hasn’t bitten me, though I live 30 minutes from Studio City.
But I found myself in a studio the next month. Young men and women typed on laptops and chattered in the airy, bright waiting lounge. Beyond a doorway lay the set, enclosed by fabric-covered walls that prevented sounds from echoing. Script-filled binders passed from hand to hand, while makeup artists, cameramen, and gofers scurried about.
Disney’s Mouseketeers couldn’t have exuded more enthusiasm or friendliness than the “Experimenting” team. “Can I bring you a bottle of water?” team members kept asking me and each other. “Would you like a chair?” The other women who interviewed that day—two biologist postdocs—welcomed me into their powwow. Each of us, we learned, is outnumbered by men at work. None of us wears a lab coat, despite stereotypes of scientists as white-coated. Each pours herself into her work: One postdoc was editing a grant proposal while off set.
I watched one interview, in which Megan asked why biologists study fruit flies instead of “cuter” test subjects. Then I stepped on set beside her. I perched on an armchair that threatened to swallow my 5’ 3.5” self.* Textbooks, chemistry flasks, and high-heeled pumps stood on the bookshelves behind Megan.
The room quieted. A clapperboard clapped: “Take one.” Megan thanked me for coming, then launched into questions.
Megan hadn’t warned me what she’d ask. We began with “Do you like me?” and “What is the ‘information’ [in ‘quantum information theory’], and do you ever say, ‘Too much information’?” Each question rode hot on the heels of the last. The barrage reminded me of interviews for not-necessarily-scientific scholarships. Advice offered by one scholarship-committee member, the year before I came to Caltech, came to mind: Let loose. Act like an athlete tearing down the field, the opposing team’s colors at the edges of your vision. Savor the challenge.
I savored it. I’d received instructions to play the straight man, answering Megan’s absurdity with science. To “Too much information?” I parried that we can never know enough. When I mentioned that quantum mechanics describes electrons, Megan asked about the electricity she feels upon seeing Chris Hemsworth. (I hadn’t heard of Chris Hemsworth. After watching the interview online, a friend reported that she’d enjoyed the reference to Thor. “What reference to Thor?” I asked. Hemsworth, she informed me, plays the title character.) I dodged Chris Hemsworth; caught “electricity”; and stretched to superconductors, quantum devices whose charges can flow forever.
Academic seminars conclude with question-and-answer sessions. If only those Q&As zinged with as much freshness and flexibility as Megan’s.
The “Experimenting” approach to stereotype-blasting diverges from mine. High-heeled pumps, I mentioned, decorated the set. The “Experimenting” team was parodying the stereotype of women as shoe-crazed. “Look at this stereotype!” the set shouts. “Isn’t it ridiculous?”
As a woman who detests high heels and shoe shopping, I prefer to starve the stereotype of justification. I’ve preferred reading to shopping since before middle school, when classmates began frequenting malls. I feel more comfortable demonstrating, through silence, how little shoes interest me. I’d rather offer no reason for anyone to associate me with shoes.**
I scarcely believe that I appear just after a “sexy science” tagline and a hot-or-not quiz. Before my interview on her quantum episode, Megan discussed the relationship between atoms and Adams. Three guests helped her, three Hollywood personalities named “Adam.”*** Megan held up cartoons of atoms, and photos of Adams, and asked her guests to rate their hotness. I couldn’t have played Megan’s role, couldn’t imagine myself in her (high-heeled) shoes.
But I respect the “Experimenting” style. Megan’s character serves as a foil for the interviewee I watched. Megan’s ridiculousness underscored the postdoc’s professionalism and expertise.
According to online enthusiasm, “Experimenting” humor resonates with many viewers. So diverse is the community that needs introducing to STEM that diverse senses of humor have roles to play. So deep run STEM’s social challenges that multiple angles need attacking.
Just as diverse perspectives can benefit women-in-STEM efforts, so can diverse perspectives benefit STEM. Which is why STEM needs women, Adams, shoe-lovers, shoe-haters…and experimentation.
With gratitude to the “Experimenting” team for the opportunity to contribute to its cause. The live-action interview appears here (beginning at 2:42), and a follow-up personality quiz appears here.
*If you’re 5′ 3.5″, every halfinch matters.
**Except when I blog about how little I wish to associate with shoes.
***Megan introduced her guests as “Adam Shankman, Adam Pally, and an intern that we made legally change his name to Adam to be on the show.” The “intern” is Adam Rymer, president of Legendary Digital Networks. Legendary owns Amy Poehler’s Smart Girls.
In the morning, tutorials were again from Brewer and Marshall, this time on MCMC sampling. Baron got our wrap-up presentation document ready while I started writing a title, abstract, and outline for a possible first paper on what we are doing. This has been an amazingly productive week for me; Baron and I both learned a huge amount but also accomplished a huge amount. Now, to sustain it.
The hackweek wrap-up session blew my mind. Rather than summarize what I liked most, I will just point you to the hackpad, where there are short descriptions and links out to notebooks and figures. Congratulations to everyone who participated, and to Huppenkothen, who did everything, and absolutely rocked it.
The day started with tutorials on Bayesian reasoning and inference by Brewer and Marshall. Brewer started with a very elementary example, which led to lots of productive and valuable discussion, and tons of learning! Marshall followed with an example in a Jupyter notebook, built so that every person's copy of the notebook was solving a slightly different problem!
Baron and I got the full optimization of our galaxy deprojection project working! We can optimize both for the projections (Euler angles, shifts, etc.) of the galaxies into the images and for the parameters of the mixture of Gaussians that makes up the three-dimensional model of the galaxy. It worked on some toy examples. This is incredibly encouraging, since although our example is a toy, we also haven't tried anything non-trivial for the optimizer. I am very excited; it is time to set the scope of what would be the first paper on this.
In the evening, there was a debate about model comparison, with me on one side, and the tag team of Brewer and Marshall on the other. They argued for the evidence integral, and I argued against, in all cases except model mixing. Faithful readers of this blog know my reasons, but they include, first, an appeal to utility (because decisions require utilities) and then, second, a host of practical issues about computing expected utility (and evidence) integrals that make precise comparisons impossible. And for imprecise comparisons, heuristics work great. Unfortunately it isn't a simple argument; it is a detailed engineering argument about real decision-making.
In yesterday’s crude astrophotography post, I mentioned that the conjunction of Venus and the Moon would be closer this morning, but I took a shot of it yesterday because there’s no telling with the weather here this time of year. When I first went outside this morning, though, the sky was beautiful and clear and I said “Wow, that’ll be a great shot after all.”
Then I went inside to eat breakfast, and when I came back out, clouds had rolled in.
There were occasional breaks, though, and I did manage to get a fortuitous alignment of two of those with the bodies in question, which let me make the following composite from consecutive days:
These are cropped so the vertical scale is the same for both, and Venus is on the 1/3rd line up from the bottom. Today’s shot wasn’t quite framed the same as yesterday’s, though, thus the white space on the right, which would’ve been filled if I could get a full 4×6 crop at that scale.
Here’s a closer crop of today’s shot with the usual 4×6 aspect ratio:
That composite of the two gives a nice sense of how things move in the sky, and I have other stuff I need to work on today, so we’ll call it the photo of the day, and move along.
The baryon decuplet doesn’t come easy to us, but the beauty of symmetry does, and how amazing that physicists have found it tightly woven into the fabric of nature itself: Both the standard model of particle physics and General Relativity, our currently most fundamental theories, are in essence mathematically precise implementations of symmetry requirements. But next to being instrumental for the accurate description of nature, the appeal of symmetries is a human universal that resonates in art and design throughout cultures. For the physicist, it is impossible not to note the link, not to see the equations behind the art. It may be a curse or it may be a blessing.
Lara’s decuplet 
My husband’s decuplet. 
One of the few problems with the new camera is that the Canon software that talks to it only runs on my laptop, not my home desktop. This is an issue partly because the laptop has less disk space (I got it with the biggest SSD available a few years ago, which is small compared to the spinny-disk drives in the desktop), but a bigger issue for image processing is the display, which shifts colors dramatically as you change the viewing angle. This isn't a big issue for daylight photos of the kids, where the camera does pretty well already and the GIMP auto-level tool handles the few color issues that creep in. In low light, though, it's next to impossible to figure out what the image really looks like.
And today’s photo of the day is a very low-light image:
That’s the crescent Moon and Venus just above some of the big trees behind our yard. These two have been creeping closer together over the last few days, something I’ve noticed because my morning walk with Emmy is now starting before sunrise. They’ll probably be even closer tomorrow, and the Moon smaller, but the weather around here at this time of year is not terribly reliable, and I figured I should take this shot while I could. If I get a better version tomorrow, well, I’ll see about cropping it in a way that illustrates the relative motion nicely.
This was taken handheld, with the f/1.8 50mm lens, and the exposure time cranked down pretty short; the automatic routine on the camera wanted it to be long enough that everything got blurred. I could probably get a better image with the tripod, but I was being lazy and the dog was waiting for her walk.
The sky in the original image appears quite a bit darker than it did live (because our eyes are rather nonlinear in their response to light), and tweaking it in GIMP was a huge pain on the laptop. This is still on the too-dark side (looking at it on the desktop monitor), but it looks nice, so I decided to quit while I was ahead.
This post is to announce the start of a new mathematics journal, to be called Discrete Analysis. While in most respects it will be just like any other journal, it will be unusual in one important way: it will be purely an arXiv overlay journal. That is, rather than publishing, or even electronically hosting, papers, it will consist of a list of links to arXiv preprints. Other than that, the journal will be entirely conventional: authors will submit links to arXiv preprints, and then the editors of the journal will find referees, using their quick opinions and more detailed reports in the usual way in order to decide which papers will be accepted.
Part of the motivation for starting the journal is, of course, to challenge existing models of academic publishing and to contribute in a small way to creating an alternative and much cheaper system. However, I hope that in due course people will get used to this publication model, at which point the fact that Discrete Analysis is an arXiv overlay journal will no longer seem interesting or novel, and the main interest in the journal will be the mathematics it contains.
The members of the editorial board so far — but we may well add further people in the near future — are Ernie Croot, me, Ben Green, Gil Kalai, Nets Katz, Bryna Kra, Izabella Laba, Tom Sanders, Jozsef Solymosi, Terence Tao, Julia Wolf, and Tamar Ziegler. For the time being, I will be the managing editor. I interpret this as meaning that I will have the ultimate responsibility for the smooth running of the journal, and will have to do a bit more work than the other editors, but that decisions about journal policy and about accepting or rejecting papers will be made democratically by the whole editorial board. (For example, we had quite a lot of discussion, including a vote, about the title, and the other editors have approved this blog post after suggesting a couple of minor changes.)
I will write the rest of this post as a series of questions and answers.
What is the scope of the journal?

The members of the editorial board all have an interest in additive combinatorics, but they also have other interests that may be only loosely related to additive combinatorics. So the scope of the journal is best thought of as a cluster of related subjects that cannot easily be pinned down with a concise definition, but that can be fairly easily recognised. (Wittgenstein refers to this kind of situation as a family resemblance.) Some of the subjects we will welcome in the journal are harmonic analysis, ergodic theory, topological dynamics, growth in groups, analytic number theory, combinatorial number theory, extremal combinatorics, probabilistic combinatorics, combinatorial geometry, convexity, metric geometry, and the more mathematical side of theoretical computer science. The phrase “discrete analysis” was coined by Ben Green when he wanted a suitable name for a seminar in Cambridge: despite its oxymoronic feel, it is in fact a good description of many parts of mathematics where the structures being studied are discrete, but the tools are analytical in character. (A particularly good example is the use of discrete Fourier analysis to solve combinatorial problems in number theory.)
We do not want the journal to be a fully general mathematical journal, but we do want it to be broad. If you are in doubt about whether the subject matter of your paper is suitable, then feel free to consult an editor. We will try to err on the side of inclusiveness.
Will it cost anything to publish in, or read, the journal?

No. This journal is what some people call a diamond open access journal: there are no charges for readers (obviously, since the papers are on the arXiv), and no charges for authors.
The software for managing the refereeing process will be provided by Scholastica, an outfit that was set up a few years ago by some graduates from the University of Chicago with the aim of making it very easy to create electronic journals. However, the look and feel of Discrete Analysis will be independent: the people at Scholastica are extremely helpful, and one of the services they provide is a web page designed to the specifications you want, with a URL that does not contain the word “scholastica”. Scholastica does charge for this service — a whopping $10 per submission. (This should be compared with typical article processing charges of well over 100 times this from more conventional journals.) Cambridge University has kindly agreed to provide a small grant to the journal, which means that we will be able to cover the cost of the first 500 or so submissions. I am confident that by the time we have had that many submissions, we will be able to find additional funding. The absolute worst that could happen is that in a few years’ time, we will have to ask people to pay an amount roughly equal to the cost of a couple of beers to submit a paper, but it is unlikely that we will ever have to charge anything.
Whatever happens, this journal will demonstrate the following important principle: if you trust authors to do their own typesetting and copy-editing to a satisfactory standard, with the help of suggestions from referees, then the cost of running a mathematics journal can be at least two orders of magnitude lower than the cost incurred by traditional publishers. In theory, this offers a way out of the current stranglehold that the publishers have over us: if enough universities set up enough journals at these very modest costs, then we will have an alternative and much cheaper publication system up and running, and it will look more and more pointless to submit papers to the expensive journals, which will save the universities huge amounts of money. Just to drive the point home, the cost of submitting an article from the UK to the Journal of the London Mathematical Society is, if you want to use their open-access option, £2,310. If Discrete Analysis gets 50 submissions per year (which is more than I would expect to start with), then this single article processing charge would cover our costs for well over five years.
Furthermore, even these modest costs could have been lower. We happened to have funds that allowed us to use Scholastica’s facilities, and decided to do that, but another possibility would have been the Episciences platform, which has been specifically designed for the setting up of overlay journals, and which does not charge anything. It is still in its very early stages, but it already has two mathematics journals (which existed before and migrated to the Episciences platform), and it would be very good to see more. Another possibility that some people might find it worth considering is Open Journal Systems, though that requires a degree of technical skill that I for one do not possess, whereas setting up a journal with Scholastica has been extremely easy, and I think using the Episciences platform would be easy as well.
Could a malevolent person — let us call him or her the Evil Seer — bankrupt the journal by submitting 1000 computer-generated papers? Is it reasonable for us to be charged $10 for instantly rejecting a two-page proof of the Riemann hypothesis that uses nothing more than high-school algebra? I have taken this up with Scholastica, and they have told me that in such cases we just need to tell them and we will not be charged.
Will Discrete Analysis be a proper journal?

Yes. As already mentioned, the articles will be peer-reviewed in the traditional way. There will also be a numbering system for the articles, so that when they are cited, they look like journal articles rather than “mere” arXiv preprints. They will be exclusive to Discrete Analysis. They will have DOIs, and the journal will have an ISSN. Whether the journal will at some point have an impact factor I do not know, but I hope that most people who consider submitting to it will in any case have a healthy contempt for impact factors. We will adhere to the “best practice” as set out in MathSciNet’s Policy on Indexing Electronic Journals, so our articles should be listed there and on Zentralblatt — we are in the process of checking whether this will definitely happen.
No. Another example is SIGMA (Symmetry, Integrability and Geometry: Methods and Applications), though as well as giving arXiv links it hosts its own copies of its articles. And another, which is a mathematically oriented computer science journal, is Logical Methods in Computer Science. I would guess that there are several others that I am unaware of. But one can at least say that Discrete Analysis is an early adopter of the arXiv overlay model.
The current plan is that people are free to submit articles immediately, via a temporary website that has been set up for the purpose. We hope that we will be able to process a few good papers quickly, which will allow us to have an official launch of the journal in early 2016 with some articles already published.
It is difficult to be precise about this, especially before we have received any submissions. However, broadly speaking, we would like to publish genuinely interesting papers in the areas described above. So if you have proved a result that you think is likely to interest the editors, then please consider Discrete Analysis for it. We would like the journal to be consistently interesting, but we do not want to set the standard so high that we do not publish anything.
It would be a pity to exclude the editors from the journal, given that their areas of research are by definition suitable for it. Our policy will be to allow editors to be authors, but to apply slightly more rigorous standards to submissions from editors. In practice, that will mean that in borderline cases a paper will be at a disadvantage if one of its authors is an editor. It goes without saying that editors will be completely excluded from the discussion of any paper that might lead to a conflict of interest. Scholastica’s software makes it very easy to do this.
We have not (yet) discussed the question of whether I as managing editor should be allowed to submit to the journal, but I shall probably follow the policy of many reputable journals and avoid doing so (albeit with some regret) and send any papers that would have been suitable to other journals with publication models that I want to support.
An obvious partial answer to this question is that the list of links on our journal website will be a list of certificates that certain arXiv preprints have been peer reviewed and judged to be of a suitable standard for Discrete Analysis. Thus, it will provide information that the arXiv alone does not provide.
However, we intend to do slightly more than this. For each paper, we will give not just a link, but also a short description. This will be based on the abstract and introduction, and on any further context that one of our editors or referees may be able to give us. The advantage of this is that it will be possible to browse the journal and get a good idea of what it contains, without having to keep clicking back and forth to arXiv preprints. In this way, we hope to make visiting the Discrete Analysis home page a worthwhile experience.
Another thing we will be able to do with these descriptions is post links to newer versions of the articles. If an author wishes to update an article after it has been published, we will provide two links: one to the “official” version (that is, not the first submitted version, but the “final” version that takes into account comments by the referee), and one to the new updated version, with a brief summary of what has changed.
The mathematical community is now sufficiently dependent on the arXiv that it is very unlikely that the arXiv will fold, and if it does then there will be greater problems than the fate of Discrete Analysis. However, in this hypothetical situation, we will download all the articles accepted by Discrete Analysis, as well as those still under review, and find another way of hosting them. Note that articles posted to the arXiv are automatically uploaded to HAL as well, so one possibility would be simply to change the arXiv links to HAL links. As for Scholastica, they perform regular backups of all their data, so even if their main site were to be wiped out, all the information concerning their journals would be recoverable. In short, barring a catastrophic failure of the entire internet, articles published in Discrete Analysis will be secure and permanent.
The editors have widely differing views about these sorts of ideas. For now, we are taking a cautious approach, trying to make the journal as conventional as possible so as to maximize its chances of becoming successful. If at some point in the future we decide to experiment with newer methods of peer review, we shall continue to be cautious, and will always give authors the chance to opt out of them.
First, post it on the arXiv, selecting one of the CC-BY options when it asks you which licence you want to use (this is important for ensuring that the journal complies with the open-access requirements of various funding bodies, but if you have already posted the article under a more restrictive licence, you can always use a CC-BY licence for the version that is revised in the light of comments from referees). Then go to the journal’s temporary website, click on the red “Submit Manuscript” button in the top right-hand corner, and follow the simple instructions.
Not everybody reads blogs, so one way that you can support the journal is to bring it to the attention of anybody you know who might conceivably have a suitable paper for it. The sooner we can build up an initial list of interesting papers, the sooner the journal can become established, and the sooner the cheap arXiv overlay model can start competing with the expensive traditional models of publication.
The difficult-to-spell name “Schenectady” (where Union is located) derives from a Mohawk word meaning “beyond the pines.” The pines in question are an extensive region of pine barrens between Albany and Schenectady, a small bit of which survives as the Albany Pine Bush Nature Preserve. They’ve got a nice little nature center and some trails through the pine bush, and I took The Pip down there this morning, because his day care was closed again for the last of the fall block of Jewish holidays. Here’s a shot to give you an idea of the landscape:
Unfortunately, this was as far as we got into the preserve, as the Little Dude wanted nothing to do with any kind of outdoor activity. He rampaged around the (rather small) visitor center for a while, and after lunch was very happy to run in circles in one of the local shopping malls, but he just did not want to be outside, despite the fact that it was a beautiful day today.
As someone who grew up in a rural area, I find this very frustrating, but, you know, there are worse forms of rebellion. I’m going to have to go back there by myself one of these days, to get a real look around.
Tomorrow, we’re done with Jewish holidays, and back to the regular day care routine. And while I did have a lot of fun hanging out with The Pip, I’d be lying if I said it wasn’t a relief to be handing him off to somebody else to entertain during the day…
Yes, I've been hanging out with my Screen Junkies friends again, and this time I also got to meet JPL's Christina Heinlein, who you may recall was in the first of the Screen Junkies "Movie Science" episodes last year. While we were both in it, I'd not got to meet her that time since our chats with host Hal Rudnick were recorded at quite different times. This time, however, schedules meant [...]
Neutrinos!
See the press release here, and congratulations to the winners!
(Honestly, I thought that the Nobel prize for this had already been given...)
cvj
So, the Nobel Prize in Physics 2015 has been announced. To the surprise of many (including this author), it was awarded jointly to Takaaki Kajita and Arthur B. McDonald “for the discovery of neutrino oscillations, which shows that neutrinos have mass.” A well-deserved Nobel Prize for a fantastic discovery.
What is this Nobel prize all about? Some years ago (circa 1997) there were a couple of “deficit” problems in physics. First, the detected number of (electron) neutrinos coming from the Sun was measured to be less than expected. This could be explained in a number of ways. First, neutrinos could oscillate — that is, neutrinos produced as electron neutrinos in nuclear reactions in the Sun could turn into muon or tau neutrinos and thus escape detection by existing experiments, which were sensitive only to electron neutrinos. This was the most exciting possibility, and it ultimately turned out to be correct! But it was by no means the only one! For example, one could say that the Standard Solar Model (SSM) predicted the fluxes incorrectly — after all, the flux of solar neutrinos is proportional to the core temperature raised to a very high power (~T^{25} for ^{8}B neutrinos, for example). So it is reasonable to suspect that the neutrino flux is not well known because the temperature is not well measured (this might be disputed by solar physicists). Or something more exotic could happen — for instance, neutrinos could have a large magnetic moment and flip their helicity while propagating through the Sun, turning into right-handed neutrinos that are sterile.
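The ~T^{25} scaling quoted above is why the "blame the solar model" explanation was tempting: a tiny temperature uncertainty is amplified enormously in the predicted flux. A minimal sketch of the arithmetic (the power-law form is the assumption stated in the text; the specific percentages below are just illustrative inputs):

```python
# Sensitivity of the 8B solar neutrino flux to the core temperature,
# under the pure power-law scaling flux ~ T^25 quoted above.

def flux_ratio(temp_ratio, power=25):
    """Relative change in predicted neutrino flux for a given relative
    change in core temperature, assuming flux proportional to T**power."""
    return temp_ratio ** power

# A 1% shift in core temperature changes the predicted flux by ~28%.
change_1pct = flux_ratio(1.01)
# A 3% shift roughly doubles it.
change_3pct = flux_ratio(1.03)
```

So even percent-level temperature errors could plausibly have produced a large apparent neutrino deficit, which is why a flavor-blind measurement (below) was needed to settle the question.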
The solution to this is rather ingenious: measure the neutrino flux in two ways — one sensitive to neutrino flavor (using “charged current (CC) interactions”) and one insensitive to neutrino flavor (using “neutral current (NC) interactions”). Heavy water — which contains deuterium — is ideal for this detection. This is exactly what the SNO collaboration, led by A. McDonald, did.
As it turned out, the NC flux was exactly what the SSM predicted, while the CC flux was smaller. Hence the conclusion that electron neutrinos oscillate into other types of neutrinos!
Another “deficit problem” was associated with the ratio of “atmospheric” muon and electron neutrinos. Cosmic rays hit Earth’s atmosphere and create pions that subsequently decay into muons and muon neutrinos. The muons eventually decay as well, mainly into an electron (or positron), a muon (anti)neutrino, and an electron neutrino, as in the chain π⁺ → μ⁺ + ν_μ followed by μ⁺ → e⁺ + ν_e + ν̄_μ (and similarly for π⁻).
As can be seen from this decay chain, one would expect two muon-flavored neutrinos for every electron-flavored one.
This is not what the Super-Kamiokande experiment (T. Kajita) saw — the ratio changed with angle. That is, the ratio of neutrino fluxes from above differed substantially from the ratio from below (the latter describing neutrinos that went through the Earth before reaching the detector). The solution was again neutrino oscillations — this time, muon neutrinos oscillating into tau neutrinos.
The presence of neutrino oscillations implies that neutrinos have (tiny) masses — something that is not predicted by the minimal Standard Model. So one can say that this is the first indication of physics beyond the Standard Model. And this is very exciting.
I think it is interesting to note that this Nobel prize might help the situation with funding of US particle physics research (if anything can help…). It shows that physics has not ended with the discovery of the Higgs boson — and Fermilab might be on the right track to uncover other secrets of the Universe.
In Notes 0, we introduced the notion of a measure space $(\Omega, {\mathcal F}, \mu)$, which includes as a special case the notion of a probability space $(\Omega, {\mathcal F}, {\bf P})$. By selecting one such probability space as a sample space, one obtains a model for random events and random variables, with random events being modeled by measurable sets in ${\mathcal F}$, and random variables taking values in a measurable space $R$ being modeled by measurable functions $X: \Omega \rightarrow R$. We then defined some basic operations on these random events and variables:
These operations obey various axioms; for instance, the boolean operations on events obey the axioms of a Boolean algebra, and the probability function obeys the Kolmogorov axioms. However, we will not focus on the axiomatic approach to probability theory here, instead basing the foundations of probability theory on the sample space models as discussed in Notes 0. (But see this previous post for a treatment of one such axiomatic approach.)
It turns out that almost all of the other operations on random events and variables we need can be constructed in terms of the above basic operations. In particular, this allows one to safely extend the sample space in probability theory whenever needed, provided one uses an extension that respects the above basic operations. We gave a simple example of such an extension in the previous notes, but now we give a more formal definition:
Definition 1 Suppose that we are using a probability space $(\Omega, {\mathcal F}, {\bf P})$ as the model for a collection of events and random variables. An extension of this probability space is a probability space $(\Omega', {\mathcal F}', {\bf P}')$, together with a measurable map $\pi: \Omega' \rightarrow \Omega$ (sometimes called the factor map) which is probability-preserving in the sense that

$\displaystyle {\bf P}'(\pi^{-1}(E)) = {\bf P}(E) \ \ \ \ \ (1)$

for all $E \in {\mathcal F}$. (Caution: this does not imply that ${\bf P}'(E') = {\bf P}(\pi(E'))$ for all $E' \in {\mathcal F}'$ – why not?)
An event $E$ which is modeled by a measurable subset $E$ in the sample space $\Omega$, will be modeled by the measurable set $\pi^{-1}(E)$ in the extended sample space $\Omega'$. Similarly, a random variable $X$ taking values in some range $R$ that is modeled by a measurable function $X: \Omega \rightarrow R$ in $\Omega$, will be modeled instead by the measurable function $X \circ \pi: \Omega' \rightarrow R$ in $\Omega'$. We also allow the extension to model additional events and random variables that were not modeled by the original sample space (indeed, this is one of the main reasons why we perform extensions in probability in the first place).
Thus, for instance, the sample space $\Omega'$ in Example 3 of the previous post is an extension of the sample space $\Omega$ in that example, with the factor map $\pi$ given by the first coordinate projection. One can verify that all of the basic operations on events and random variables listed above are unaffected by the above extension (with one caveat, see the remark below). For instance, the conjunction $E \wedge F$ of two events $E, F$ can be defined via the original model $\Omega$ by the formula

$\displaystyle E \cap F$

or via the extension $\Omega'$ via the formula

$\displaystyle \pi^{-1}(E) \cap \pi^{-1}(F).$

The two definitions are consistent with each other, thanks to the obvious set-theoretic identity

$\displaystyle \pi^{-1}(E) \cap \pi^{-1}(F) = \pi^{-1}(E \cap F).$
Similarly, the assumption (1) is precisely what is needed to ensure that the probability of an event remains unchanged when one replaces a sample space model with an extension. We leave the verification of preservation of the other basic operations described above under extension as exercises to the reader.
Remark 2 There is one minor exception to this general rule if we do not impose the additional requirement that the factor map $\pi$ is surjective. Namely, for non-surjective $\pi$, it can become possible that two events are unequal in the original sample space model, but become equal in the extension (and similarly for random variables), although the converse never happens (events that are equal in the original sample space always remain equal in the extension). For instance, let $\Omega = \{1,2\}$ be the discrete probability space with $p_1 = 1$ and $p_2 = 0$, and let $\Omega' = \{1\}$ be the discrete probability space with $p'_1 = 1$, and non-surjective factor map $\pi: \Omega' \rightarrow \Omega$ defined by $\pi(1) := 1$. Then the event modeled by $\{2\}$ in $\Omega$ is distinct from the empty event when viewed in $\Omega$, but becomes equal to that event when viewed in $\Omega'$, since $\pi^{-1}(\{2\}) = \emptyset$. Thus we see that extending the sample space by a non-surjective factor map can identify previously distinct events together (though of course, being probability preserving, this can only happen if those two events were already almost surely equal anyway). This turns out to be fairly harmless though; while it is nice to know if two given events are equal, or if they differ by a non-null event, it is almost never useful to know that two events are unequal if they are already almost surely equal. Alternatively, one can add the additional requirement of surjectivity in the definition of an extension, which is also a fairly harmless constraint to impose (this is what I chose to do in this previous set of notes).
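Definition 1 can be made concrete on a small discrete space. Below is a minimal numeric sketch (the specific spaces are my own hypothetical example, in the spirit of Example 3: a die roll extended by an independent coin flip) checking that the factor map is probability-preserving in the sense of (1):

```python
# A discrete sketch of Definition 1: extending a probability space
# by a factor map, and checking P'(pi^{-1}(E)) = P(E).

omega = {1, 2, 3, 4, 5, 6}
p = {w: 1 / 6 for w in omega}                 # original space: a fair die

omega_ext = {(w, c) for w in omega for c in ("H", "T")}
p_ext = {wc: 1 / 12 for wc in omega_ext}      # extension: die x fair coin
pi = lambda wc: wc[0]                         # factor map: first coordinate

def preimage(event):
    """pi^{-1}(event): the model of the event in the extended space."""
    return {wc for wc in omega_ext if pi(wc) in event}

def prob(space, event):
    return sum(space[w] for w in event)

# Probability-preserving: P'(pi^{-1}(E)) = P(E) for every event E.
even = {2, 4, 6}
assert abs(prob(p_ext, preimage(even)) - prob(p, even)) < 1e-12
```

Note that the extension models strictly more events (anything involving the coin) than the original space did, which is exactly the point of performing extensions.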
Roughly speaking, one can define probability theory as the study of those properties of random events and random variables that are model-independent in the sense that they are preserved by extensions. For instance, the cardinality of the model of an event is not a concept within the scope of probability theory, as it is not preserved by extensions: continuing Example 3 from Notes 0, the event that a die roll is even is modeled by a set of a certain cardinality in the original sample space model, but by a set of a strictly larger cardinality in the extension. Thus it does not make sense in the context of probability theory to refer to the “cardinality of an event $E$”.
On the other hand, the supremum $\sup_n X_n$ of a collection of random variables $X_n$ in the extended real line $[-\infty,+\infty]$ is a valid probabilistic concept. This can be seen by manually verifying that this operation is preserved under extension of the sample space, but one can also see this by defining the supremum in terms of existing basic operations. Indeed, note from Exercise 24 of Notes 0 that a random variable $X$ in the extended real line is completely specified by the threshold events $\{X \leq t\}$ for $t \in {\bf R}$; in particular, two such random variables $X, Y$ are equal if and only if the events $\{X \leq t\}$ and $\{Y \leq t\}$ are surely equal for all $t$. From the identity

$\displaystyle \{\sup_n X_n \leq t\} = \bigwedge_n \{X_n \leq t\}$

we thus see that one can completely specify $\sup_n X_n$ in terms of the $X_n$ using only the basic operations provided in the above list (and in particular using the countable conjunction $\bigwedge_n$). Of course, the same considerations hold if one replaces supremum, by infimum, limit superior, limit inferior, or (if it exists) the limit.
In this set of notes, we will define some further important operations on scalar random variables, in particular the expectation of these variables. In the sample space models, expectation corresponds to the notion of integration on a measure space. As we will need to use both expectation and integration in this course, we will thus begin by quickly reviewing the basics of integration on a measure space, although we will then translate the key results of this theory into probabilistic language.
As the finer details of the Lebesgue integral construction are not the core focus of this probability course, some of the details of this construction will be left to exercises. See also Chapter 1 of Durrett, or these previous blog notes, for a more detailed treatment.
— 1. Integration on measure spaces —
Let $(\Omega, {\mathcal F}, \mu)$ be a measure space, and let $f$ be a measurable function on $\Omega$, taking values either in the reals ${\bf R}$, the non-negative extended reals $[0,+\infty]$, the extended reals $[-\infty,+\infty]$, or the complex numbers ${\bf C}$. We would like to define the integral

$\displaystyle \int_\Omega f\ d\mu \ \ \ \ \ (2)$

of $f$ on $\Omega$. (One could make the integration variable explicit, e.g. by writing $\int_\Omega f(\omega)\ d\mu(\omega)$, but we will usually not do so here.) When integrating a reasonably nice function (e.g. a continuous function) on a reasonably nice domain (e.g. a box in ${\bf R}^n$), the Riemann integral that one learns about in undergraduate calculus classes suffices for this task; however, for the purposes of probability theory, we need the much more general notion of a Lebesgue integral in order to properly define (2) for the spaces and functions we will need to study.
Not every measurable function can be integrated by the Lebesgue integral. There are two key classes of functions for which the integral exists and is well behaved:
One could in principle extend the Lebesgue integral to slightly more general classes of functions, e.g. to sums of absolutely integrable functions and unsigned functions. However, the above two classes already suffice for most applications (and as a general rule of thumb, it is dangerous to apply the Lebesgue integral to functions that are not unsigned or absolutely integrable, unless you really know what you are doing).
We will construct the Lebesgue integral in the following four stages. First, we will define the Lebesgue integral just for unsigned simple functions – unsigned measurable functions that take on only finitely many values. Then, by a limiting procedure, we extend the Lebesgue integral to unsigned functions. After that, by decomposing a real absolutely integrable function into unsigned components, we extend the integral to real absolutely integrable functions. Finally, by taking real and imaginary parts, we extend to complex absolutely integrable functions. (This is not the only order in which one could perform this construction; for instance, in Durrett, one first constructs integration of bounded functions on finite measure support before passing to arbitrary unsigned functions.)
First consider an unsigned simple function $f: \Omega \rightarrow [0,+\infty]$; thus $f$ is measurable and takes on only a finite number of values. Then we can express $f$ as a finite linear combination (in $[0,+\infty]$) of indicator functions. Indeed, if we enumerate the values that $f$ takes as $c_1,\dots,c_n$ (avoiding repetitions) and set $E_i := \{ \omega \in \Omega: f(\omega) = c_i \}$ for $i=1,\dots,n$, then it is clear that

$\displaystyle f = \sum_{i=1}^n c_i 1_{E_i}.$
(It should be noted at this point that the operations of addition and multiplication on $[0,+\infty]$ are defined by setting $x + \infty = \infty + x = \infty$ for all $x$, and $x \cdot \infty = \infty \cdot x = \infty$ for all positive $x$, but that $0 \cdot \infty = \infty \cdot 0$ is defined to equal $0$. To put it another way, multiplication is defined to be continuous from below, rather than from above: $xy = \lim_{x' \rightarrow x^-, y' \rightarrow y^-} x'y'$. One can verify that the commutative, associative, and distributive laws continue to hold on $[0,+\infty]$, but we caution that the cancellation laws do not hold when $\infty$ is involved.)
Conversely, given any coefficients $c_1,\dots,c_n \in [0,+\infty]$ (not necessarily distinct) and measurable sets $E_1,\dots,E_n$ in ${\mathcal F}$ (not necessarily disjoint), the sum $\sum_{i=1}^n c_i 1_{E_i}$ is an unsigned simple function.
A single simple function can be decomposed in multiple ways as a linear combination of indicator functions. For instance, on the real line ${\bf R}$, the sum of the indicator functions of two overlapping intervals can also be written as a linear combination of indicator functions of disjoint intervals, with different coefficients. However, there is an invariant common to all these decompositions:
Exercise 3 Suppose that an unsigned simple function $f: \Omega \rightarrow [0,+\infty]$ has two representations as the linear combination of indicator functions:

$\displaystyle f = \sum_{i=1}^n c_i 1_{E_i} = \sum_{j=1}^m d_j 1_{F_j}$

where $n, m$ are nonnegative integers, $c_1,\dots,c_n,d_1,\dots,d_m$ lie in $[0,+\infty]$, and $E_1,\dots,E_n,F_1,\dots,F_m$ are measurable sets. Show that

$\displaystyle \sum_{i=1}^n c_i \mu(E_i) = \sum_{j=1}^m d_j \mu(F_j).$

(Hint: first handle the special case where the $E_i$ are all disjoint and nonempty, and each of the $F_j$ is expressible as the union of some subcollection of the $E_i$. Then handle the general case by considering the atoms of the finite boolean algebra generated by the $E_i$ and $F_j$.)
We capture this invariant by introducing the simple integral $\hbox{Simp} \int_\Omega f\ d\mu$ of an unsigned simple function by the formula

$\displaystyle \hbox{Simp} \int_\Omega f\ d\mu := \sum_{i=1}^n c_i \mu(E_i)$

whenever $f$ admits a decomposition $f = \sum_{i=1}^n c_i 1_{E_i}$. The above exercise is then precisely the assertion that the simple integral is well-defined as an element of $[0,+\infty]$.
Exercise 4 Let $f, g: \Omega \rightarrow [0,+\infty]$ be unsigned simple functions, and let $c \in [0,+\infty]$.

 (i) (Linearity) Show that

$\displaystyle \hbox{Simp} \int_\Omega (f+g)\ d\mu = \hbox{Simp} \int_\Omega f\ d\mu + \hbox{Simp} \int_\Omega g\ d\mu$

and

$\displaystyle \hbox{Simp} \int_\Omega cf\ d\mu = c\ \hbox{Simp} \int_\Omega f\ d\mu.$

 (ii) Show that if $f$ and $g$ are equal almost everywhere, then $\hbox{Simp} \int_\Omega f\ d\mu = \hbox{Simp} \int_\Omega g\ d\mu$.

 (iii) Show that $\hbox{Simp} \int_\Omega f\ d\mu \geq 0$, with equality if and only if $f$ is zero almost everywhere.

 (iv) (Monotonicity) If $f \leq g$ almost everywhere, show that $\hbox{Simp} \int_\Omega f\ d\mu \leq \hbox{Simp} \int_\Omega g\ d\mu$.

 (v) (Markov inequality) Show that $\mu( \{ \omega: f(\omega) \geq \lambda \} ) \leq \frac{1}{\lambda} \hbox{Simp} \int_\Omega f\ d\mu$ for any $0 < \lambda < \infty$.
Now we extend from unsigned simple functions to more general unsigned functions. If $f: \Omega \rightarrow [0,+\infty]$ is an unsigned measurable function, we define the unsigned integral $\int_\Omega f\ d\mu$ as

$\displaystyle \int_\Omega f\ d\mu := \sup_{g \leq f; g \hbox{ simple}} \hbox{Simp} \int_\Omega g\ d\mu \ \ \ \ \ (3)$

where the supremum is over all unsigned simple functions $g$ such that $g(\omega) \leq f(\omega)$ for all $\omega \in \Omega$.
Many of the properties of the simple integral carry over to the unsigned integral easily:
Exercise 5 Let $f, g: \Omega \rightarrow [0,+\infty]$ be unsigned measurable functions, and let $c \in [0,+\infty]$.

 (i) (Superadditivity) Show that

$\displaystyle \int_\Omega (f+g)\ d\mu \geq \int_\Omega f\ d\mu + \int_\Omega g\ d\mu$

and

$\displaystyle \int_\Omega cf\ d\mu = c \int_\Omega f\ d\mu.$

 (ii) Show that if $f$ and $g$ are equal almost everywhere, then $\int_\Omega f\ d\mu = \int_\Omega g\ d\mu$.

 (iii) Show that $\int_\Omega f\ d\mu \geq 0$, with equality if and only if $f$ is zero almost everywhere.

 (iv) (Monotonicity) If $f \leq g$ almost everywhere, show that $\int_\Omega f\ d\mu \leq \int_\Omega g\ d\mu$.

 (v) (Markov inequality) Show that $\mu( \{ \omega: f(\omega) \geq \lambda \} ) \leq \frac{1}{\lambda} \int_\Omega f\ d\mu$ for any $0 < \lambda < \infty$. In particular, if $\int_\Omega f\ d\mu < \infty$, then $f$ is finite almost everywhere.

 (vi) (Compatibility with simple integral) If $f$ is simple, show that $\int_\Omega f\ d\mu = \hbox{Simp} \int_\Omega f\ d\mu$.

 (vii) (Compatibility with measure) For any measurable set $E$, show that $\int_\Omega 1_E\ d\mu = \mu(E)$.
Exercise 6 If $\Omega$ is a discrete (at most countable) probability space with probabilities $(p_\omega)_{\omega \in \Omega}$ (and the associated probability measure $\mu$), and $f: \Omega \rightarrow [0,+\infty]$ is a function, show that

$\displaystyle \int_\Omega f\ d\mu = \sum_{\omega \in \Omega} p_\omega f(\omega).$

(Note that the condition $\sum_{\omega \in \Omega} p_\omega = 1$ in the definition of a discrete probability space is not required to prove this identity.)
The observant reader will notice that the linearity property of simple functions has been weakened to superadditivity. This can be traced back to a breakdown of symmetry in the definition (3); the unsigned integral of $f$ is defined via approximation from below, but not from above. Indeed the opposite claim

$\displaystyle \int_\Omega f\ d\mu = \inf_{g \geq f; g \hbox{ simple}} \hbox{Simp} \int_\Omega g\ d\mu \ \ \ \ \ (4)$

can fail. For a counterexample, take $\Omega$ to be a discrete probability space in which every point has positive probability, and let $f$ be an unbounded function on $\Omega$ with finite integral. By Exercise 6 we have $\int_\Omega f\ d\mu < \infty$. On the other hand, any simple function $g$ with $g \geq f$ must equal $+\infty$ on a set of positive measure (why?) and so the right-hand side of (4) can be infinite. However, one can get around this difficulty under some further assumptions on $f$, and thus recover full linearity for the unsigned integral:
Exercise 7 (Linearity of the unsigned integral) Let $(\Omega, {\mathcal F}, \mu)$ be a measure space.

 (i) Let $f: \Omega \rightarrow [0,+\infty]$ be an unsigned measurable function which is both bounded (i.e., there is a finite $M$ such that $f(\omega) \leq M$ for all $\omega \in \Omega$) and has finite measure support (i.e., there is a measurable set $A$ with $\mu(A) < \infty$ such that $f(\omega) = 0$ for all $\omega \not\in A$). Show that (4) holds for this function $f$.

 (ii) Establish the additivity property

$\displaystyle \int_\Omega (f+g)\ d\mu = \int_\Omega f\ d\mu + \int_\Omega g\ d\mu$

whenever $f, g$ are unsigned measurable functions that are bounded with finite measure support.

 (iii) Show that

$\displaystyle \int_\Omega \min(f,n)\ d\mu \rightarrow \int_\Omega f\ d\mu$

as $n \rightarrow \infty$ whenever $f$ is unsigned measurable.

 (iv) Using (iii), extend (ii) to the case where $f, g$ are unsigned measurable functions with finite measure support, but are not necessarily bounded.

 (v) Show that

$\displaystyle \int_\Omega f 1_{f \geq 1/n}\ d\mu \rightarrow \int_\Omega f\ d\mu$

as $n \rightarrow \infty$ whenever $f$ is unsigned measurable.

 (vi) Using (iii) and (v), show that (ii) holds for any unsigned measurable $f, g$ (which are not necessarily bounded or of finite measure support).
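The failure of approximation from above described before Exercise 7 can be made concrete. The specific space and function were elided in this copy, so the sketch below uses a choice of my own: probabilities p(n) = 2^{-n} on n = 1, 2, ... and the unbounded function f(n) = 2^{n/2}, whose integral is the convergent geometric series sum 2^{-n/2}.

```python
# Approximation from below succeeds while approximation from above fails:
# on the discrete space with P({n}) = 2^{-n}, the unbounded function
# f(n) = 2^{n/2} has a finite integral, yet any simple g >= f must take
# the value +infinity on a set of positive measure.
import math

N = 200  # truncation level for the numerical sum (tail is negligible)

def p(n):
    return 2.0 ** (-n)

def f(n):
    return 2.0 ** (n / 2)

# Integral from below: sum p(n) * f(n) = sum 2^{-n/2}, a convergent series.
integral = sum(p(n) * f(n) for n in range(1, N + 1))
exact = 1.0 / (math.sqrt(2) - 1)   # closed form of sum_{n>=1} 2^{-n/2}

# Any simple g >= f takes only finitely many values, so it is bounded
# unless it equals +inf somewhere; since f is unbounded and every point
# has positive probability, the simple integral of any such g is infinite.
```

So the left-hand side of (4) is finite (about 2.414) while the right-hand side is infinite, exactly the asymmetry the text describes.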
Next, we apply the integral to absolutely integrable functions. We call a scalar function $f: \Omega \rightarrow {\bf R}$ or $f: \Omega \rightarrow {\bf C}$ absolutely integrable if it is measurable and the unsigned integral $\int_\Omega |f|\ d\mu$ is finite. A real-valued absolutely integrable function $f$ can be expressed as the difference $f = f_+ - f_-$ of two unsigned absolutely integrable functions $f_+, f_-$; indeed, one can check that the choice $f_+ := \max(f,0)$ and $f_- := \max(-f,0)$ work for this. Conversely, any difference $g - h$ of unsigned absolutely integrable functions $g, h$ is absolutely integrable (this follows from the triangle inequality $|g-h| \leq g + h$). A single absolutely integrable function may be written as a difference of unsigned absolutely integrable functions in more than one way, for instance we might have

$\displaystyle f = f_+ - f_- = g_+ - g_-$

for unsigned absolutely integrable functions $f_+, f_-, g_+, g_-$. But when this happens, we can rearrange to obtain

$\displaystyle f_+ + g_- = g_+ + f_-$

and thus by linearity of the unsigned integral

$\displaystyle \int_\Omega f_+\ d\mu + \int_\Omega g_-\ d\mu = \int_\Omega g_+\ d\mu + \int_\Omega f_-\ d\mu.$

By the absolute integrability of $f$, all the integrals are finite, so we may rearrange this identity as

$\displaystyle \int_\Omega f_+\ d\mu - \int_\Omega f_-\ d\mu = \int_\Omega g_+\ d\mu - \int_\Omega g_-\ d\mu.$

This allows us to define the Lebesgue integral of a real-valued absolutely integrable function $f$ to be the expression

$\displaystyle \int_\Omega f\ d\mu := \int_\Omega f_+\ d\mu - \int_\Omega f_-\ d\mu$

for any given decomposition $f = f_+ - f_-$ of $f$ as the difference of two unsigned absolutely integrable functions. Note that if $f$ is both unsigned and absolutely integrable, then the unsigned integral and the Lebesgue integral of $f$ agree (as can be seen by using the decomposition $f = f - 0$), and so there is no ambiguity in using the same notation $\int_\Omega f\ d\mu$ to denote both integrals. (By the same token, we may now drop the $\hbox{Simp}$ modifier from the simple integral of a simple unsigned $f$, which we may now also denote by $\int_\Omega f\ d\mu$.)
The Lebesgue integral also enjoys good linearity properties:
Exercise 8 Let $f, g: \Omega \rightarrow {\bf R}$ be real-valued absolutely integrable functions, and let $c \in {\bf R}$.

 (i) (Linearity) Show that $f + g$ and $cf$ are also real-valued absolutely integrable functions, with

$\displaystyle \int_\Omega (f+g)\ d\mu = \int_\Omega f\ d\mu + \int_\Omega g\ d\mu$

and

$\displaystyle \int_\Omega cf\ d\mu = c \int_\Omega f\ d\mu.$

(For the second relation, one may wish to first treat the special cases $c \geq 0$ and $c = -1$.)

 (ii) Show that if $f$ and $g$ are equal almost everywhere, then $\int_\Omega f\ d\mu = \int_\Omega g\ d\mu$.

 (iii) Show that $\int_\Omega |f|\ d\mu \geq 0$, with equality if and only if $f$ is zero almost everywhere.

 (iv) (Monotonicity) If $f \leq g$ almost everywhere, show that $\int_\Omega f\ d\mu \leq \int_\Omega g\ d\mu$.

 (v) (Markov inequality) Show that $\mu( \{ \omega: |f(\omega)| \geq \lambda \} ) \leq \frac{1}{\lambda} \int_\Omega |f|\ d\mu$ for any $0 < \lambda < \infty$.
Because of part (iii) of the above exercise, we can extend the Lebesgue integral to real-valued absolutely integrable functions that are only defined and real-valued almost everywhere, rather than everywhere. In particular, we can apply the Lebesgue integral to functions that are sometimes infinite, so long as they are only infinite on a set of measure zero, and the function is absolutely integrable everywhere else.
Finally, we extend to complex-valued functions. If $f: \Omega \rightarrow {\bf C}$ is absolutely integrable, observe that the real and imaginary parts $\hbox{Re}(f), \hbox{Im}(f)$ are also absolutely integrable (because $|\hbox{Re}(f)|, |\hbox{Im}(f)| \leq |f|$). We then define the (complex) Lebesgue integral of $f$ in terms of the real Lebesgue integral by the formula

$\displaystyle \int_\Omega f\ d\mu := \int_\Omega \hbox{Re}(f)\ d\mu + i \int_\Omega \hbox{Im}(f)\ d\mu.$
Clearly, if $f$ is real-valued and absolutely integrable, then the real Lebesgue integral and the complex Lebesgue integral of $f$ coincide, so it does not create ambiguity to use the same symbol for both concepts. It is routine to extend the linearity properties of the real Lebesgue integral to its complex counterpart:
Exercise 9 Let $f, g: \Omega \rightarrow {\bf C}$ be complex-valued absolutely integrable functions, and let $c \in {\bf C}$.

 (i) (Linearity) Show that $f + g$ and $cf$ are also complex-valued absolutely integrable functions, with

$\displaystyle \int_\Omega (f+g)\ d\mu = \int_\Omega f\ d\mu + \int_\Omega g\ d\mu$

and

$\displaystyle \int_\Omega cf\ d\mu = c \int_\Omega f\ d\mu.$

(For the second relation, one may wish to first treat the special cases $c \in {\bf R}$ and $c = i$.)

 (ii) Show that if $f$ and $g$ are equal almost everywhere, then $\int_\Omega f\ d\mu = \int_\Omega g\ d\mu$.

 (iii) Show that $\int_\Omega |f|\ d\mu \geq 0$, with equality if and only if $f$ is zero almost everywhere.

 (iv) (Markov inequality) Show that $\mu( \{ \omega: |f(\omega)| \geq \lambda \} ) \leq \frac{1}{\lambda} \int_\Omega |f|\ d\mu$ for any $0 < \lambda < \infty$.
We record a simple, but incredibly fundamental, inequality concerning the Lebesgue integral:
Lemma 10 (Triangle inequality) If $f: \Omega \rightarrow {\bf C}$ is a complex-valued absolutely integrable function, then

$\displaystyle \left|\int_\Omega f\ d\mu\right| \leq \int_\Omega |f|\ d\mu.$
Proof: We have

$\displaystyle \left|\hbox{Re} \int_\Omega f\ d\mu\right| = \left|\int_\Omega \hbox{Re}(f)\ d\mu\right| \leq \int_\Omega |f|\ d\mu.$

This looks weaker than what we want to prove, but we can “amplify” this inequality to the full strength triangle inequality as follows. Replacing $f$ by $e^{i\theta} f$ for any real $\theta$, we have

$\displaystyle \left|\hbox{Re}\ e^{i\theta} \int_\Omega f\ d\mu\right| \leq \int_\Omega |f|\ d\mu.$

Since we can choose the phase $\theta$ to make the expression $\hbox{Re}\ e^{i\theta} \int_\Omega f\ d\mu$ equal to $|\int_\Omega f\ d\mu|$, the claim follows.
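The amplification trick can be watched numerically: rotating by the right phase makes the integral real and non-negative, at which point the bound on the real part becomes the full triangle inequality. A minimal sketch on a toy discrete space of my own choosing:

```python
# Numerical illustration of the amplification step in Lemma 10:
# rotate by e^{i*theta} so that the integral becomes real and >= 0,
# then the real-part bound yields |integral of f| <= integral of |f|.
import cmath

mu = {1: 0.5, 2: 0.3, 3: 0.2}           # toy probability space
f = {1: 1 + 2j, 2: -3j, 3: -1.0 + 0j}   # a complex-valued function

integral_f = sum(mu[w] * f[w] for w in mu)
integral_abs = sum(mu[w] * abs(f[w]) for w in mu)

# Choose theta so that e^{i*theta} * integral_f is real and non-negative;
# its real part then equals |integral_f|.
theta = -cmath.phase(integral_f)
rotated = cmath.exp(1j * theta) * integral_f
assert abs(rotated.imag) < 1e-12
assert rotated.real <= integral_abs + 1e-12   # the triangle inequality
```

The key point mirrored here is that the optimal phase depends on the (fixed) value of the integral, not on the integration variable, so the rotation commutes with integration.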
Finally, we observe that the Lebesgue integral extends the Riemann integral, which is particularly useful when it comes to actually computing some of these integrals:
Exercise 11 If $f: [a,b] \rightarrow {\bf R}$ is a Riemann integrable function on a compact interval $[a,b]$, show that $f$ is also absolutely integrable, and that the Lebesgue integral $\int_{[a,b]} f\ d\mu$ (with $\mu$ being Lebesgue measure restricted to $[a,b]$) coincides with the Riemann integral $\int_a^b f(x)\ dx$. Similarly if $f$ is Riemann integrable on a box $B \subset {\bf R}^n$.
— 2. Expectation of random variables —
We now translate the above notions of integration on measure spaces to the probabilistic setting.
A random variable $X$ taking values in the unsigned extended real line $[0,+\infty]$ is said to be simple if it takes on at most finitely many values. Equivalently, $X$ can be expressed as a finite unsigned linear combination

$\displaystyle X = \sum_{i=1}^n c_i 1_{E_i}$

of indicator random variables, where $c_1,\dots,c_n \in [0,+\infty]$ are unsigned and $E_1,\dots,E_n$ are events. We then define the simple expectation $\hbox{Simp}\ {\bf E} X$ of $X$ to be the quantity

$\displaystyle \hbox{Simp}\ {\bf E} X := \sum_{i=1}^n c_i {\bf P}(E_i)$

and one checks that this definition is independent of the choice of decomposition of $X$ into indicator random variables. Observe that if we model the random variable $X$ using a probability space $(\Omega, {\mathcal F}, {\bf P})$, then the simple expectation of $X$ is precisely the simple integral of the corresponding unsigned simple function on $\Omega$.
Next, given an arbitrary unsigned random variable $X$ taking values in $[0,+\infty]$, one defines its (unsigned) expectation ${\bf E} X$ as

$\displaystyle {\bf E} X := \sup_Y \hbox{Simp}\ {\bf E} Y$

where $Y$ ranges over all simple unsigned random variables such that $Y \leq X$ is surely true. This extends the simple expectation (thus ${\bf E} X = \hbox{Simp}\ {\bf E} X$ for all simple unsigned $X$), and in terms of a probability space model $(\Omega, {\mathcal F}, {\bf P})$, the expectation of $X$ is precisely the unsigned integral of the function modeling $X$.
A scalar random variable $X$ is said to be absolutely integrable if ${\bf E} |X| < \infty$; thus for instance any bounded random variable is absolutely integrable. If $X$ is real-valued and absolutely integrable, we define its expectation by the formula

$\displaystyle {\bf E} X := {\bf E} X_+ - {\bf E} X_-$

where $X = X_+ - X_-$ is any representation of $X$ as the difference of unsigned absolutely integrable random variables $X_+, X_-$; one can check that this definition does not depend on the choice of representation and is thus well-defined. For complex-valued absolutely integrable $X$, we then define

$\displaystyle {\bf E} X := {\bf E}\ \hbox{Re}(X) + i\ {\bf E}\ \hbox{Im}(X).$
In all of these cases, the expectation of $X$ is equal to the integral of the function modeling $X$ in any probability space model; in the case that $X$ is given by a discrete probability model, one can check that this definition of expectation agrees with the one given in Notes 0. Using the former fact, we can translate the properties of integration already established to the probabilistic setting:
Proposition 12
 (i) (Unsigned linearity) If are unsigned random variables, and is a deterministic unsigned quantity, then and . (Note that these identities hold even when are not absolutely integrable.)
 (ii) (Complex linearity) If are absolutely integrable random variables, and is a deterministic complex quantity, then and are also absolutely integrable, with and .
 (iii) (Compatibility with probability) If is an event, then . In particular, .
 (iv) (Almost sure equivalence) If are unsigned (resp. absolutely integrable) and almost surely, then .
 (v) If is unsigned or absolutely integrable, then , with equality if and only if almost surely.
 (vi) (Monotonicity) If are unsigned or real-valued absolutely integrable, and almost surely, then .
 (vii) (Markov inequality) If is unsigned or absolutely integrable, then for any deterministic .
 (viii) (Triangle inequality) If is absolutely integrable, then .
As before, we can use part (iv) to define expectation of scalar random variables that are only defined and finite almost surely, rather than surely.
Note that we have built the notion of expectation (and of related notions, such as absolute integrability) out of notions that were already probabilistic in nature, in the sense that they were unaffected if one replaced the underlying probabilistic model with an extension. Therefore, the notion of expectation is automatically probabilistic in the same sense. Because of this, we will be easily able to manipulate expectations of random variables without having to explicitly mention an underlying probability space , and so one will now see such spaces fade from view starting from this point in the course.
— 3. Exchanging limits with integrals or expectations —
When performing analysis on measure spaces, it is important to know if one can interchange a limit with an integral:
Similarly, in probability theory, we often wish to interchange a limit with an expectation:
Of course, one needs the integrands or random variables to be either unsigned or absolutely integrable, and the limits to be well-defined, to have any hope of doing this. Naively, one could hope that limits and integrals could always be exchanged when the expressions involved are well-defined, but this is unfortunately not the case. In the case of integration on, say, the real line using Lebesgue measure , we already see four key examples:
In all these examples, the limit of the integral exceeds the integral of the limit; by replacing with in the first three examples (which involve absolutely integrable functions) one can also build examples where the limit of the integral is less than the integral of the limit. Most of these examples rely on the infinite measure of the real line and thus do not directly have probabilistic analogues, but the concentrating bump example involves functions that are all supported on the unit interval and thus also poses a problem in the probabilistic setting.
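As a numerical sketch of one of these failure modes (the specific function `f_n = n·1_{(0,1/n]}` on `[0,1]` is our illustrative choice, not taken from the text): each `f_n` has integral exactly 1, yet `f_n(x) → 0` for every fixed `x > 0`, so the limit of the integrals strictly exceeds the integral of the limit.

```python
# Concentrating bump f_n = n * 1_{(0, 1/n]} on [0, 1] (illustrative example).
# Each f_n has integral 1, but f_n(x) -> 0 for every fixed x > 0, so the
# integral of the pointwise limit is 0 < 1 = limit of the integrals.

def f(n, x):
    return n if 0 < x <= 1 / n else 0

def integral(n, steps=100000):
    # midpoint rule on [0, 1]
    h = 1 / steps
    return sum(f(n, (k + 0.5) * h) * h for k in range(steps))

for n in (1, 10, 100):
    print(n, round(integral(n), 6))   # each integral is (approximately) 1

# pointwise limit at any fixed x > 0 is 0: f(n, x) vanishes once n > 1/x
x = 0.3
print([f(n, x) for n in (1, 10, 100)])
```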
Nevertheless, there are three important cases in which we can relate the limit (or, in the case of Fatou’s lemma, the limit inferior) of the integral to the integral of the limit (or limit inferior). Informally, they are:
These three results then have analogues for convergence of random variables. We will also mention a fourth useful tool in that setting, which allows one to exchange limits and expectations when one controls a higher moment. There are a few more such general results allowing limits to be exchanged with integrals or expectations, but my advice would be to work out such exchanges by hand rather than blindly cite (possibly incorrectly) an additional convergence theorem beyond the four mentioned above, as this is safer and will help strengthen one’s intuition on the situation.
We now state and prove these results more explicitly.
Lemma 13 (Fatou’s lemma) Let be a measure space, and let be a sequence of unsigned measurable functions. Then
An equivalent form of this lemma is that if one has
for some and all sufficiently large , then one has
as well. That is to say, if the original unsigned functions eventually have “mass” less than or equal to , then the limit (inferior) also has “mass” less than or equal to . The limit may have substantially less mass, as the four examples above show, but it can never have more mass (asymptotically) than the functions that comprise the limit. Of course, one can replace the limit inferior by a limit on the left- or right-hand side if one knows that the relevant limit actually exists (but one cannot replace limit inferior by limit superior if one does not already have convergence; see Example 15 below). On the other hand, it is essential that the are unsigned for Fatou’s lemma to work, as can be seen by negating one of the first three key examples mentioned above.
Proof: By definition of the unsigned integral, it suffices to show that
whenever is an unsigned simple function with . Multiplying by , it thus suffices to show that
for any and any unsigned as above.
We can write as the sum for some strictly positive and disjoint ; we allow the and the measures to be infinite. On each , we have . Thus, if we define
then the increase to as : . By continuity from below (Exercise 23 of Notes 0), we thus have
as . Since
we conclude upon integration that
and thus on taking limit inferior
But the right-hand side is , and the claim follows.
Of course, Fatou’s lemma may be phrased probabilistically:
Lemma 14 (Fatou’s lemma for random variables) Let be a sequence of unsigned random variables. Then
As a corollary, if are unsigned and converge almost surely to a random variable , then
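A Monte Carlo illustration of how Fatou's inequality can be strict (the random variables `X_n = n·1_{U ≤ 1/n}`, with `U` uniform on `(0,1)`, are our own example, not from the text): each `E[X_n] = 1`, but `X_n → 0` almost surely, so the expectation of the limit, 0, is strictly below the limit inferior of the expectations, 1.

```python
import random

# Our example: X_n = n * 1_{U <= 1/n} with U uniform on (0, 1).
# E[X_n] = 1 for every n, yet X_n -> 0 almost surely, so Fatou's
# inequality E[liminf X_n] <= liminf E[X_n] reads 0 <= 1 and is strict.

random.seed(0)
samples = [random.random() for _ in range(100000)]

def mean_Xn(n):
    # empirical E[X_n] over the simulated sample points
    return sum(n if u <= 1 / n else 0 for u in samples) / len(samples)

for n in (1, 10, 100):
    print(n, round(mean_Xn(n), 3))   # all close to 1

# for any fixed sample point u > 0, X_n(u) = 0 once n > 1/u
u = samples[0]
print(all((n if u <= 1 / n else 0) == 0 for n in range(int(1 / u) + 1, int(1 / u) + 10)))
```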
Next, we establish the monotone convergence theorem.
Theorem 16 (Monotone convergence theorem) Let be a measure space, and let be a sequence of unsigned measurable functions which is monotone increasing, thus for all and . Then
Note that the limits exist on both sides because monotone sequences always have limits; indeed, the limit on either side is equal to the corresponding supremum. The receding infinity example shows that it is important that the functions here are monotone increasing rather than monotone decreasing. We also observe that it is enough for the to be increasing almost everywhere rather than everywhere, since one can then modify the on a set of measure zero to make them increasing everywhere, which does not affect the integrals on either side of this theorem.
Proof: From Fatou’s lemma we already have
On the other hand, from monotonicity we see that
for any natural number , and on taking limits as we obtain the claim.
An important corollary of the monotone convergence theorem is that one can freely interchange infinite sums with integrals for unsigned functions, that is to say
for any unsigned (not necessarily monotone). Indeed, to see this one simply applies the monotone convergence theorem to the partial sums .
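A numerical check of this interchange of sum and integral (the particular unsigned functions `f_n(x) = x^n` on `[0, 1/2]` are our illustrative choice): summing the individual integrals agrees with integrating the summed function `1/(1-x)`, and both equal `log 2`.

```python
import math

# Our example: unsigned f_n(x) = x**n on [0, 1/2].  By the monotone
# convergence corollary, sum_n (integral of f_n) = integral of (sum_n f_n)
# = integral of 1/(1-x) over [0, 1/2] = log 2.

# left side: sum of integrals; each integral is (1/2)^(n+1) / (n+1) exactly
lhs = sum(0.5 ** (n + 1) / (n + 1) for n in range(200))

# right side: integral of the summed function 1/(1-x), via midpoint rule
steps = 100000
h = 0.5 / steps
rhs = sum(h / (1 - (k + 0.5) * h) for k in range(steps))

print(lhs, rhs, math.log(2))   # all approximately 0.6931...
```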
We of course can translate this into the probabilistic context:
Theorem 17 (Monotone convergence theorem for random variables) Let be a monotone nondecreasing sequence of unsigned random variables. Then
Similarly, for any unsigned random variables , we have
Again, it is sufficient for the to be nondecreasing almost surely. We note a basic but important corollary of this theorem, namely the (first) Borel–Cantelli lemma:
Lemma 18 (Borel–Cantelli lemma) Let be a sequence of events with . Then almost surely, at most finitely many of the events hold; that is to say, one has almost surely.
Proof: From the monotone convergence theorem, we have
By Markov’s inequality, this implies that is almost surely finite, as required.
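A quick simulation of the Borel–Cantelli lemma (the specific choice of independent events with `P(A_n) = 1/n²`, a summable series, is our own illustration): across many simulated sample points, only a handful of the events ever occur.

```python
import random

# Our example: independent events A_n with P(A_n) = 1/n^2, so the
# probabilities are summable.  Borel-Cantelli predicts that almost surely
# only finitely many A_n hold; empirically the occurrence counts stay small.

random.seed(1)

def count_occurrences(N=10000):
    # number of the events A_1, ..., A_N holding at one sample point
    return sum(1 for n in range(1, N + 1) if random.random() < 1 / n ** 2)

counts = [count_occurrences() for _ in range(200)]
print(min(counts), max(counts))   # small; the n = 1 event always occurs
```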
We will develop a partial converse to this lemma (the “second” Borel–Cantelli lemma) in a subsequent set of notes. For now, we give a crude converse in which we assume not only that the sum to infinity, but that they are in fact uniformly bounded from below:
Exercise 19 Let be a sequence of events with . Show that with positive probability, an infinite number of the hold; that is to say, . (Hint: if for all , establish the lower bound for all . Alternatively, one can apply Fatou’s lemma to the random variables .)
Finally, we give the dominated convergence theorem.
Theorem 20 (Dominated convergence theorem) Let be a measure space, and let be measurable functions which converge pointwise to some limit. Suppose that there is an unsigned absolutely integrable function which dominates the in the sense that for all and all . Then
In particular, the limit on the right-hand side exists.
Again, it will suffice for to dominate each almost everywhere rather than everywhere, as one can upgrade this to everywhere domination by modifying each on a set of measure zero. Similarly, pointwise convergence can be replaced with pointwise convergence almost everywhere. The domination of each by a single function implies that the integrals are uniformly bounded in , but this latter condition is not sufficient by itself to guarantee interchangeability of the limit and integral, as can be seen by the first three examples given at the start of this section.
Proof: By splitting into real and imaginary parts, we may assume without loss of generality that the are real-valued. As is absolutely integrable, it is finite almost everywhere; after modification on a set of measure zero we may assume it is finite everywhere. Let denote the pointwise limit of the . From Fatou’s lemma applied to the unsigned functions and , we have
and
Rearranging this (taking crucial advantage of the finite nature of the , and hence and ), we conclude that
and the claim follows.
Remark 21 Amusingly, one can use the dominated convergence theorem to give an (extremely indirect) proof of the divergence of the harmonic series . For, if that series were convergent, then the function would be absolutely integrable, and the spreading bump example described above would contradict the dominated convergence theorem. (Expert challenge: see if you can deconstruct the above argument enough to lower bound the rate of divergence of the harmonic series .)
We again translate the above theorem to the probabilistic context:
Theorem 22 (Dominated convergence theorem for random variables) Let be scalar random variables which converge almost surely to a limit . Suppose there is an unsigned absolutely integrable random variable such that almost surely for each . Then
As a corollary of the dominated convergence theorem for random variables we have the bounded convergence theorem: if are scalar random variables that converge almost surely to a limit , and are almost surely bounded in magnitude by a uniform constant , then we have
(In Durrett, the bounded convergence theorem is proven first, and then used to establish Fatou’s lemma and the dominated and monotone convergence theorems. The order in which one establishes these results – which are all closely related to each other – is largely a matter of personal taste.) A further corollary of the dominated convergence theorem is that one has the identity
whenever are scalar random variables with absolutely integrable (or equivalently, that is finite).
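A numerical sketch of the bounded convergence theorem (the variables `X_n = U^{1/n}`, with `U` uniform on `(0,1)`, are our own example): `X_n → 1` almost surely and `|X_n| ≤ 1` uniformly, so `E[X_n] → 1`; here the expectations can also be computed exactly as `n/(n+1)`.

```python
import random

# Our example: U uniform on (0, 1), X_n = U**(1/n).  Then X_n -> 1 almost
# surely and |X_n| <= 1 for all n, so the bounded convergence theorem gives
# E[X_n] -> 1; exactly, E[U**(1/n)] = n/(n+1).

random.seed(2)
samples = [random.random() for _ in range(100000)]

def empirical_mean(n):
    return sum(u ** (1 / n) for u in samples) / len(samples)

for n in (1, 10, 100):
    print(n, round(empirical_mean(n), 3), round(n / (n + 1), 3))
```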
Another useful variant of the dominated convergence theorem is
Theorem 23 (Convergence for random variables with bounded moment) Let be scalar random variables which converge almost surely to a limit . Suppose there is and such that for all . Then
This theorem fails for , as the concentrating bump example shows. The case (that is to say, bounded second moment ) is already quite useful. The intuition here is that concentrating bumps are in some sense the only obstruction to interchanging limits and expectations, and these can be eliminated by hypotheses such as a bounded higher moment hypothesis or a domination hypothesis.
Proof: By taking real and imaginary parts we may assume that the (and hence ) are real-valued. For any natural number , let denote the truncation of to the interval , and similarly define . Then converges pointwise to , and hence by the bounded convergence theorem
On the other hand, we have
(why?) and thus on taking expectations and using the triangle inequality
where we are using the asymptotic notation to denote a quantity bounded in magnitude by for an absolute constant . Also, from Fatou’s lemma we have
so we similarly have
Putting all this together, we see that
Sending , we obtain the claim.
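To make the role of the second moment concrete, here is a small arithmetic comparison (both sequences, driven by `U` uniform on `(0,1)`, are our own examples): `Y_n = √n·1_{U ≤ 1/n}` has uniformly bounded second moment and its expectations converge to 0 as Theorem 23 (with p = 2) predicts, while the concentrating bump `X_n = n·1_{U ≤ 1/n}` has unbounded second moment and its expectations fail to converge to that of the limit.

```python
# Our examples, exact moments computed in closed form:
#   Y_n = sqrt(n) * 1_{U <= 1/n}:  E[Y_n] = 1/sqrt(n) -> 0,  E[Y_n^2] = 1
#   X_n = n       * 1_{U <= 1/n}:  E[X_n] = 1 (no convergence to 0),
#                                  E[X_n^2] = n (unbounded)

for n in (1, 4, 100):
    EY, EY2 = n ** 0.5 * (1 / n), n * (1 / n)     # 1/sqrt(n) and 1
    EX, EX2 = n * (1 / n), n ** 2 * (1 / n)       # 1 and n
    print(n, EY, EY2, EX, EX2)
```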
Remark 24 The essential point about the condition was that the function grew faster than linearly as . One could accomplish the same result with any other function with this property, e.g. a hypothesis such as would also suffice. The most natural general condition to impose here is that of uniform integrability, which encompasses the hypotheses already mentioned, but we will not focus on this condition here.
Exercise 25 (Scheffé’s lemma) Let be a sequence of absolutely integrable scalar random variables that converge almost surely to another absolutely integrable scalar random variable . Suppose also that converges to as . Show that converges to zero as . (Hint: there are several ways to prove this result. One is to split into two components , such that is dominated by but converges almost surely to , and is such that . Then apply the dominated convergence theorem.)
— 4. The distribution of a random variable —
We have seen that the expectation of a random variable is a special case of the more general notion of Lebesgue integration on a measure space. There is however another way to think of expectation as a special case of integration, which is particularly convenient for computing expectations. We first need the following definition.
Definition 26 Let be a random variable taking values in a measurable space . The distribution of (also known as the law of ) is the probability measure on defined by the formula
for all measurable sets ; one easily sees from the Kolmogorov axioms that this is indeed a probability measure.
Example 27 If only takes on at most countably many values (and if every point in is measurable), then the distribution is the discrete measure that assigns each point in the range of a measure of .
Example 28 If is a real random variable with cumulative distribution function , then is the Lebesgue–Stieltjes measure associated to . For instance, if is drawn uniformly at random from , then is Lebesgue measure restricted to . In particular, two real random variables are equal in distribution if and only if they have the same cumulative distribution function.
Example 29 If and are the results of two separate rolls of a fair die (as in Example 3 of Notes 0), then and are equal in distribution, but are not equal as random variables.
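A simulation of Example 29 (the code is our own sketch): two independent rolls have empirically identical distributions (each face appearing about 1/6 of the time), yet they agree with each other only about 1/6 of the time, so they are far from equal as random variables.

```python
import random
from collections import Counter

# Sketch of Example 29: two independent fair-die rolls X, Y are equal in
# distribution (each value 1..6 with probability 1/6) but not equal as
# random variables: P(X = Y) = 1/6, not 1.

random.seed(3)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(60000)]

dist_X = Counter(x for x, _ in rolls)
dist_Y = Counter(y for _, y in rolls)
print({v: round(dist_X[v] / 60000, 2) for v in range(1, 7)})   # all near 1/6
print({v: round(dist_Y[v] / 60000, 2) for v in range(1, 7)})   # all near 1/6

equal_frac = sum(1 for x, y in rolls if x == y) / 60000
print(round(equal_frac, 3))   # near 1/6, far from 1
```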
Remark 30 In the converse direction, given a probability measure on a measurable space , one can always build a probability space model and a random variable represented by that model whose distribution is . Indeed, one can perform the “tautological” construction of defining the probability space model to be , and to be the identity function , and then one easily checks that . Compare with Corollaries 26 and 29 of Notes 0. Furthermore, one can view this tautological model as a “base” model for random variables of distribution as follows. Suppose one has a random variable of distribution which is modeled by some other probability space , thus is a measurable function such that
for all . Then one can view the probability space as an extension of the tautological probability space using as the factor map.
We say that two random variables are equal in distribution, and write , if they have the same law: , that is to say for any measurable set in the range. This definition makes sense even when are defined on different sample spaces. Roughly speaking, the distribution captures the “size” and “shape” of the random variable, but not its “location” or how it relates to other random variables.
Theorem 31 (Change of variables formula) Let be a random variable taking values in a measurable space . Let or be a measurable scalar function (giving or the Borel σ-algebra of course) such that either , or that . Then
Thus for instance, if is a real random variable, then
and more generally
for all ; furthermore, if is unsigned or absolutely integrable, one has
The point here is that the integration is not over some unspecified sample space , but over a very explicit domain, namely the reals; we have “changed variables” to integrate over instead of over , with the distribution representing the “Jacobian” factor that typically shows up in such change of variables formulae.
Proof: First suppose that is unsigned and only takes on a finite number of values. Then
and hence
as required.
Next, suppose that is unsigned but can take on infinitely many values. We can express as the monotone increasing limit of functions that only take a finite number of values; for instance we can define to be rounded down to the largest multiple of less than both and . By the preceding computation, we have
and on taking limits as using the monotone convergence theorem we obtain the claim in this case.
Now suppose that is real-valued with . We write where and ; then we have and
for . Subtracting the second identity from the first, we obtain the claim.
Finally, the case of complex-valued with follows from the real-valued case by taking real and imaginary parts.
Example 32 Let be the uniform distribution on , then
for any Riemann integrable ; thus for instance
for any .
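A numerical illustration of the change of variables formula in this uniform case (the choice `f(x) = x²` is our own): computing the expectation as an integral against the law (Lebesgue measure on `[0,1]`) agrees with averaging `f(X(ω))` over simulated sample points, with no sample space mentioned on the law side.

```python
import random

# Our sketch: X uniform on [0, 1] and f(x) = x**2.  The change of variables
# formula lets us compute E[f(X)] as the integral of f against the law of X
# (Lebesgue measure on [0,1]), which equals 1/3 exactly.

# law side: integral of x^2 over [0, 1], via midpoint rule
steps = 100000
h = 1 / steps
law_side = sum(((k + 0.5) * h) ** 2 * h for k in range(steps))

# sample-space side: average f(X(omega)) over simulated outcomes
random.seed(4)
sample_side = sum(random.random() ** 2 for _ in range(100000)) / 100000

print(round(law_side, 6), round(sample_side, 3))   # both approximately 1/3
```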
Remark 33 An alternate way to prove the change of variables formula is to observe that the formula is obviously true when one uses the tautological model for , and then the claim follows from the modelindependence of expectation and the observation from Remark 30 that any other model for is an extension of the tautological model.
— 5. Some basic inequalities —
We record here for future reference some basic inequalities concerning expectation that we will need in the sequel. We have already seen the triangle inequality
for absolutely integrable , and the Markov inequality
for arbitrary scalar and (note the inequality is trivial if is not absolutely integrable). Applying the Markov inequality to the quantity we obtain the important Chebyshev inequality
for absolutely integrable and , where the variance of is defined as
Next, we record
Lemma 34 (Jensen’s inequality) If is a convex function, is a real random variable with and both absolutely integrable, then
Proof: Let be a real number. Being convex, the graph of must be supported by some line at , that is to say there exists a slope (depending on ) such that for all . (If is differentiable at , one can take to be the derivative of at , but one always has a supporting line even in the nondifferentiable case.) In particular
Taking expectations and using linearity of expectation, we conclude
and the claim follows from setting .
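A quick empirical check of Jensen's inequality (our own sketch, using the convex functions `x ↦ x²` and `x ↦ eˣ` with `X` uniform on `[0,1]`): in both cases the expectation of the convex image dominates the convex image of the expectation.

```python
import math
import random

# Our sketch: check E[phi(X)] >= phi(E[X]) for convex phi, X uniform on [0,1].
# Exactly: E[X^2] = 1/3 >= (1/2)^2 and E[e^X] = e - 1 >= e^(1/2).

random.seed(5)
xs = [random.random() for _ in range(100000)]
mean = sum(xs) / len(xs)

results = []
for phi in (lambda x: x ** 2, math.exp):
    lhs = sum(phi(x) for x in xs) / len(xs)   # E[phi(X)]
    rhs = phi(mean)                           # phi(E[X])
    results.append(lhs >= rhs)
print(results)
```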
Exercise 35 (Complex Jensen inequality) Let be a convex function (thus for all complex and all ), and let be a complex random variable with and both absolutely integrable. Show that
Note that the triangle inequality is the special case of Jensen’s inequality (or the complex Jensen’s inequality, if is complex-valued) corresponding to the convex function on (or on ). Another useful example is
As a related application of convexity, observe from the convexity of the function that
for any and . This implies in particular Young’s inequality
for any scalar and any exponents with ; note that this inequality is also trivially true if one or both of are infinite. Taking expectations, we conclude that
if are scalar random variables and are deterministic exponents with . In particular, if are absolutely integrable, then so is , and
We can amplify this inequality as follows. Multiplying by some and dividing by the same , we conclude that
optimising the right-hand side in , we obtain (after some algebra, and after disposing of some edge cases when or is almost surely zero) the important Hölder inequality
where we use the notation
for . Using the convention
(thus is the essential supremum of ), we also see from the triangle inequality that the Hölder inequality applies in the boundary case when one of is allowed to be (so that the other is equal to ):
The case is the important Cauchy–Schwarz inequality
valid whenever are square-integrable in the sense that are finite.
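An empirical check of the Cauchy–Schwarz inequality on simulated data (our own sketch; note that the inequality holds for empirical moments as well, since they form an inner product on the sample):

```python
import random

# Our sketch: verify |E[XY]| <= E[X^2]^(1/2) * E[Y^2]^(1/2) for simulated
# Gaussian data (X standard normal, Y an independent shifted normal).

random.seed(6)
pairs = [(random.gauss(0, 1), random.gauss(0, 1) + 0.5) for _ in range(100000)]
n = len(pairs)

EXY = sum(x * y for x, y in pairs) / n
EX2 = sum(x * x for x, _ in pairs) / n
EY2 = sum(y * y for _, y in pairs) / n
print(abs(EXY) <= EX2 ** 0.5 * EY2 ** 0.5)   # Cauchy-Schwarz holds
```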
Exercise 36 Show that the expressions are nondecreasing in for . In particular, if is finite for some , then it is automatically finite for all smaller values of .
Exercise 37 For any square-integrable , show that
Exercise 38 If and are scalar random variables with , use Hölder’s inequality to establish that
and
and then conclude the Minkowski inequality
Show that this inequality is also valid at the endpoint cases and .
Exercise 39 If is nonnegative and square-integrable, and , establish the Paley–Zygmund inequality
(Hint: use the Cauchy–Schwarz inequality to upper bound in terms of and .)
Starting this week, I will be teaching an introductory graduate course (Math 275A) on probability theory here at UCLA. While I find myself using probabilistic methods routinely nowadays in my research (for instance, the probabilistic concept of Shannon entropy played a crucial role in my recent paper on the Chowla and Elliott conjectures, and random multiplicative functions similarly played a central role in the paper on the Erdős discrepancy problem), this will actually be the first time I will be teaching a course on probability itself (although I did give a course on random matrix theory some years ago that presumed familiarity with graduate-level probability theory). As such, I will be relying primarily on an existing textbook, in this case Durrett’s Probability: Theory and Examples. I still need to prepare lecture notes, though, and so I thought I would continue my practice of putting my notes online, although in this particular case they will be less detailed or complete than with other courses, as they will mostly be focusing on those topics that are not already comprehensively covered in the text of Durrett. Below the fold are my first such set of notes, concerning the classical measure-theoretic foundations of probability. (I wrote on these foundations also in this previous blog post, but in that post I already assumed that the reader was familiar with measure theory and basic probability, whereas in this course not every student will have a strong background in these areas.)
Note: as this set of notes is primarily concerned with foundational issues, it will contain a large number of pedantic (and nearly trivial) formalities and philosophical points. We dwell on these technicalities in this set of notes primarily so that they are out of the way in later notes, when we work with the actual mathematics of probability, rather than on the supporting foundations of that mathematics. In particular, the excessively formal and philosophical language in this set of notes will not be replicated in later notes.
— 1. Some philosophical generalities —
By default, mathematical reasoning is understood to take place in a deterministic mathematical universe. In such a universe, any given mathematical statement (that is to say, a sentence with no free variables) is either true or false, with no intermediate truth value available. Similarly, any deterministic variable can take on only one specific value at a time.
However, for a variety of reasons, both within pure mathematics and in the applications of mathematics to other disciplines, it is often desirable to have a rigorous mathematical framework in which one can discuss nondeterministic statements and variables – that is to say, statements which are not always true or always false, but in some intermediate state, or variables that do not take one particular value or another with definite certainty, but are again in some intermediate state. In probability theory, which is by far the most widely adopted mathematical framework to formally capture the concept of nondeterminism, nondeterministic statements are referred to as events, and nondeterministic variables are referred to as random variables. In the standard foundations of probability theory, as laid out by Kolmogorov, we can then model these events and random variables by introducing a sample space (which will be given the structure of a probability space) to capture all the ambient sources of randomness; events are then modeled as measurable subsets of this sample space, and random variables are modeled as measurable functions on this sample space. (We will briefly discuss a more abstract way to set up probability theory, as well as other frameworks to capture nondeterminism than classical probability theory, at the end of this set of notes; however, the rest of the course will be concerned exclusively with classical probability theory using the orthodox Kolmogorov models.)
Note carefully that sample spaces (and their attendant structures) will be used to model probabilistic concepts, rather than to actually be the concepts themselves. This distinction (a mathematical analogue of the map-territory distinction in philosophy) is actually implicit in much of modern mathematics, when we make a distinction between an abstract version of a mathematical object, and a concrete representation (or model) of that object. For instance:
The distinction between abstract objects and concrete models can be fairly safely discarded if one is only going to use a single model for each abstract object, particularly if that model is “canonical” in some sense. However, one needs to keep the distinction in mind if one plans to switch between different models of a single object (e.g. to perform change of basis in linear algebra, change of coordinates in differential geometry, or base change in algebraic geometry). As it turns out, in probability theory it is often desirable to change the sample space model (for instance, one could extend the sample space by adding in new sources of randomness, or one could couple together two systems of random variables by joining their sample space models together). Because of this, we will take some care in this foundational set of notes to distinguish probabilistic concepts (such as events and random variables) from their sample space models. (But we may be more willing to conflate the two in later notes, once the foundational issues are out of the way.)
From a foundational point of view, it is often logical to begin with some axiomatic description of the abstract version of a mathematical object, and discuss the concrete representations of that object later; for instance, one could start with the axioms of an abstract group, and then later consider concrete representations of such a group by permutations, invertible linear transformations, and so forth. This approach is often employed in the more algebraic areas of mathematics. However, there are at least two other ways to present these concepts which can be preferable from a pedagogical point of view. One way is to start with the concrete representations as motivating examples, and only later give the abstract object that these representations are modeling; this is how linear algebra, for instance, is often taught at the undergraduate level, by starting first with , , and , and only later introducing the abstract vector spaces. Another way is to avoid the abstract objects altogether, and focus exclusively on concrete representations, but taking care to emphasise how these representations transform when one switches from one representation to another. For instance, in general relativity courses in undergraduate physics, it is not uncommon to see tensors presented purely through the concrete representation of coordinates indexed by multiple indices, with the transformation of such tensors under changes of variable carefully described; the abstract constructions of tensors and tensor spaces using operations such as tensor product and duality of vector spaces or vector bundles are often left to an advanced differential geometry class to set up properly.
The foundations of probability theory are usually presented (almost by default) using the last of the above three approaches; namely, one talks almost exclusively about sample space models for probabilistic concepts such as events and random variables, and only occasionally dwells on the need to extend or otherwise modify the sample space when one needs to introduce new sources of randomness (or to forget about some existing sources of randomness). However, much as in differential geometry one tends to work with manifolds without specifying any given atlas of coordinate charts, in probability one usually manipulates events and random variables without explicitly specifying any given sample space. For a student raised exclusively on concrete sample space foundations of probability, this can be a bit confusing, for instance it can give the misconception that any given random variable is somehow associated to its own unique sample space, with different random variables possibly living on different sample spaces, which often leads to nonsense when one then tries to combine those random variables together. Because of such confusions, we will try to take particular care in these notes to separate probabilistic concepts from their sample space models.
— 2. A simple class of models: discrete probability spaces —
The simplest models of probability theory are those generated by discrete probability spaces, which are adequate models for many applications (particularly in combinatorics and other areas of discrete mathematics), and which already capture much of the essence of probability theory while avoiding some of the finer measure-theoretic subtleties. We thus begin by considering discrete sample space models.
Definition 1 (Discrete probability theory) A discrete probability space is an at most countable set (whose elements will be referred to as outcomes), together with a nonnegative real number assigned to each outcome such that ; we refer to as the probability of the outcome . The set itself, without the structure , is often referred to as the sample space, though we will often abuse notation by using the sample space to refer to the entire discrete probability space .
In discrete probability theory, we choose an ambient discrete probability space as the randomness model. We then model events by subsets of the sample space . The probability of an event is defined to be the quantity
note that this is a real number in the interval . An event is surely true or is the sure event if , and is surely false or is the empty event if .
We model random variables taking values in the range by functions from the sample space to the range . Random variables taking values in will be called real random variables or random real numbers. Similarly for random variables taking values in . We refer to real and complex random variables collectively as scalar random variables.
We consider two events to be equal if they are modeled by the same set: . Similarly, two random variables taking values in a common range are considered to be equal if they are modeled by the same function: . In particular, if the discrete sample space is understood from context, we will usually abuse notation by identifying an event with its model , and similarly identify a random variable with its model .
Remark 2 One can view classical (deterministic) mathematics as the special case of discrete probability theory in which is a singleton set (there is only one outcome ), and the probability assigned to the single outcome in is : . Then there are only two events (the surely true and surely false events), and a random variable in can be identified with a deterministic element of . Thus we can view probability theory as a generalisation of deterministic mathematics.
As discussed in the preceding section, the distinction between a collection of events and random variables and their models becomes important if one ever wishes to modify the sample space, and in particular to extend the sample space to a larger space that can accommodate new sources of randomness (an operation which we will define formally later, but which for now can be thought of as an analogue to change of basis in linear algebra, coordinate change in differential geometry, or base change in algebraic geometry). This is best illustrated with a simple example.
Example 3 (Extending the sample space) Suppose one wishes to model the outcome of rolling a single, unbiased six-sided die using discrete probability theory. One can do this by choosing the discrete probability space to be the six-element set , with each outcome given an equal probability of of occurring; this outcome may be interpreted as the state in which the die roll ended up being equal to . The outcome of rolling a die may then be identified with the identity function , defined by for . If we let be the event that the outcome of rolling the die is an even number, then with this model we have , and
Now suppose that we wish to roll the die again to obtain a second random variable $Y$. The sample space $\Omega = \{1,2,3,4,5,6\}$ is inadequate for modeling both the original die roll $X$ and the second die roll $Y$. To accommodate this new source of randomness, we can then move to the larger discrete probability space $\Omega' := \{1,\dots,6\} \times \{1,\dots,6\}$, with each outcome $(i,j) \in \Omega'$ now having probability $1/36$; this outcome $(i,j)$ can be interpreted as the state in which the die roll $X$ ended up being $i$, and the die roll $Y$ ended up being $j$. The random variable $X$ is now modeled by a new function $X_{\Omega'}: \Omega' \to \{1,\dots,6\}$ defined by $X_{\Omega'}(i,j) := i$ for $(i,j) \in \Omega'$; the random variable $Y$ is similarly modeled by the function $Y_{\Omega'}: \Omega' \to \{1,\dots,6\}$ defined by $Y_{\Omega'}(i,j) := j$ for $(i,j) \in \Omega'$. The event $E$ that $X$ is even is now modeled by the set

$E_{\Omega'} = \{(i,j) \in \Omega' : i \in \{2,4,6\}\}.$

This set is distinct from the previous model $E_\Omega$ of $E$ (for instance, $E_{\Omega'}$ has eighteen elements, whereas $E_\Omega$ has just three), but the probability of $E$ is unchanged:

$\mathbf{P}(E) = 18 \times \frac{1}{36} = \frac{1}{2}.$
One can of course also combine together the random variables $X, Y$ in various ways. For instance, the sum $X + Y$ of the two die rolls is a random variable taking values in $\{2,\dots,12\}$; it cannot be modeled by the sample space $\Omega$, but in $\Omega'$ it is modeled by the function

$(i,j) \mapsto i + j.$

Similarly, the event $X = Y$ that the two die rolls are equal cannot be modeled by $\Omega$, but is modeled in $\Omega'$ by the set

$\{(i,i) : i \in \{1,\dots,6\}\},$

and the probability of this event is

$6 \times \frac{1}{36} = \frac{1}{6}.$
We thus see that extending the probability space has also enlarged the space of events one can consider, as well as the random variables one can define, but that existing events and random variables continue to be interpretable in the extended model, and that probabilistic concepts such as the probability of an event remain unchanged by the extension of the model.
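The model-independence of probabilities in the dice example above can be checked mechanically. The following sketch (illustrative only; the variable names are ad hoc) enumerates both the small and the extended sample space and confirms that the probability of "the first roll is even" is the same in both models, while the event "the two rolls are equal" only exists in the extended model:

```python
from fractions import Fraction
from itertools import product

# Small model: one die roll, sample space {1,...,6}, uniform probabilities.
omega1 = range(1, 7)
E1 = {i for i in omega1 if i % 2 == 0}            # model of E: three outcomes
p1 = sum(Fraction(1, 6) for _ in E1)

# Extended model: two die rolls, sample space {1,...,6}^2, uniform.
omega2 = list(product(range(1, 7), repeat=2))
E2 = {(i, j) for (i, j) in omega2 if i % 2 == 0}  # new model of E: 18 outcomes
p2 = sum(Fraction(1, 36) for _ in E2)

assert len(E1) == 3 and len(E2) == 18
assert p1 == p2 == Fraction(1, 2)   # probability of E is model-independent

# The event "the two rolls are equal" only exists in the extended model.
D = {(i, j) for (i, j) in omega2 if i == j}
assert sum(Fraction(1, 36) for _ in D) == Fraction(1, 6)
```

Using exact rational arithmetic (`Fraction`) rather than floats keeps the check honest: the equalities hold exactly, not approximately.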
The set-theoretic operations on the sample space $\Omega$ induce similar Boolean operations on events:
- the conjunction $E \wedge F$ of two events $E, F$ is modeled by the intersection $E_\Omega \cap F_\Omega$;
- the disjunction $E \vee F$ is modeled by the union $E_\Omega \cup F_\Omega$;
- the negation $\overline{E}$ is modeled by the complement $\Omega \backslash E_\Omega$.
Thus, for instance, the conjunction of the event that a die roll is even, and that it is less than $3$, is the event that the die roll is exactly $2$. As before, we will usually be in a situation in which the sample space $\Omega$ is clear from context, and in that case one can safely identify events with their models, and view the symbols $\wedge$ and $\vee$ as being synonymous with their set-theoretic counterparts $\cap$ and $\cup$ (this is for instance what is done in Durrett).
With these operations, the space of all events (known as the event space) thus has the structure of a Boolean algebra (defined below in Definition 4). We observe that the probability $\mathbf{P}$ is finitely additive in the sense that

$\mathbf{P}(E \vee F) = \mathbf{P}(E) + \mathbf{P}(F)$

whenever $E, F$ are disjoint events; by induction this implies that

$\mathbf{P}(E_1 \vee \dots \vee E_n) = \mathbf{P}(E_1) + \dots + \mathbf{P}(E_n)$

whenever $E_1, \dots, E_n$ are pairwise disjoint events. We have $\mathbf{P}(\emptyset) = 0$ and $\mathbf{P}(\Omega) = 1$, and more generally

$\mathbf{P}(\overline{E}) = 1 - \mathbf{P}(E)$

for any event $E$. We also have monotonicity: if $E \subset F$, then $\mathbf{P}(E) \le \mathbf{P}(F)$.
Now we define operations on random variables. Whenever one has a function $f: R \to S$ from one range $R$ to another $S$, and a random variable $X$ taking values in $R$, one can define a random variable $f(X)$ taking values in $S$ by composing the relevant models:

$f(X)_\Omega := f \circ X_\Omega,$

thus $f(X)_\Omega$ maps $\omega$ to $f(X_\Omega(\omega))$ for any outcome $\omega \in \Omega$. Given a finite number of random variables $X_1, \dots, X_n$ taking values in ranges $R_1, \dots, R_n$, we can form the joint random variable $(X_1,\dots,X_n)$ taking values in the Cartesian product $R_1 \times \dots \times R_n$ by concatenation of the models, thus

$(X_1,\dots,X_n)_\Omega(\omega) := (X_{1,\Omega}(\omega), \dots, X_{n,\Omega}(\omega)).$

Combining these two operations, given any function $F: R_1 \times \dots \times R_n \to S$ of $n$ variables in ranges $R_1,\dots,R_n$, and random variables $X_1,\dots,X_n$ taking values in $R_1,\dots,R_n$ respectively, we can form a random variable $F(X_1,\dots,X_n)$ taking values in $S$ by the formula

$F(X_1,\dots,X_n) := F((X_1,\dots,X_n)).$
Thus for instance we can add, subtract, or multiply two scalar random variables to obtain another scalar random variable.
A deterministic element $x$ of a range $R$ will (by abuse of notation) be identified with the random variable taking values in $R$ whose model in $\Omega$ is constant: $x_\Omega(\omega) = x$ for all $\omega \in \Omega$. Thus for instance any fixed scalar is a scalar random variable.
Given a relation $R(x_1,\dots,x_n)$ on ranges $R_1,\dots,R_n$, and random variables $X_1,\dots,X_n$ taking values in $R_1,\dots,R_n$, we can define the event $R(X_1,\dots,X_n)$ by setting

$R(X_1,\dots,X_n)_\Omega := \{\omega \in \Omega : R(X_{1,\Omega}(\omega),\dots,X_{n,\Omega}(\omega)) \text{ holds}\}.$

Thus for instance, for two real random variables $X, Y$, the event $X > Y$ is modeled as

$(X > Y)_\Omega = \{\omega \in \Omega : X_\Omega(\omega) > Y_\Omega(\omega)\}$

and the event $X = Y$ is modeled as

$(X = Y)_\Omega = \{\omega \in \Omega : X_\Omega(\omega) = Y_\Omega(\omega)\}.$

At this point we encounter a slight notational conflict between the dual role of the equality symbol as a logical symbol and as a binary relation: we are interpreting $X = Y$ both as an external equality relation between the two random variables (which is true iff the functions $X_\Omega$, $Y_\Omega$ are identical), and as an internal event (modeled by $\{\omega \in \Omega : X_\Omega(\omega) = Y_\Omega(\omega)\}$). However, it is clear that $X = Y$ is true in the external sense if and only if the internal event $X = Y$ is surely true. As such, we shall abuse notation and continue to use the equality symbol for both the internal and external concepts of equality (and use the modifier "surely" for emphasis when referring to the external usage).
It is clear that any equational identity concerning functions or operations on deterministic variables implies the same identity (in the external, or surely true, sense) for random variables. For instance, the commutativity of addition $x + y = y + x$ for deterministic real numbers immediately implies the commutativity of addition for random variables: $X + Y = Y + X$ is surely true for real random variables $X, Y$; similarly $(X + Y) + Z = X + (Y + Z)$ is surely true for all scalar random variables $X, Y, Z$, etc. We will freely apply the usual laws of algebra for scalar random variables without further comment.
Given an event $E$, we can associate the indicator random variable $1_E$ (also written as $\mathbf{1}_E$ in some texts), namely the unique real random variable such that $1_E = 1$ when $E$ is true and $1_E = 0$ when $E$ is false; thus $(1_E)_\Omega(\omega)$ is equal to $1$ when $\omega \in E_\Omega$ and $0$ otherwise. (The indicator random variable is sometimes called the characteristic function in analysis, and sometimes denoted $\chi_E$ instead of $1_E$, but we avoid using the term "characteristic function" here, as it will have an unrelated but important meaning in probability theory.) We record the trivial but useful fact that Boolean operations on events correspond to arithmetic manipulations on their indicators. For instance, if $E, F$ are events, we have

$1_{E \wedge F} = 1_E 1_F, \qquad 1_{\overline{E}} = 1 - 1_E$

and the inclusion-exclusion principle

$1_{E \vee F} = 1_E + 1_F - 1_{E \wedge F}. \qquad (1)$

In particular, if the events $E, F$ are disjoint, then

$1_{E \vee F} = 1_E + 1_F.$

Also note that $1_E = 1_F$ if and only if the assertion $E = F$ is surely true. We will use these identities and equivalences throughout the course without further comment.
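The indicator identities of this paragraph can be verified pointwise by brute force. The sketch below does so on a toy sample space with two hypothetical events $E$ and $F$ (chosen purely for illustration):

```python
# Verify the indicator identities 1_{E and F} = 1_E * 1_F,
# 1_{E or F} = 1_E + 1_F - 1_{E and F}, and 1_{not E} = 1 - 1_E
# on a small toy sample space.
omega = range(12)
E = {w for w in omega if w % 2 == 0}   # hypothetical event: "w is even"
F = {w for w in omega if w < 5}        # hypothetical event: "w < 5"

def ind(A, w):
    """Indicator of the event modeled by the set A, evaluated at outcome w."""
    return 1 if w in A else 0

for w in omega:
    # conjunction corresponds to multiplication of indicators
    assert ind(E & F, w) == ind(E, w) * ind(F, w)
    # inclusion-exclusion principle
    assert ind(E | F, w) == ind(E, w) + ind(F, w) - ind(E & F, w)
    # negation corresponds to 1 minus the indicator
    assert ind(set(omega) - E, w) == 1 - ind(E, w)
```

Since every identity is checked at every outcome, the assertions confirm the identities hold surely (not merely almost surely) on this model.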
Given a scalar random variable $X$, we can attempt to define the expectation $\mathbf{E} X$ through the model $X_\Omega$ by the formula

$\mathbf{E} X := \sum_{\omega \in \Omega} X_\Omega(\omega)\, \mathbf{P}(\{\omega\}).$

If the discrete sample space $\Omega$ is finite, then this sum is always well-defined and so every scalar random variable has an expectation. If however the discrete sample space $\Omega$ is infinite, the expectation may not be well defined. There are however two key cases in which one has a meaningful expectation. The first is if the random variable $X$ is unsigned, that is to say it takes values in the non-negative reals $[0,+\infty)$, or more generally in the extended non-negative real line $[0,+\infty]$. In that case, one can interpret the expectation $\mathbf{E} X$ as an element of $[0,+\infty]$. The other case is when the random variable $X$ is absolutely integrable, which means that the absolute value $|X|$ (which is an unsigned random variable) has finite expectation: $\mathbf{E}|X| < \infty$. In that case, the series defining $\mathbf{E} X$ is absolutely convergent to a real or complex number (depending on whether $X$ was a real or complex random variable).
We have the basic link

$\mathbf{P}(E) = \mathbf{E} 1_E$

between probability and expectation, valid for any event $E$. We also have the obvious, but fundamentally important, property of linearity of expectation: we have

$\mathbf{E}(cX) = c\, \mathbf{E} X$

and

$\mathbf{E}(X + Y) = \mathbf{E} X + \mathbf{E} Y$

whenever $c$ is a scalar and $X, Y$ are scalar random variables, either under the assumption that $c, X, Y$ are all unsigned, or that $X, Y$ are absolutely integrable. Thus for instance by applying expectations to (1) we obtain the identity

$\mathbf{P}(E \vee F) = \mathbf{P}(E) + \mathbf{P}(F) - \mathbf{P}(E \wedge F).$
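A small illustration of the expectation formula $\mathbf{E}X = \sum_\omega X_\Omega(\omega)\,\mathbf{P}(\{\omega\})$, of linearity, and of the link $\mathbf{P}(E) = \mathbf{E}1_E$, on the fair-die sample space (the helper `expect` is an ad hoc name, not standard notation):

```python
from fractions import Fraction

# Discrete probability space: one fair six-sided die.
P = {i: Fraction(1, 6) for i in range(1, 7)}

def expect(X):
    """E X = sum over outcomes w of P({w}) * X(w)."""
    return sum(p * X(w) for w, p in P.items())

X = lambda w: w          # the die roll itself
Y = lambda w: w * w      # its square

assert expect(X) == Fraction(7, 2)
assert expect(Y) == Fraction(91, 6)
# linearity of expectation: E(3X + Y) = 3 E X + E Y
assert expect(lambda w: 3 * X(w) + Y(w)) == 3 * expect(X) + expect(Y)
# P(E) = E 1_E for the event "the roll is even"
E_even = {2, 4, 6}
assert expect(lambda w: 1 if w in E_even else 0) == Fraction(1, 2)
```

Note that linearity holds here with no independence assumption whatsoever: it is a purely algebraic consequence of the defining sum.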
We close this section by noting that discrete probabilistic models stumble when trying to model continuous random variables, which take on an uncountable number of values. Suppose for instance one wants to model a random real number $x$ drawn uniformly at random from the unit interval $[0,1]$, which is an uncountable set. One would then expect, for any subinterval $[a,b]$ of $[0,1]$, that $x$ will fall into this interval with probability $b - a$. Setting $a = b$ (or, if one wishes instead, taking a limit such as $b \to a^+$), we conclude in particular that for any real number $a$ in $[0,1]$, $x$ will equal $a$ with probability $0$. If one attempted to model this situation by a discrete probability model, we would find that each outcome $\omega$ of the discrete sample space $\Omega$ has to occur with probability $0$ (since for each $\omega$, the random variable $x$ has only a single value $x_\Omega(\omega)$). But we are also requiring that the sum $\sum_{\omega \in \Omega} \mathbf{P}(\{\omega\})$ is equal to $1$; since an at most countable sum of zeroes is zero, this is a contradiction. In order to address this defect we must generalise from discrete models to more general probabilistic models, to which we now turn.
— 3. The Kolmogorov foundations of probability theory —
We now present the more general measuretheoretic foundation of Kolmogorov which subsumes the discrete theory, while also allowing one to model continuous random variables. It turns out that in order to perform sums, limits and integrals properly, the finite additivity property of probability needs to be amplified to countable additivity (but, as we shall see, uncountable additivity is too strong of a property to ask for).
We begin with the notion of a measurable space. (See also this previous blog post, which covers similar material from the perspective of a real analysis graduate class rather than a probability class.)
Definition 4 (Measurable space) Let $\Omega$ be a set. A Boolean algebra in $\Omega$ is a collection $\mathcal{F}$ of subsets of $\Omega$ which
- contains $\emptyset$ and $\Omega$;
- is closed under pairwise unions and intersections (thus if $E, F \in \mathcal{F}$, then $E \cup F$ and $E \cap F$ also lie in $\mathcal{F}$); and
- is closed under complements (thus if $E \in \mathcal{F}$, then $\Omega \backslash E$ also lies in $\mathcal{F}$).
(Note that some of these assumptions are redundant and can be dropped, thanks to de Morgan's laws.) A $\sigma$-algebra in $\Omega$ (also known as a $\sigma$-field) is a Boolean algebra $\mathcal{F}$ in $\Omega$ which is also
- closed under countable unions and countable intersections (thus if $E_1, E_2, E_3, \dots \in \mathcal{F}$, then $\bigcup_{n=1}^\infty E_n \in \mathcal{F}$ and $\bigcap_{n=1}^\infty E_n \in \mathcal{F}$).
Again, thanks to de Morgan's laws, one only needs to verify closure under just countable union (or just countable intersection) in order to verify that a Boolean algebra is a $\sigma$-algebra. A measurable space is a pair $(\Omega, \mathcal{F})$, where $\Omega$ is a set and $\mathcal{F}$ is a $\sigma$-algebra in $\Omega$. Elements of $\mathcal{F}$ are referred to as measurable sets in this measurable space.
If $\mathcal{F}, \mathcal{F}'$ are two $\sigma$-algebras in $\Omega$, we say that $\mathcal{F}$ is coarser than $\mathcal{F}'$ (or $\mathcal{F}'$ is finer than $\mathcal{F}$) if $\mathcal{F} \subset \mathcal{F}'$, thus every set that is measurable in $(\Omega, \mathcal{F})$ is also measurable in $(\Omega, \mathcal{F}')$.
Example 5 (Trivial measurable space) Given any set $\Omega$, the collection $\{\emptyset, \Omega\}$ is a $\sigma$-algebra; in fact it is the coarsest $\sigma$-algebra one can place on $\Omega$. We refer to $(\Omega, \{\emptyset, \Omega\})$ as the trivial measurable space on $\Omega$.
Example 6 (Discrete measurable space) At the other extreme, given any set $\Omega$, the power set $2^\Omega := \{E : E \subset \Omega\}$ is a $\sigma$-algebra (and is the finest $\sigma$-algebra one can place on $\Omega$). We refer to $(\Omega, 2^\Omega)$ as the discrete measurable space on $\Omega$.
Example 7 (Atomic measurable spaces) Suppose we have a partition $\Omega = \bigcup_{\alpha \in A} \Omega_\alpha$ of a set $\Omega$ into disjoint subsets $\Omega_\alpha$ (which we will call atoms), indexed by some label set $A$ (which may be finite, countable, or uncountable). Such a partition defines a $\sigma$-algebra on $\Omega$, consisting of all sets of the form $\bigcup_{\alpha \in B} \Omega_\alpha$ for subsets $B$ of $A$ (we allow $B$ to be empty); thus a set is measurable here if and only if it can be described as a union of atoms. One can easily verify that this is indeed a $\sigma$-algebra. The trivial and discrete measurable spaces in the preceding two examples are special cases of this atomic construction, corresponding to the trivial partition (in which there is just one atom $\Omega$) and the discrete partition (in which the atoms are individual points in $\Omega$).
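On a finite set, the atomic construction of Example 7 can be checked exhaustively: the measurable sets are exactly the unions of atoms, and one can verify the closure axioms of a $\sigma$-algebra directly. A sketch (the set and partition below are hypothetical, chosen for illustration):

```python
from itertools import combinations

# Toy set and a partition of it into three atoms.
omega = frozenset(range(6))
atoms = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]

# The atomic sigma-algebra: all unions of subcollections of atoms
# (including the empty union, which gives the empty set).
sigma = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        sigma.add(frozenset().union(*combo))

assert len(sigma) == 2 ** len(atoms)     # here: 8 measurable sets
assert frozenset() in sigma and omega in sigma
for A in sigma:
    assert omega - A in sigma            # closed under complements
    for B in sigma:
        # closed under unions and intersections
        assert A | B in sigma and A & B in sigma
```

Countable-union closure is automatic here since the algebra is finite, which is why a finite brute-force check suffices.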
Example 8 Let $\Omega$ be an uncountable set, and let $\mathcal{F}$ be the collection of sets $E$ in $\Omega$ which are either at most countable, or are co-countable (their complement $\Omega \backslash E$ is at most countable). Show that this is a $\sigma$-algebra on $\Omega$ which is non-atomic (i.e. it is not of the form of the preceding example).
Example 9 (Generated measurable spaces) It is easy to see that if one has a non-empty family $(\mathcal{F}_\alpha)_{\alpha \in A}$ of $\sigma$-algebras on a set $\Omega$, then their intersection $\bigcap_{\alpha \in A} \mathcal{F}_\alpha$ is also a $\sigma$-algebra, even if $A$ is uncountably infinite. Because of this, whenever one has an arbitrary collection $\mathcal{A}$ of subsets in $\Omega$, one can define the $\sigma$-algebra $\langle \mathcal{A} \rangle$ generated by $\mathcal{A}$ to be the intersection of all the $\sigma$-algebras that contain $\mathcal{A}$ (note that there is always at least one $\sigma$-algebra participating in this intersection, namely the discrete $\sigma$-algebra). Equivalently, $\langle \mathcal{A} \rangle$ is the coarsest $\sigma$-algebra that views every set in $\mathcal{A}$ as being measurable. (This is a rather indirect way to describe $\langle \mathcal{A} \rangle$, as it does not make it easy to figure out exactly what sets lie in $\langle \mathcal{A} \rangle$. There is a more direct description of this $\sigma$-algebra, but it requires the use of the first uncountable ordinal; see Exercise 15 of these notes.) In Durrett, the notation $\sigma(\mathcal{A})$ is used in place of $\langle \mathcal{A} \rangle$.
Example 10 (Borel $\sigma$-algebra) Let $X$ be a topological space; to avoid pathologies let us assume that $X$ is locally compact Hausdorff and $\sigma$-compact, though the definition below can also be made for more general spaces. For instance, one could take $X = \mathbf{R}^n$ or $X = \mathbf{C}^n$ for some finite $n$. We define the Borel $\sigma$-algebra on $X$ to be the $\sigma$-algebra generated by the open sets of $X$. (Due to our topological hypotheses on $X$, the Borel $\sigma$-algebra is also generated by the compact sets of $X$.) Measurable subsets in the Borel $\sigma$-algebra are known as Borel sets. Thus for instance open and closed sets are Borel, and countable unions and countable intersections of Borel sets are Borel. In fact, as a rule of thumb, any subset of $\mathbf{R}^n$ or $\mathbf{C}^n$ that arises from a "non-pathological" construction (not using the axiom of choice, or from a deliberate attempt to build a non-Borel set) can be expected to be a Borel set. Nevertheless, non-Borel sets exist in abundance if one looks hard enough for them, even without the axiom of choice; see for instance Exercise 16 of this previous blog post.
The following exercise gives a useful tool (somewhat analogous to mathematical induction) to verify properties regarding measurable sets in generated $\sigma$-algebras, such as Borel $\sigma$-algebras.
Exercise 11 Let $\mathcal{A}$ be a collection of subsets of a set $\Omega$, and let $P(E)$ be a property of subsets $E$ of $\Omega$ (thus $P(E)$ is true or false for each $E$ in $\Omega$). Assume the following axioms:
- $P(\emptyset)$ is true.
- $P(E)$ is true for all $E \in \mathcal{A}$.
- If $E \subset \Omega$ is such that $P(E)$ is true, then $P(\Omega \backslash E)$ is also true.
- If $E_1, E_2, \dots \subset \Omega$ are such that $P(E_n)$ is true for all $n$, then $P(\bigcup_{n=1}^\infty E_n)$ is true.
Show that $P(E)$ is true for all $E \in \langle \mathcal{A} \rangle$. (Hint: what can one say about the collection $\{E \subset \Omega : P(E) \text{ true}\}$?)
Thus, for instance, if a property of subsets of is true for all open sets, and is closed under countable unions and complements, then it is automatically true for all Borel sets.
Example 12 (Pullback) Let $(R, \mathcal{B})$ be a measurable space, and let $\phi: \Omega \to R$ be any function from another set $\Omega$ to $R$. Then we can define the pullback $\phi^*(\mathcal{B})$ of the $\sigma$-algebra $\mathcal{B}$ to be the collection of all subsets in $\Omega$ that are of the form $\phi^{-1}(S)$ for some $S \in \mathcal{B}$. This is easily verified to be a $\sigma$-algebra. We refer to the measurable space $(\Omega, \phi^*(\mathcal{B}))$ as the pullback of the measurable space $(R, \mathcal{B})$ by $\phi$. Thus for instance an atomic measurable space on $\Omega$ generated by a partition $\Omega = \bigcup_{\alpha \in A} \Omega_\alpha$ is the pullback of $A$ (viewed as a discrete measurable space) by the "colouring" map $\phi: \Omega \to A$ that sends each element of $\Omega_\alpha$ to $\alpha$ for all $\alpha \in A$.
Remark 13 In probabilistic terms, one can interpret the space $\Omega$ in the above construction as a sample space, and the function $\phi$ as some collection of "random variables" or "measurements" on that space, with $R$ being all the possible outcomes of these measurements. The pullback $\phi^*(\mathcal{B})$ then represents all the "information" one can extract from that given set of measurements.
Example 14 (Product space) Let $(\Omega_\alpha, \mathcal{F}_\alpha)_{\alpha \in A}$ be a family of measurable spaces indexed by a (possibly infinite or uncountable) set $A$. We define the product $\prod_{\alpha \in A} \mathcal{F}_\alpha$ on the Cartesian product space $\prod_{\alpha \in A} \Omega_\alpha$ by defining $\prod_{\alpha \in A} \mathcal{F}_\alpha$ to be the $\sigma$-algebra generated by the basic cylinder sets of the form

$\{(\omega_\beta)_{\beta \in A} \in \prod_{\beta \in A} \Omega_\beta : \omega_\alpha \in E_\alpha\}$

for $\alpha \in A$ and $E_\alpha \in \mathcal{F}_\alpha$. For instance, given two measurable spaces $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$, the product $\sigma$-algebra $\mathcal{F}_1 \times \mathcal{F}_2$ is generated by the sets $E_1 \times \Omega_2$ and $\Omega_1 \times E_2$ for $E_1 \in \mathcal{F}_1$ and $E_2 \in \mathcal{F}_2$. (One can also show that $\mathcal{F}_1 \times \mathcal{F}_2$ is the $\sigma$-algebra generated by the products $E_1 \times E_2$ for $E_1 \in \mathcal{F}_1$, $E_2 \in \mathcal{F}_2$, but this observation does not extend to uncountable products of measurable spaces.)
Exercise 15 Show that $\mathbf{R}^n$ with the Borel $\sigma$-algebra is the product of $n$ copies of $\mathbf{R}$ with the Borel $\sigma$-algebra.
As with almost any other notion of space in mathematics, there is a natural notion of a map (or morphism) between measurable spaces.
Definition 16 A function $\phi: \Omega \to R$ between two measurable spaces $(\Omega, \mathcal{F})$, $(R, \mathcal{B})$ is said to be measurable if one has $\phi^{-1}(S) \in \mathcal{F}$ for all $S \in \mathcal{B}$.
Thus for instance the pullback $(\Omega, \phi^*(\mathcal{B}))$ of a measurable space $(R, \mathcal{B})$ by a map $\phi: \Omega \to R$ could alternatively be defined as the coarsest measurable space structure on $\Omega$ for which $\phi$ is still measurable. It is clear that the composition of measurable functions is also measurable.
Exercise 17 Show that any continuous map $\phi: X \to Y$ from topological spaces $X$ to $Y$ is measurable (when one gives $X$ and $Y$ the Borel $\sigma$-algebras).
Exercise 18 If $\phi_i: \Omega \to R_i$ are measurable functions into measurable spaces $(R_i, \mathcal{B}_i)$ for $i = 1, \dots, n$, show that the joint function $(\phi_1,\dots,\phi_n): \Omega \to R_1 \times \dots \times R_n$ into the product space, defined by $(\phi_1,\dots,\phi_n)(\omega) := (\phi_1(\omega),\dots,\phi_n(\omega))$, is also measurable.
As a corollary of the above exercise, we see that if $\phi_1,\dots,\phi_n$ are measurable, and $F: R_1 \times \dots \times R_n \to S$ is measurable, then $F(\phi_1,\dots,\phi_n)$ is also measurable. In particular, if $\phi, \psi$ are scalar measurable functions, then so are $\phi + \psi$, $\phi - \psi$, $\phi \psi$, etc.
Next, we turn measurable spaces into measure spaces by adding a measure.
Definition 19 (Measure spaces) Let $(\Omega, \mathcal{F})$ be a measurable space. A finitely additive measure on this space is a map $\mu: \mathcal{F} \to [0,+\infty]$ obeying the following axioms:
- (Empty set) $\mu(\emptyset) = 0$.
- (Finite additivity) If $E, F \in \mathcal{F}$ are disjoint, then $\mu(E \cup F) = \mu(E) + \mu(F)$.
A countably additive measure is a finitely additive measure obeying the following additional axiom:
- (Countable additivity) If $E_1, E_2, E_3, \dots \in \mathcal{F}$ are disjoint, then $\mu(\bigcup_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mu(E_n)$.
A probability measure on $\Omega$ is a countably additive measure obeying the following additional axiom:
- (Unit total probability) $\mu(\Omega) = 1$.
A measure space is a triplet $(\Omega, \mathcal{F}, \mu)$ where $(\Omega, \mathcal{F})$ is a measurable space and $\mu$ is a measure on that space. If $\mu$ is furthermore a probability measure, we call $(\Omega, \mathcal{F}, \mu)$ a probability space.
Example 20 (Discrete probability measures) Let $\Omega$ be a discrete measurable space, and for each $\omega \in \Omega$, let $p_\omega$ be a nonnegative real number such that $\sum_{\omega \in \Omega} p_\omega = 1$. (Note that this implies that there are at most countably many $\omega$ for which $p_\omega > 0$ - why?) Then one can form a probability measure $\mu$ on $\Omega$ by defining

$\mu(E) := \sum_{\omega \in E} p_\omega$

for all $E \subset \Omega$.
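A small sketch of this construction in code, with toy weights chosen purely for illustration: the weights define $\mu(E)$ as a sum over $E$, and finite additivity follows immediately.

```python
from fractions import Fraction

# Toy weights p_omega summing to 1 (hypothetical, for illustration only).
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
assert sum(p.values()) == 1

def mu(E):
    """The discrete probability measure mu(E) = sum of p_omega over omega in E."""
    return sum(p.get(w, Fraction(0)) for w in E)

assert mu(set()) == 0          # empty set axiom
assert mu({1, 2, 3}) == 1      # unit total probability
# finite additivity on disjoint events:
assert mu({1}) + mu({2, 3}) == mu({1, 2, 3})
```

Outcomes outside the support simply contribute weight zero, which is why `p.get(w, Fraction(0))` is used rather than direct indexing.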
Example 21 (Lebesgue measure) Let $\mathbf{R}$ be given the Borel $\sigma$-algebra. Then it turns out there is a unique measure $m$ on $\mathbf{R}$, known as Lebesgue measure (or more precisely, the restriction of Lebesgue measure to the Borel $\sigma$-algebra), such that $m([a,b]) = b - a$ for every closed interval $[a,b]$ with $a \le b$ (this is also true if one uses open intervals or half-open intervals in place of closed intervals). More generally, there is a unique measure $m$ on $\mathbf{R}^n$ for any natural number $n$, also known as Lebesgue measure, such that

$m([a_1,b_1] \times \dots \times [a_n,b_n]) = (b_1 - a_1) \dots (b_n - a_n)$

for all closed boxes $[a_1,b_1] \times \dots \times [a_n,b_n]$, that is to say products of closed intervals. The construction of Lebesgue measure is a little tricky; see this previous blog post for details.
We can then set up general probability theory similarly to how we set up discrete probability theory:
Definition 22 (Probability theory) In probability theory, we choose an ambient probability space $\Omega = (\Omega, \mathcal{F}, \mathbf{P})$ as the randomness model, and refer to the set $\Omega$ (without the additional structures $\mathcal{F}$, $\mathbf{P}$) as the sample space for that model. We then model an event $E$ by an element $E_\Omega$ of the $\sigma$-algebra $\mathcal{F}$. The probability $\mathbf{P}(E)$ of an event is defined to be the quantity

$\mathbf{P}(E) := \mathbf{P}(E_\Omega).$

An event $E$ is surely true or is the sure event if $E_\Omega = \Omega$, and is surely false or is the empty event if $E_\Omega = \emptyset$. It is almost surely true or an almost sure event if $\mathbf{P}(E) = 1$, and almost surely false or a null event if $\mathbf{P}(E) = 0$.
We model random variables $X$ taking values in the range $R = (R, \mathcal{B})$ by measurable functions $X_\Omega: \Omega \to R$ from the sample space to the range $R$. We define real, complex, and scalar random variables as in the discrete case.
As in the discrete case, we consider two events $E, F$ to be equal if they are modeled by the same set: $E_\Omega = F_\Omega$. Similarly, two random variables $X, Y$ taking values in a common range $R$ are considered to be equal if they are modeled by the same function: $X_\Omega = Y_\Omega$. Again, if the sample space $\Omega$ is understood from context, we will usually abuse notation by identifying an event $E$ with its model $E_\Omega$, and similarly identify a random variable $X$ with its model $X_\Omega$.
As in the discrete case, settheoretic operations on the sample space induce similar boolean operations on events. Furthermore, since the algebra is closed under countable unions and countable intersections, we may similarly define the countable conjunction or countable disjunction of a sequence of events; however, we do not define uncountable conjunctions or disjunctions as these may not be welldefined as events.
The axioms of a probability space then yield the Kolmogorov axioms for probability:
- $\mathbf{P}(E) \ge 0$ for every event $E$;
- $\mathbf{P}(\Omega) = 1$ for the sure event;
- (countable additivity) $\mathbf{P}(\bigvee_{n=1}^\infty E_n) = \sum_{n=1}^\infty \mathbf{P}(E_n)$ whenever $E_1, E_2, E_3, \dots$ are pairwise disjoint events.
We can manipulate random variables just as in the discrete case, with the only caveat being that we have to restrict attention to measurable operations. For instance, if $X$ is a random variable taking values in a measurable space $R$, and $f: R \to S$ is a measurable map, then $f(X)$ is well defined as a random variable taking values in $S$. Similarly, if $F: R_1 \times \dots \times R_n \to S$ is a measurable map and $X_1,\dots,X_n$ are random variables taking values in $R_1,\dots,R_n$ respectively, then $F(X_1,\dots,X_n)$ is a random variable taking values in $S$. Similarly we can create events out of measurable relations (giving the Boolean range $\{\text{true}, \text{false}\}$ the discrete $\sigma$-algebra, of course). Finally, we continue to view deterministic elements $x$ of a space $R$ as a special case of a random element of $R$, and associate the indicator random variable $1_E$ to any event $E$ as before.
We say that two random variables $X, Y$ agree almost surely if the event $X = Y$ is almost surely true; this is an equivalence relation. In many cases we are willing to consider random variables up to almost sure equivalence. In particular, we can generalise the notion of a random variable slightly by considering random variables whose models are only defined almost surely, i.e. their domain is not all of $\Omega$, but instead $\Omega$ with a set of measure zero removed. This is, technically, not a random variable as we have defined it, but it can be associated canonically with an equivalence class of random variables up to almost sure equivalence, and so we view such objects as random variables "up to almost sure equivalence". Similarly, we declare two events $E$ and $F$ almost surely equivalent if their symmetric difference is a null event, and will often consider events up to almost sure equivalence only.
We record some simple consequences of the measuretheoretic axioms:
Exercise 23 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space.
- (Monotonicity) If $E \subset F$ are measurable, then $\mu(E) \le \mu(F)$.
- (Subadditivity) If $E_1, E_2, \dots$ are measurable (not necessarily disjoint), then $\mu(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu(E_n)$.
- (Continuity from below) If $E_1 \subset E_2 \subset \dots$ are measurable, then $\mu(\bigcup_{n=1}^\infty E_n) = \lim_{n \to \infty} \mu(E_n)$.
- (Continuity from above) If $E_1 \supset E_2 \supset \dots$ are measurable and $\mu(E_1)$ is finite, then $\mu(\bigcap_{n=1}^\infty E_n) = \lim_{n \to \infty} \mu(E_n)$. Give a counterexample to show that the claim can fail when $\mu(E_1)$ is infinite.
Of course, these measure-theoretic facts immediately imply their probabilistic counterparts (and the pesky hypothesis that $\mu(E_1)$ is finite is automatic in the probabilistic setting, since $\mathbf{P}(E_1) \le 1$, and can thus be dropped).
Note that if a countable sequence $E_1, E_2, \dots$ of events each hold almost surely, then their conjunction $\bigwedge_{n=1}^\infty E_n$ does as well (by applying subadditivity to the complementary events $\overline{E_n}$). As a general rule of thumb, the notion of "almost surely" behaves like "surely" as long as one only performs an at most countable number of operations (which already suffices for a large portion of analysis, such as taking limits or performing infinite sums).
Exercise 24 Let $(\Omega, \mathcal{F})$ be a measurable space.
- If $f: \Omega \to [-\infty,+\infty]$ is a function taking values in the extended reals $[-\infty,+\infty]$, show that $f$ is measurable (giving $[-\infty,+\infty]$ the Borel $\sigma$-algebra) if and only if the sets $\{\omega \in \Omega : f(\omega) > t\}$ are measurable for all real $t$.
- If $f, g: \Omega \to [-\infty,+\infty]$ are functions, show that $f \le g$ if and only if $\{\omega : f(\omega) > t\} \subset \{\omega : g(\omega) > t\}$ for all reals $t$.
- If $f_1, f_2, \dots: \Omega \to [-\infty,+\infty]$ are measurable, show that $\sup_n f_n$, $\inf_n f_n$, $\limsup_{n \to \infty} f_n$, and $\liminf_{n \to \infty} f_n$ are all measurable.
Remark 25 Occasionally, there is need to consider uncountable suprema or infima, e.g. $\sup_{t \in \mathbf{R}} f_t$. It is then no longer automatically the case that such an uncountable supremum or infimum of measurable functions is again measurable. However, in practice one can avoid this issue by carefully rewriting such uncountable suprema or infima in terms of countable ones. For instance, if it is known that $f_t(\omega)$ depends continuously on $t$ for each $\omega$, then $\sup_{t \in \mathbf{R}} f_t = \sup_{t \in \mathbf{Q}} f_t$, and so measurability is not an issue.
Using the above exercise, if one is given a sequence $X_1, X_2, \dots$ of random variables taking values in the extended real line $[-\infty,+\infty]$, we can define the random variables $\sup_n X_n$, $\inf_n X_n$, $\limsup_{n \to \infty} X_n$, $\liminf_{n \to \infty} X_n$, which also take values in the extended real line, and which obey relations such as

$(\sup_n X_n > t) = \bigvee_n (X_n > t)$

for any real number $t$.
We now say that a sequence $X_1, X_2, \dots$ of random variables in the extended real line converges almost surely if one has

$\limsup_{n \to \infty} X_n = \liminf_{n \to \infty} X_n$

almost surely, in which case we can define the limit (up to almost sure equivalence) as

$\lim_{n \to \infty} X_n := \limsup_{n \to \infty} X_n = \liminf_{n \to \infty} X_n.$
This corresponds closely to the concept of almost everywhere convergence in measure theory, which is a slightly weaker notion than pointwise convergence which allows for bad behaviour on a set of measure zero. (See this previous blog post for more discussion on different notions of convergence of measurable functions.)
We will defer the general construction of expectation of a random variable to the next set of notes, where we review the notion of integration on a measure space. For now, we quickly review the basic construction of continuous scalar random variables.
Exercise 26 Let $\mu$ be a probability measure on the real line $\mathbf{R}$ (with the Borel $\sigma$-algebra). Define the Stieltjes measure function $F: \mathbf{R} \to [0,1]$ associated to $\mu$ by the formula

$F(x) := \mu((-\infty, x]).$

Establish the following properties of $F$:
- (i) $F$ is nondecreasing.
- (ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$.
- (iii) $F$ is right-continuous, thus $F(x) = \lim_{y \to x^+} F(y)$ for all $x \in \mathbf{R}$.
There is a somewhat difficult converse to this exercise: if $F: \mathbf{R} \to [0,1]$ is a function obeying the above three properties, then there is a unique probability measure $\mu$ on $\mathbf{R}$ (the Lebesgue-Stieltjes measure associated to $F$) for which $F$ is the Stieltjes measure function. See Section 3 of this previous post for details. As a consequence of this, we have
Corollary 27 (Construction of a single continuous random variable) Let $F: \mathbf{R} \to [0,1]$ be a function obeying the properties (i)-(iii) of the above exercise. Then, by using a suitable probability space model, we can construct a real random variable $X$ with the property that

$\mathbf{P}(X \le x) = F(x)$

for all $x \in \mathbf{R}$.
Indeed, we can take the probability space to be $\mathbf{R}$ with the Borel $\sigma$-algebra and the Lebesgue-Stieltjes measure associated to $F$. This corollary is not fully satisfactory, because often we may already have chosen a probability space to model some other random variables, and the probability space provided by this corollary may be completely unrelated to the one used. We can resolve these issues with product measures and other joinings, but this will be deferred to a later set of notes.
Define the cumulative distribution function $F: \mathbf{R} \to [0,1]$ of a real random variable $X$ to be the function

$F(x) := \mathbf{P}(X \le x).$

Thus we see that cumulative distribution functions obey the properties (i)-(iii) above, and conversely any function with those properties is the cumulative distribution function of some real random variable. We say that two real random variables (possibly on different sample spaces) agree in distribution if they have the same cumulative distribution function. One can therefore define a real random variable, up to agreement in distribution, by specifying the cumulative distribution function. See Durrett for some standard real distributions (uniform, normal, geometric, etc.) that one can define in this fashion.
Exercise 28 Let $X$ be a real random variable with cumulative distribution function $F$. For any real number $x$, show that

$\mathbf{P}(X < x) = \lim_{y \to x^-} F(y)$

and

$\mathbf{P}(X = x) = F(x) - \lim_{y \to x^-} F(y).$

In particular, one has $\mathbf{P}(X = x) = 0$ for all $x$ if and only if $F$ is continuous.
Note in particular that this illustrates the distinction between almost sure and sure events: if $X$ has a continuous cumulative distribution function, and $x$ is a real number, then the event $X = x$ is almost surely false, but it does not have to be surely false. (Indeed, if one takes the sample space to be $\mathbf{R}$ and $X$ to be the identity function, then $X = x$ will not be surely false.) On the other hand, the fact that $X$ is equal to some real number is of course surely true. The reason these statements are consistent with each other is that there are uncountably many real numbers $x$. (Countable additivity tells us that a countable disjunction of null events is still null, but says nothing about uncountable disjunctions.)
Exercise 29 (Skorokhod representation of scalar variables) Let $U$ be a uniform random variable taking values in $[0,1]$, and let $F$ be another cumulative distribution function. Show that the random variables

$X := \sup\{x \in \mathbf{R} : F(x) < U\}$

and

$Y := \inf\{y \in \mathbf{R} : F(y) \ge U\}$

are indeed random variables (that is to say, they are measurable in any given model $\Omega$), and have cumulative distribution function $F$. (This construction is attributed to Skorokhod, but it should not be confused with the Skorokhod representation theorem. It provides a quick way to generate a single scalar variable, but unfortunately it is difficult to modify this construction to generate multiple scalar variables, especially if they are somehow coupled to each other.)
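This inverse-CDF idea is the standard recipe for generating a scalar variable with a prescribed distribution from a uniform one. A hedged sketch: for a CDF with a closed-form inverse (here the exponential CDF $F(x) = 1 - e^{-x}$, chosen for illustration), the generalised inverse reduces to an explicit formula, and an empirical frequency check confirms the resulting samples have the right distribution.

```python
import math
import random

def sample_from_cdf(u):
    """Inverse-transform sample for F(x) = 1 - exp(-x), x >= 0:
    the generalised inverse is F^{-1}(u) = -log(1 - u)."""
    return -math.log(1.0 - u)

random.seed(0)
n = 200_000
xs = [sample_from_cdf(random.random()) for _ in range(n)]

# Empirical check: P(X <= 1) should be close to F(1) = 1 - e^{-1} ~ 0.632.
freq = sum(x <= 1.0 for x in xs) / n
assert abs(freq - (1 - math.exp(-1))) < 0.01
```

For CDFs without a closed-form inverse one would instead compute the supremum/infimum in the exercise numerically (e.g. by bisection); the measurability assertion of the exercise is exactly what guarantees these are legitimate random variables.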
There is a multidimensional analogue of the above theory, which is almost identical, except that the monotonicity property has to be strengthened:
Exercise 30 Let $\mu$ be a probability measure on $\mathbf{R}^n$ (with the Borel $\sigma$-algebra). Define the Stieltjes measure function $F: \mathbf{R}^n \to [0,1]$ associated to $\mu$ by the formula

$F(x_1,\dots,x_n) := \mu((-\infty,x_1] \times \dots \times (-\infty,x_n]).$

Establish the following properties of $F$:
- (i) $F$ is nondecreasing: $F(x_1,\dots,x_n) \le F(y_1,\dots,y_n)$ whenever $x_i \le y_i$ for all $i$.
- (ii) $\lim_{x_1,\dots,x_n \to -\infty} F(x_1,\dots,x_n) = 0$ and $\lim_{x_1,\dots,x_n \to +\infty} F(x_1,\dots,x_n) = 1$.
- (iii) $F$ is right-continuous, thus $F(x) = \lim_{y \to x^+} F(y)$ for all $x \in \mathbf{R}^n$, where the superscript $+$ denotes that we restrict each $y_i$ to be greater than or equal to $x_i$.
- (iv) One has

$\sum_{\epsilon_1,\dots,\epsilon_n \in \{0,1\}} (-1)^{\epsilon_1 + \dots + \epsilon_n} F(x_1^{(\epsilon_1)},\dots,x_n^{(\epsilon_n)}) \ge 0$

whenever $x_i^{(1)} \le x_i^{(0)}$ are real numbers for $i = 1,\dots,n$. (Hint: try to express the measure of the box $(x_1^{(1)}, x_1^{(0)}] \times \dots \times (x_n^{(1)}, x_n^{(0)}]$ with respect to $\mu$ in terms of the Stieltjes measure function $F$.)
Again, there is a difficult converse to this exercise: if $F: \mathbf{R}^n \to [0,1]$ is a function obeying the above four properties, then there is a unique probability measure $\mu$ on $\mathbf{R}^n$ for which $F$ is the Stieltjes measure function. See Durrett for details; one can also modify the arguments in this previous post. In particular, we have
Corollary 31 (Construction of several continuous random variables) Let $F: \mathbf{R}^n \to [0,1]$ be a function obeying the properties (i)-(iv) of the above exercise. Then, by using a suitable probability space model, we can construct real random variables $X_1,\dots,X_n$ with the property that

$\mathbf{P}(X_1 \le x_1 \wedge \dots \wedge X_n \le x_n) = F(x_1,\dots,x_n)$

for all $x_1,\dots,x_n \in \mathbf{R}$.
Again, this corollary is not completely satisfactory because the probability space produced by it (which one can take to be $\mathbf{R}^n$ with the Borel $\sigma$-algebra and the Lebesgue-Stieltjes measure associated to $F$) may not be the probability space one wants to use; we will return to this point later.
— 4. Variants of the standard foundations (optional) —
We have focused on the orthodox foundations of probability theory in which we model events and random variables through probability spaces. In this section, we briefly discuss some alternate ways to set up the foundations, as well as alternatives to probability theory itself. (Actually, many of the basic objects and concepts in mathematics have multiple such foundations; see for instance this blog post exploring the many different ways to define the notion of a group.) We mention them here in order to exclude them from discussion in subsequent notes, which will be focused almost exclusively on orthodox probability.
One approach to the foundations of probability is to view the event space as an abstract $\sigma$-complete Boolean algebra - a collection of abstract objects with operations such as $\wedge$ and $\vee$ (and their countable analogues $\bigwedge_n$ and $\bigvee_n$) that obey a number of axioms; see this previous post for a formal definition. The probability map $E \mapsto \mathbf{P}(E)$ can then be viewed as an abstract probability measure on this algebra, that is to say a map from the algebra to $[0,1]$ that obeys the Kolmogorov axioms. Random variables $X$ taking values in a measurable space $R = (R, \mathcal{B})$ can be identified with their pullback map, which is the morphism of (abstract) $\sigma$-complete Boolean algebras that sends a measurable set $S \in \mathcal{B}$ to the event $X \in S$; with some care one can then redefine all the operations in previous sections (e.g. applying a measurable map $f: R \to S$ to a random variable taking values in $R$ to obtain a random variable taking values in $S$) in terms of this pullback map, allowing one to define random variables satisfactorily in this abstract setting. The probability space models discussed above can then be viewed as representations of abstract probability spaces by concrete ones. It turns out that (up to null events) any abstract probability space can be represented by a concrete one, a result known as the Loomis-Sikorski theorem; see this previous post for details.
Another, related, approach is to start not with the event space, but with the space of scalar random variables, and more specifically with the space of almost surely bounded scalar random variables $X$ (thus, there is a deterministic scalar $C$ such that $|X| \le C$ almost surely). It turns out that this space has the structure of a commutative tracial (abstract) von Neumann algebra. Conversely, starting from a commutative tracial von Neumann algebra one can form an abstract probability space (using the idempotent elements of the algebra as the events), and thus represent this algebra (up to null events) by a concrete probability space. This particular choice of probabilistic foundations is particularly convenient when one wishes to generalise classical probability to noncommutative probability, as this is simply a matter of dropping the axiom that the von Neumann algebra is commutative. This leads in particular to the subjects of quantum probability and free probability, which are generalisations of classical probability that are beyond the scope of this course (but see this blog post for an introduction to the latter, and this previous post for an abstract algebraic description of a probability space).
It is also possible to model continuous probability via a nonstandard version of discrete probability (or even finite probability), which removes some of the technicalities of measure theory at the cost of replacing them with the formalism of nonstandard analysis instead. This approach was pioneered by Ed Nelson, but will not be discussed further here. (See also these previous posts on the Loeb measure construction, which is a closely related way to combine the power of measure theory with the conveniences of nonstandard analysis.)
One can generalise the traditional, countably additive, form of probability by replacing countable additivity with finite additivity, but then one loses much of the ability to take limits or infinite sums, which reduces the amount of analysis one can perform in this setting. Still, finite additivity is good enough for many applications, particularly in discrete mathematics. An even broader generalisation is that of qualitative probability, in which events that are neither almost surely true nor almost surely false are not assigned any specific numerical probability between $0$ and $1$, but are simply assigned a symbol to indicate their indeterminate status; see this previous blog post for this generalisation, which can for instance be used to view the concept of a “generic point” in algebraic geometry or metric space topology in probabilistic terms.
There have been multiple attempts to move more radically beyond the paradigm of probability theory and its relatives as discussed above, in order to more accurately capture mathematically the concept of nondeterminism. One family of approaches is based on replacing deterministic logic by some sort of probabilistic logic; another is based on allowing several parameters in one’s model to be unknown (as opposed to being probabilistic random variables), leading to the area of uncertainty quantification. These topics are well beyond the scope of this course.
Filed under: 275A  probability theory, math.CA, math.PR Tagged: foundations
It was just announced that this year's Nobel Prize in physics goes to Takaaki Kajita from the Super-Kamiokande Collaboration and Arthur B. McDonald from the Sudbury Neutrino Observatory (SNO) Collaboration “for the discovery of neutrino oscillations, which shows that neutrinos have mass.” On this occasion, I am reposting a brief summary of the evidence for neutrino masses that I wrote in 2007.
To get deeper into this paper:
we should think about the 24-cell, the $\mathrm{D}_4$ Dynkin diagram, and the Lie algebra $\mathfrak{so}(8)$ that it describes. After all, it’s this stuff that underlies the octonions, which in turn underlie the exceptional Lie algebras, which are the main subject of Manivel’s paper.
Remember that the root lattice of $\mathfrak{so}(2n)$ is called the $\mathrm{D}_n$ lattice: it consists of vectors in $\mathbb{R}^n$ with integer entries that sum to an even number.
I’m interested in $\mathfrak{so}(8)$ so I’m interested in the $\mathrm{D}_4$ root lattice: 4-tuples of integers that sum to an even number! If you like the Hurwitz integral quaternions, these are essentially the same thing multiplied by a factor of 2.
The shortest nonzero vectors in the $\mathrm{D}_4$ lattice are the roots. There are 16 like this:
$(\pm 1, \pm 1, \pm 1, \pm 1)$
and 8 like this:
$(\pm 2, 0, 0, 0)$ and permutations
The roots form the vertices of a regular polytope in 4 dimensions. It’s called the 24-cell because it not only has 24 vertices, it also has 24 octahedral faces: it’s self-dual.
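Since these coordinates are so concrete, the count is easy to verify computationally. Here is a quick sketch in Python (the enumeration code is my own illustration, not from the post):

```python
# Enumerate the 24 roots of D4, in the scaled convention used above.
from itertools import product

# 16 roots of the form (±1, ±1, ±1, ±1):
cube = list(product([1, -1], repeat=4))

# 8 roots of the form (±2, 0, 0, 0) and permutations:
orthoplex = [tuple(2 * s if i == j else 0 for i in range(4))
             for j in range(4) for s in (1, -1)]

roots = cube + orthoplex
assert len(roots) == 24

# All 24 roots have squared length 4, so they lie on a common sphere,
# as the vertices of a regular polytope (the 24-cell) must.
assert all(sum(x * x for x in v) == 4 for v in roots)

# Each lies in the D4 lattice: integer entries summing to an even number.
assert all(sum(v) % 2 == 0 for v in roots)
```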
One thing we’re seeing here is that we can take the vertices of a 4-dimensional cube, namely $(\pm 1, \pm 1, \pm 1, \pm 1)$:
and the vertices of a 4-dimensional orthoplex, namely $(\pm 2, 0, 0, 0)$ and permutations:
and together they form the vertices of a 24-cell:
(In case you forgot, an orthoplex or cross-polytope is the $n$-dimensional analogue of an octahedron, a regular polytope with $2n$ vertices.)
However, we can go further! If you take every other vertex of an $n$-dimensional cube — that is, start with one vertex and then all its second-nearest neighbors and their second-nearest neighbors and so on — you get the vertices of something called the $n$-dimensional demicube. In 3 dimensions, a demicube is a regular tetrahedron, so you can fit two tetrahedra in a cube like this:
But in 4 dimensions, a demicube is an orthoplex! This is easy to see: if we take these points
$(\pm 1, \pm 1, \pm 1, \pm 1)$
and keep only those with an even number of minus signs, we get
$\pm (1,1,1,1)$
$\pm (1,1,-1,-1)$
$\pm (1,-1,1,-1)$
$\pm (1,-1,-1,1)$
which are the 8 vertices of an orthoplex.
So, we can take the vertices of a 24-cell and partition them into three 8-element sets, each being the vertices of an orthoplex!
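One can check this partition directly: the 8 “long” roots form one orthoplex, and the two demicubes inside the 16 “cube” roots form the other two. A small Python sketch (my own, with an ad hoc `is_orthoplex` test) confirms it:

```python
from itertools import product

cube = list(product([1, -1], repeat=4))
A = [tuple(2 * s if i == j else 0 for i in range(4))
     for j in range(4) for s in (1, -1)]        # (±2,0,0,0) and permutations
B = [v for v in cube if v.count(-1) % 2 == 0]   # demicube: even # of minus signs
C = [v for v in cube if v.count(-1) % 2 == 1]   # the complementary demicube

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_orthoplex(vs):
    """8 vertices forming 4 antipodal pairs along mutually orthogonal axes."""
    if len(vs) != 8:
        return False
    pairs = {frozenset([v, tuple(-x for x in v)]) for v in vs}
    if len(pairs) != 4:
        return False
    axes = [next(iter(p)) for p in pairs]
    return all(dot(u, w) == 0 for i, u in enumerate(axes) for w in axes[i + 1:])

assert is_orthoplex(A) and is_orthoplex(B) and is_orthoplex(C)
assert len(set(A) | set(B) | set(C)) == 24      # together: all 24-cell vertices
```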
And this has a nice representation-theoretic significance:
the vertices of the first orthoplex correspond to the weights of the 8-dimensional vector representation of $\mathfrak{so}(8)$;
the vertices of the second correspond to the weights of the 8-dimensional left-handed spinor representation of $\mathfrak{so}(8)$;
the vertices of the third correspond to the weights of the 8-dimensional right-handed spinor representation of $\mathfrak{so}(8).$
Here I’m being a bit sneaky, identifying the weight lattice of $\mathrm{D}_4$ with the root lattice. In fact it’s twice as dense, but it ‘looks the same’: it’s the same up to a rescaling and rotation. The weight lattice contains a 24-cell whose vertices are the weights of the vector, left- and right-handed spinor representations. But the root lattice, which is what I really had been talking about, contains a larger 24-cell whose vertices are the roots — that is, the nonzero weights of the adjoint representation of $\mathfrak{so}(8)$ on itself.
Now, you may wonder which is the ‘first’ orthoplex, which is the ‘second’ one and which is the ‘third’, but it doesn’t matter much since there’s a symmetry between them! This is triality. The group $\mathrm{S}_3$ acts on $\mathfrak{so}(8)$ as outer automorphisms, and thus on the weight lattice and root lattice. So, it acts as symmetries of the 24-cell — and it acts by permuting the 3 orthoplexes!
But here’s the cool thing that Manivel focuses on. Suppose we take the 24-cell whose vertices are the roots of $\mathrm{D}_4$. Any orthoplex in here consists of 4 pairs of opposite roots along 4 orthogonal axes. So, it gives a root system of type $\mathrm{A}_1 \times \mathrm{A}_1 \times \mathrm{A}_1 \times \mathrm{A}_1$. In other words, it picks out a copy of $\mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})$ in $\mathfrak{so}(8,\mathbb{C})$.
This Lie subalgebra
$\mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \; \subset \; \mathfrak{so}(8,\mathbb{C})$
acts on $\mathfrak{so}(8,\mathbb{C})$ by the adjoint action. It acts on itself, and it acts on the rest of $\mathfrak{so}(8,\mathbb{C})$ via the representation
$\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$
where each copy of $\mathfrak{sl}(2,\mathbb{C})$ acts on just one factor of $\mathbb{C}^2$ (in the obvious way). The weights of this ‘rest of’ $\mathfrak{so}(8,\mathbb{C})$ form the vertices of a 4-dimensional cube.
So, we’re getting a nice vector space isomorphism
$\mathfrak{so}(8,\mathbb{C}) \;\; \cong \;\; \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})\; \; \oplus \;\; \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$
which is based on how the vertices of a 24-cell can be partitioned into the vertices of an orthoplex and a cube. But in fact we’re getting 3 such isomorphisms, related by triality!
If we call the 4 copies of $\mathbb{C}^2$ here $V_1, V_2, V_3, V_4$, we can write
$\mathfrak{so}(8,\mathbb{C}) \;\; \cong \;\; \bigoplus_{i = 1}^4 \mathfrak{sl}(V_i) \; \oplus \; \bigotimes_{i = 1}^4 V_i$
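As a sanity check on this isomorphism of vector spaces, the dimensions match (the arithmetic below is my own, not the paper's):

```python
# Dimension count for so(8,C) ≅ ⊕ sl(V_i) ⊕ ⊗ V_i (as vector spaces only).
dim_so8 = 8 * 7 // 2      # dim so(n,C) = n(n-1)/2, so 28 for n = 8
dim_sl2 = 2 * 2 - 1       # dim sl(2,C) = 3
dim_tensor = 2 ** 4       # dim (C^2 ⊗ C^2 ⊗ C^2 ⊗ C^2) = 16
assert dim_so8 == 4 * dim_sl2 + dim_tensor == 28

# Each representation of the form (V_i ⊗ V_j) ⊕ (V_k ⊗ V_l) likewise
# has dimension 2*2 + 2*2 = 8, matching the three 8-dimensional irreps.
assert 2 * 2 + 2 * 2 == 8
```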
Also, we can write the three 8dimensional irreducible representations of $\mathfrak{so}(8,\mathbb{C})$ as
$(V_1 \otimes V_2) \oplus (V_3 \otimes V_4)$
$(V_1 \otimes V_3) \oplus (V_2 \otimes V_4)$
$(V_1 \otimes V_4) \oplus (V_2 \otimes V_3)$
Note that they come from the three ways of partitioning a 4-element set into two 2-element sets.
Manivel calls this description of $\mathfrak{so}(8,\mathbb{C})$ the fourality description, but I prefer to speak of tetrality. How is tetrality related to triality? The Dynkin diagram of $\mathrm{D}_4$ has $S_3$ symmetry, which yields triality:
But the corresponding extended or affine Dynkin diagram $\widetilde{\mathrm{D}}_4$ has $S_4$ symmetry, which yields tetrality:
One reason the affine Dynkin diagram is important is that maximal-rank Lie subalgebras of a Lie algebra with Dynkin diagram $X$ are obtained by deleting a dot from the corresponding affine Dynkin diagram $\widetilde{X}$. If we delete the middle dot of $\widetilde{\mathrm{D}}_4$ we get a disconnected diagram with 4 separate dots, which is the diagram for $\mathrm{A}_1 \times \mathrm{A}_1 \times \mathrm{A}_1 \times \mathrm{A}_1$. This is why $\mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})$ shows up as a maximal-rank Lie subalgebra of $\mathfrak{so}(8,\mathbb{C})$.
The relation between tetrality and triality, I believe, is that there’s a homomorphism $S_4 \to S_3$ coming from the three ways of partitioning a 4-element set into two 2-element sets.
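This homomorphism is small enough to verify by brute force. A Python sketch (entirely my own construction, letting permutations act on the index set $\{0,1,2,3\}$):

```python
from itertools import permutations

# The three ways of splitting {0,1,2,3} into two 2-element sets:
pairings = [frozenset({frozenset({0, 1}), frozenset({2, 3})}),
            frozenset({frozenset({0, 2}), frozenset({1, 3})}),
            frozenset({frozenset({0, 3}), frozenset({1, 2})})]

def act(sigma, pairing):
    """Apply a permutation of {0,1,2,3} to a pairing."""
    return frozenset(frozenset(sigma[i] for i in pair) for pair in pairing)

def image(sigma):
    """The permutation of the three pairings induced by sigma in S4."""
    return tuple(pairings.index(act(sigma, p)) for p in pairings)

S4 = list(permutations(range(4)))

# The induced map S4 -> S3 is surjective ...
assert len({image(s) for s in S4}) == 6

# ... and a homomorphism: image(s∘t) = image(s)∘image(t).
def compose(s, t):
    return tuple(s[t[i]] for i in range(4))

for s in S4:
    for t in S4:
        assert image(compose(s, t)) == tuple(image(s)[j] for j in image(t))

# Its kernel is the Klein four-group: the identity plus the three double
# transpositions, each of which preserves every pairing.
assert len([s for s in S4 if image(s) == (0, 1, 2)]) == 4
```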
This is a good place to stop, though we’re really just getting started.
"Metals are shiny." That's one of my favourite punchlines to end a class on electromagnetism with, and that's what I did today. I just love bringing up a bit of everyday physics as a striking consequence of two hours worth of development on the board, and this is a good one for that. I hope the class enjoyed it as much as I did! (Basically, as you can't see in the snapshot of my notes in the photo, those expressions are results of a computation of the [...] Click to continue reading this post
The post Metals are Shiny appeared first on Asymptotia.
The other day the Thomas–Fermi model (and its enhancements by Dirac and others) wandered across my desk (and one of my virtual blackboards as you can see in the picture) for a while. Putting aside why it showed up (perhaps I will say later on, but I cannot now), it was fun to delve for a while into some of these early attempts in quantum mechanics to try to understand approximation methods for treating fairly complicated quantum systems (like atoms of various sizes). The basic model showed up in 1927, just a year after Schrödinger's [...]
The post Thomas and Fermi appeared first on Asymptotia.
Another fall day, another holiday closing at the JCC. I was home with The Pip for most of the day, which was the usual mix of fun, exhausting, and puzzling. For example, while I offered several times to go out to a playground before lunch, he refused. But then insisted that we walk to the store to buy… something. I got this picture with my phone:
Because it amused me to see a bike rack with just a little red tricycle in it.
We did go to a couple of playgrounds later, and I shot some video that I’ll use for physicsy stuff at some point. But this is as good a photo-of-the-day as anything else I got.
Just one more Jewish holiday to get through, tomorrow, then it’s smooth sailing for a few weeks at least…
We’ve just put up an ad for a new 2-year postdoctoral position at the ANU, to work with myself and Tony Licata. We’re looking for someone who’s interested in operator algebras, quantum topology, and/or representation theory, to collaborate with us on Australian Research Council funded projects.
The ad hasn’t yet been cross-posted to MathJobs, but hopefully it will eventually appear there! In any case, applications need to be made through the ANU website. You need to submit a CV, 3 references, and a document addressing the selection criteria. Let me know if you have any questions about the application process, the job, or Canberra!
I’m really enjoying this article, so I’d like to talk about it here at the $n$-Category Café:
It’s a bit intense, so it may take a series of posts, but let me just get started…
I started reading this paper because I wanted to finally understand the famous “27 lines on a cubic surface” and how they’re related to the smallest nontrivial representations of $\mathrm{E}_6$, which are 27-dimensional. Actually $\mathrm{E}_6$ has two nontrivial representations of this dimension, which are not isomorphic: one is the exceptional Jordan algebra, and one is its dual! The exceptional Jordan algebra consists of $3 \times 3$ self-adjoint octonionic matrices, so it has dimension
$8+8+8+3 = 27$
The determinant of such a matrix turns out to be well-defined despite the noncommutativity and nonassociativity of the octonions. $\mathrm{E}_6$ is the group of linear transformations of the exceptional Jordan algebra that preserves the determinant.
As you might expect, all this stuff going on in dimension 27 is just the tip of an iceberg—and Manivel explores quite a large chunk of that iceberg. But today let me just touch on the tip.
For starters, the Cayley–Salmon theorem says that every smooth cubic surface in $\mathbb{C}\mathrm{P}^3$ has exactly 27 lines on it.
(This is apparently one of those cases where mathematicians shared credit with the meal that inspired their work, like the Fermi–Pasta–Ulam problem.)
I can’t visualize those 27 lines in general. But Clebsch gave an example of a smooth real cubic surface where all the lines actually lie in $\mathbb{R}\mathrm{P}^3$, so you can see them. This is called the Clebsch surface, or Klein’s icosahedral cubic surface, because Klein also worked on it. It looks like this:
and the 27 lines look like this:
Please click on the pictures to see who created them! Here’s a model of it — one of those nice old plaster models you see in old universities:
This model is in Göttingen, photographed by Oliver Zauzig.
I would enjoy diving down the rabbit hole here and learning everything about this particular cubic surface, but I’ll resist for now! I’ll just say a few things:
First, the Clebsch surface can be described very nicely as a surface in $\mathbb{R}\mathrm{P}^4$ using the homogeneous equations
$x_0+x_1+x_2+x_3+x_4 = 0$ $x_0^3+x_1^3+x_2^3+x_3^3+x_4^3 = 0$
but then you can eliminate one variable and think of it as a surface in $\mathbb{R}\mathrm{P}^3$ given by the equation
$x_1^3+x_2^3+x_3^3+x_4^3 = (x_1+x_2+x_3+x_4)^3$
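As a tiny illustration, one of the 27 lines is easy to exhibit in these coordinates: along the points $[s : -s : t : -t]$ the cubes cancel in pairs and the coordinates sum to zero, so both sides of the equation vanish. A quick numerical spot-check in Python (the parametrization is my own choice, not from the post):

```python
# Check that the line [s : -s : t : -t] lies on the Clebsch surface
# x1^3 + x2^3 + x3^3 + x4^3 = (x1 + x2 + x3 + x4)^3.
def on_surface(x1, x2, x3, x4):
    return x1**3 + x2**3 + x3**3 + x4**3 == (x1 + x2 + x3 + x4)**3

# Both sides are polynomials, so checking on an integer grid this large
# is more than enough to confirm the identity along the whole line.
for s in range(-5, 6):
    for t in range(-5, 6):
        assert on_surface(s, -s, t, -t)
```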
Second, the lines are actually defined over the golden field $\mathbb{Q}[\sqrt{5}]$. This may have something to do with why it’s called ‘Klein’s icosahedral cubic surface’ — I’ll avoid looking into that right now, but it may eventually be important, because there are nice relations between some exceptional Lie algebras and the golden field.
Third, you can see some points where three lines intersect: these are called Eckardt points and there are 10 of them.
Anyway, you may be wondering why there are 27 lines on a smooth cubic surface. The best argument I’ve seen so far, in terms of maximum friendliness, minimum jargon, and maximum total insight conveyed, is here:
I can’t say I fully understand it, since it’s fairly involved, but I still recommend it to anyone who knows a reasonable amount of algebraic geometry.
Anyway, I don’t think one needs to fully understand this to start wondering what $\mathrm{E}_6$ has to do with it. Here’s some of what Manivel has to say:
The configuration of the 27 lines on a smooth cubic surface in $\mathbb{C}\mathrm{P}^3$ has been thoroughly investigated by the classical algebraic geometers. It has been known for a long time that the automorphism group of this configuration can be identified with the Weyl group of the root system of type $\mathrm{E}_6$, of order 51,840. Moreover, the minimal representation $J$ of the simply connected complex Lie group of type $\mathrm{E}_6$ has dimension 27.
Here the letter $J$ means ‘exceptional Jordan algebra’.
This is a minuscule representation, meaning that the weight spaces are lines and that the Weyl group $W(\mathrm{E}_6)$ acts transitively on the weights. In fact one can recover the lines configuration of the cubic surface by defining two weights to be incident if they are not orthogonal with respect to the unique (up to scale) invariant scalar product. Conversely, one can recover the action of the Lie group $\mathrm{E}_6$ on $J$ from the line configuration.
I hope to come back to this and keep digging deeper. Right now I don’t even understand how the 27 weight spaces in $J$, which are 1dimensional subspaces in a 27dimensional space, are connected to 27 lines in some surface. But there are some other things in Manivel’s paper that I understand and like a lot.
String theorists and researchers working on loop quantum gravity (LQG) like to each point out how their own attempt to quantize gravity is better than the others’. In the end though, they’re both trying to achieve the same thing – consistently combining quantum field theory with gravity – and it is hard to pin down just exactly what makes strings and loops incompatible. Other than egos that is.
The obvious difference used to be that LQG works only in 4 dimensions, whereas string theory works only in 10 dimensions, and LQG doesn’t allow for supersymmetry, which is a consequence of quantizing strings. However, several years ago the LQG framework was extended to higher dimensions, and it can now also include supergravity, so that objection is gone.

So, once again, the Nobel week is upon us. And one of the topics of conversation for the “water cooler chat” in physics departments around the world is speculation on who (besides the infamous Hungarian “physicist” — sorry for the insider joke, I can elaborate on that if asked) would get the Nobel Prize in physics this year. What is your prediction?
With the invention of various metrics for “measuring scientific performance” one can make educated guesses — and even put predictions on an industrial footing — see the Thomson Reuters predictions based on citation counts (they did get the Englert–Higgs prize right, but are almost always off). Or even try your luck with online betting (sorry, no link here — I don’t encourage this). So there is a variety of ways to make you interested.
My predictions for 2015: Vera Rubin for Dark Matter or Deborah Jin for fermionic condensates. But you must remember that my record is no better than that of Thomson Reuters.
Those of you who are interested in college math instruction may be interested in a no-longer-so-new blog “Michigan Math In Action”, which a number of our faculty started last year. (I was involved in the sense of telling people “blogs are fun!”, but haven’t written anything for them yet.) It mostly features thoughtful pieces on teaching calculus and similar courses.
Recently, Gavin Larose put up a lengthy footnoted post on the effort that goes into running our “Gateway testing” center, and the benefits we get from it. This is a room designed for proctoring computerized tests of basic skills, and we use it for things like routine differentiation or putting matrices into reduced row echelon form, which we want every student to know but which are a waste of class time. Check it out!
I first learned about exact squares from a blog post written by Mike Shulman on the $n$-Category Café.
Today I want to describe a family of exact squares, which are also homotopy exact, that I had not encountered previously. These make a brief appearance in a new preprint, A necessary and sufficient condition for induced model structures, by Kathryn Hess, Magdalena Kedziorek, Brooke Shipley, and myself.
Proposition. If $R$ is any (generalized) Reedy category, with $R^+ \subset R$ the direct subcategory of degree-increasing morphisms and $R^- \subset R$ the inverse subcategory of degree-decreasing morphisms, then the pullback square: $\array{ iso(R) & \to & R^- \\ \downarrow & \swArrow id & \downarrow \\ R^+ & \to & R}$ is (homotopy) exact.
In summary, a Reedy category $(R,R^+,R^-)$ gives rise to a canonical exact square, which I’ll call the Reedy exact square.
Let’s recall the definition. Consider a square of functors inhabited by a natural transformation $\array{A & \overset{f}{\to} & B\\ ^u\downarrow & \swArrow\alpha & \downarrow^v\\ C& \underset{g}{\to} & D}$ For any category $M$, precomposition defines a square $\array{M^A & \overset{f^\ast}{\leftarrow} & M^B\\ ^{u^\ast}\uparrow & \swArrow \alpha^\ast & \uparrow^{v^\ast}\\ M^C& \underset{g^\ast}{\leftarrow} & M^D}$ Supposing there exist left Kan extensions $u_! \dashv u^\ast$ and $v_! \dashv v^\ast$ and right Kan extensions $f^\ast \dashv f_\ast$ and $g^\ast \dashv g_\ast$, the mates of $\alpha^*$ define canonical Beck–Chevalley transformations: $u_! f^\ast \Rightarrow g^\ast v_! \quad \text{and} \quad v^\ast g_\ast \Rightarrow f_\ast u^\ast.$ Note that if either of the Beck–Chevalley transformations is an isomorphism, the other one is too, by the (contravariant) correspondence between natural transformations between a pair of left adjoints and natural transformations between the corresponding right adjoints.
Definition. $\array{A & \overset{f}{\to} & B\\ ^u\downarrow & \swArrow\alpha & \downarrow^v\\ C& \underset{g}{\to} & D}$ is an exact square if, for any $M$ admitting pointwise Kan extensions, the Beck–Chevalley transformations are isomorphisms.
Comma squares provide key examples, in which case the Beck–Chevalley isomorphisms recover the limit and colimit formulas for pointwise Kan extensions.
The notion of homotopy exact square is obtained by replacing $M$ by some sort of homotopical category, the adjoints by derived functors, and “isomorphism” by “equivalence.”
In the preprint we give a direct proof that these Reedy squares are exact by computing the Kan extensions, but exactness follows more immediately from the following characterization theorem, stated using comma categories. The natural transformation $\alpha \colon v f \Rightarrow g u$ induces a functor $B \downarrow f \times_A u \downarrow C \to v \downarrow g$ over $C \times B$ defined on objects by sending a pair $b \to f(a), u(a) \to c$ to the composite morphism $v(b) \to v f(a) \to g u(a) \to g(c)$. Fixing a pair of objects $b$ in $B$ and $c$ in $C$, this pulls back to define a functor $b \downarrow f \times_A u \downarrow c \to vb \downarrow gc.$
Theorem. A square $\array{A & \overset{f}{\to} & B\\ ^u\downarrow & \swArrow\alpha & \downarrow^v\\ C& \underset{g}{\to} & D}$ is exact if and only if each fiber of $b \downarrow f \times_A u \downarrow c \to v b \downarrow g c$ is nonempty and connected.
See the nLab for a proof. Similarly, the square is homotopy exact if and only if each fiber of this functor has a contractible nerve.
In the case of a Reedy square $\array{ iso(R) & \to & R^- \\ \downarrow & \swArrow id & \downarrow \\ R^+ & \to & R}$ these fibers are precisely the categories of Reedy factorizations of a fixed morphism. For an ordinary Reedy category $R$, Reedy factorizations are unique, and so the fibers are terminal categories. For a generalized Reedy category, Reedy factorizations are unique up to unique isomorphism, so the fibers are contractible groupoids.
For any category $M$, the objects in the lower right-hand corner of the square $\array{ M^{iso(R)} & \leftarrow & M^{R^-} \\ \uparrow & \swArrow id & \uparrow \\ M^{R^+} & \leftarrow & M^R}$ are Reedy diagrams in $M$, and the functors restrict to various subdiagrams. Because the indexing categories all have the same objects, if $M$ is bicomplete each of these restriction functors is both monadic and comonadic. If we think of $M^{R^-}$ as being comonadic over $M^{iso(R)}$ and $M^{R^+}$ as being monadic over $M^{iso(R)}$, then the Beck–Chevalley isomorphism exhibits $M^R$ as the category of bialgebras for the monad induced by the direct subcategory $R^+$ and the comonad induced by the inverse subcategory $R^-$.
There is a homotopy-theoretic interpretation of this, which I’ll describe in the case where $R$ is a strict Reedy category (so that $iso(R)=ob(R)$), though it works in the generalized context as well. If $M$ is a model category, then $M^{iso(R)}$ inherits a model structure, with everything defined objectwise. The Reedy model structure on $M^{R^-}$ coincides with the injective model structure, which has cofibrations and weak equivalences created by the restriction functor $M^{R^-} \to M^{iso(R)}$; we might say this model structure is “left-induced”. Dually, the Reedy model structure on $M^{R^+}$ coincides with the projective model structure, which has fibrations and weak equivalences created by $M^{R^+} \to M^{iso(R)}$; this is “right-induced”.
The Reedy model structure on $M^R$ then has two interpretations: it is right-induced along the monadic restriction functor $M^R \to M^{R^-}$ and it is left-induced along the comonadic restriction functor $M^R \to M^{R^+}$. The paper A necessary and sufficient condition for induced model structures describes a general technique for inducing model structures on categories of bialgebras, which reproduces the Reedy model structure in this special case.
Way back when, for purposes of giving a talk, I made a figure that displayed the world of everyday experience in one equation. The label reflects the fact that the laws of physics underlying everyday life are completely understood.
So now there are T-shirts. (See below to purchase your own.)
It’s a good equation, representing the Feynman path-integral formulation of an amplitude for going from one field configuration to another one, in the effective field theory consisting of Einstein’s general theory of relativity plus the Standard Model of particle physics. It even made it onto an extremely cool guitar.
I’m not quite up to doing a comprehensive post explaining every term in detail, but here’s the general idea. Our everyday world is well-described by an effective field theory. So the fundamental stuff of the world is a set of quantum fields that interact with each other. Feynman figured out that you could calculate the transition between two configurations of such fields by integrating over every possible trajectory between them — that’s what this equation represents. The thing being integrated is the exponential of the action for this theory — as mentioned, general relativity plus the Standard Model. The GR part integrates over the metric, which characterizes the geometry of spacetime; the matter fields are a bunch of fermions, the quarks and leptons; the non-gravitational forces are gauge fields (photon, gluons, W and Z bosons); and of course the Higgs field breaks symmetry and gives mass to those fermions that deserve it. If none of that makes sense — maybe I’ll do it more carefully some other time.
Gravity is usually thought to be the odd force out when it comes to quantum mechanics, but that’s only if you really want a description of gravity that is valid everywhere, even at (for example) the Big Bang. But if you only want a theory that makes sense when gravity is weak, like here on Earth, there’s no problem at all. The little notation $k < \Lambda$ at the bottom of the integral indicates that we only integrate over low-frequency (long-wavelength, low-energy) vibrations in the relevant fields. (That's what gives away that this is an "effective" theory.) In that case there's no trouble including gravity. The fact that gravity is readily included in the EFT of everyday life has long been emphasized by Frank Wilczek. As discussed in his latest book, A Beautiful Question, he therefore advocates lumping GR together with the Standard Model and calling it The Core Theory.
I couldn’t agree more, so I adopted the same nomenclature for my own upcoming book, The Big Picture. There’s a whole chapter (more, really) in there about the Core Theory. After finishing those chapters, I rewarded myself by doing something I’ve been meaning to do for a long time — put the equation on a T-shirt, which you see above.
I’ve had T-shirts made before, with pretty grim results as far as quality is concerned. I knew this one would be especially tricky, what with all those tiny symbols. But I tried out DesignAShirt, and the result seems pretty impressively good.
So I’m happy to let anyone who might be interested go ahead and purchase shirts for themselves and their loved ones. Here are the links for light/dark and men’s/women’s versions. I don’t actually make any money off of this — you’re just buying a T-shirt from DesignAShirt. They’re a little pricey, but that’s what you get for the quality. I believe you can even edit colors and all that — feel free to give it a whirl and report back with your experiences.



