## February 28, 2017

### Functional Equations IV: A Simple Characterization of Relative Entropy

#### Posted by Tom Leinster

Relative entropy appears in mathematics, physics, engineering, and statistics under an unhelpful variety of names: relative information, information gain, information divergence, Kullback-Leibler divergence, and so on and on. This hints at how important it is.

In the functional equations course that I’m currently delivering, I stated and proved a theorem today that characterizes relative entropy uniquely:

TheoremRelative entropy is essentially the only function that satisfies three trivial requirements and the chain rule.

You can find this on pages 15–19 of the course notes so far, including an explanation of the “chain rule” in terms of Swiss and Canadian French.

I don’t know who this theorem is due to. I came up with it myself, I’m not aware of its existence in the literature, and it didn’t appear on this list until I added it. **However**, it could have been proved any time since the 1950s, and I bet someone’s done it before.

The proof I gave owes a great deal to the categorical characterization of relative entropy by John Baez and Tobias Fritz, which John blogged about here before.

## February 26, 2017

### Variance in a Metric Space

#### Posted by Tom Leinster

Here’s a small observation for a Sunday afternoon. When you have a random
variable in $\mathbb{R}^n$ or some other similar space, you can take its
expected value. For a random variable $X$ in an arbitrary metric space,
you can’t take the expected value — it simply doesn’t make sense. *(Edited to add: maybe that’s misleading. See the comments!)* But
you *can* take its variance, and the right definition is

$Var(X) = \tfrac{1}{2} \mathbb{E}(d(X_1, X_2)^2),$

where $X_1$ and $X_2$ are independent and distributed identically to $X$.

## February 22, 2017

### Functional Equations III: Explaining Relative Entropy

#### Posted by Tom Leinster

Much of this functional equations course is about entropy and its cousins, such as means, norms, and measures of diversity. So I thought it worth spending one session talking purely about ways of *understanding* entropy, without actually proving anything about it. I wanted especially to explain how to think about relative entropy — also known as relative information, information gain, and Kullback-Leibler divergence.

My strategy was to do this via coding theory. Information is a slippery concept, and reasoning about it takes some practice. But putting everything in the framework of coding makes everything more concrete. The central point is:

The entropy of a distribution is the mean number of bits per symbols in an optimal encoding.

All this and more is in the course notes. The part we did today starts on page 11.

*Next week: relative entropy is the only quantity that satisfies a couple of reasonable properties.*

## February 18, 2017

### Distributive Laws

#### Posted by Emily Riehl

*Guest post by Liang Ze Wong*

The Kan Extension Seminar II continues and this week, we discuss Jon Beck’s “Distributive Laws”, which was published in 1969 in the proceedings of the Seminar on Triples and Categorical Homology Theory, LNM vol 80. In the previous Kan seminar post, Evangelia described the relationship between Lawvere theories and finitary monads, along with two ways of combining them (the sum and tensor) that are very natural for Lawvere theories but less so for monads. Distributive laws give us a way of *composing* monads to get another monad, and are more natural from the monad point of view.

Beck’s paper starts by defining and characterizing distributive laws. He then describes the category of algebras of the composite monad. Just as monads can be factored into adjunctions, he next shows how distributive laws between monads can be “factored” into a “distributive square” of adjunctions. Finally, he ends off with a series of examples.

Before we dive into the paper, I would like to thank Emily Riehl, Alexander Campbell and Brendan Fong for allowing me to be a part of this seminar, and the other participants for their wonderful virtual company. I would also like to thank my advisor James Zhang and his group for their insightful and encouraging comments as I was preparing for this seminar.

## February 14, 2017

### Functional Equations II: Shannon Entropy

#### Posted by Tom Leinster

In the second instalment of the functional equations course that I’m teaching, I introduced Shannon entropy. I also showed that up to a constant factor, it’s uniquely characterized by a functional equation that it satisfies: the chain rule.

Notes for the course so far are here. For a quick summary of today’s session, read on.

## February 13, 2017

### M-theory from the Superpoint

#### Posted by David Corfield

You may have been following the ‘Division algebra and supersymmetry’ story, the last instalment of which appeared a while ago under the title M-theory, Octonions and Tricategories. John (Baez) was telling us of some work by his former student John Huerta which relates these entities. The post ends with a declaration which does not suffer from comparison to Prospero’s in *The Tempest*

But this rough magic

I here abjure. And when I have required

Some heavenly music – which even now I do –

To work mine end upon their senses that

This airy charm is for, I’ll break my staff,

Bury it certain fathoms in the earth,

And deeper than did ever plummet sound

I’ll drown my book.

## February 10, 2017

### The Heilbronn Institute and the University of Bristol

#### Posted by Tom Leinster

The Heilbronn Institute is the mathematical brand of the UK intelligence and spying agency GCHQ (Government Communications Headquarters). GCHQ is one of the country’s largest employers of mathematicians. And the Heilbronn Institute is now claiming to be the largest funder of “pure mathematics” in the country, largely through its many research fellowships at Bristol (where it’s based) and London.

In 2013, Edward Snowden leaked a massive archive of documents that shone a light on the hidden activities of GCHQ and its close partner, the US National Security Agency (NSA), including whole-population surveillance and deliberate stifling of peaceful activism. Much of this was carried out without the permission — or even knowledge — of the politicians who supposedly oversee them.

All this should obviously concern any mathematician with a soul, as I’ve argued. These are our major employers and funders. But you might wonder about the close-up picture. How do spy agencies such as GCHQ and the NSA work their way into academic culture? What do they do to ensure a continuing supply of mathematicians to employ, despite the suspicion with which most of us view them?

Alon Aviram of the *Bristol Cable* has just published an article on this,
describing specific connections between GCHQ/Heilbronn and the University of
Bristol — and, more broadly, academic mathematicians and computer
scientists:

Alon Aviram, Bristol University working with the surveillance state.

The Bristol Cable, 7 February 2017.

It includes some quotes from me and from legendary computer-security scientist Ross Anderson, as well as some nuggets from a long leaked Heilbronn “problem book” that’s interesting in its own right.

## February 7, 2017

### Functional Equations I: Cauchy’s Equation

#### Posted by Tom Leinster

This semester, I’m teaching a seminar course on functional equations. Why? Among other reasons:

Because I’m interested in measures of biological diversity. Dozens (or even hundreds?) of diversity measures have been proposed, but it would be a big step forward to have theorems of the form: “If you want your measure to have

*this*property,*this*property, and*this*property, then it must be*that*measure. No other will do.”Because teaching a course on functional equations will force me to learn about functional equations.

Because it touches on lots of mathematically interesting topics, such as entropy of various kinds and the theory of large deviations.

Today was a warm-up, focusing on Cauchy’s functional equation: which functions $f: \mathbb{R} \to \mathbb{R}$ satisfy

$f(x + y) = f(x) + f(y) \,\,\,\, \forall x, y \in \mathbb{R}?$

(I wrote about this equation before when I discovered that one of the main references is in Esperanto.) Later classes will look at entropy, means, norms, diversity measures, and a newish probabilistic method for solving functional equations.

Read on for today’s notes and an outline of the whole course.

### The Category Theoretic Understanding of Universal Algebra

#### Posted by Emily Riehl

*Guest post by Evangelia Aleiferi*

We begin the second series of the Kan Extension Seminar by discussing the paper The Category Theoretic Understanding of Universal Algebra: Lawvere Theories and Monads by Martin Hyland and John Power, published in 2007. The subject of the above is to give a historical survey on the two main category theoretic formulations of universal algebra, the first being Lawvere theories and the second the theory of monads. Lawvere theories were introduced by William Lawvere in 1963 as part of his doctoral thesis, in order to provide an elegant categorical base for studying universal algebra, while monads were structures that had been already used in different areas of mathematics.

The article starts with the definition of a Lawvere theory and the category of its models, together with some of their properties. Later the authors proceed to relate the notion of monad to Lawvere theories and they give an explanation as to why they were dominantly used in the understanding of universal algebra, compared to Lawvere theories. Lastly they describe how monads and Lawvere theories can be used in formulating computational effects, motivated by the work of Moggi and Plotkin, and they propose future developments based on the connection between computational effects and universal algebra.

At this point, I would like to take the opportunity to thank Emily Riehl, Alexander Campbell and Brendan Fong for organizing the Kan Extension Seminar, as well as all the other participants for being such a great motivation!