### How the Simplex is a Vector Space

#### Posted by Tom Leinster

It’s an underappreciated fact that the interior of every simplex $\Delta^n$ is a real vector space in a natural way. For instance, here’s the 2-simplex with twelve of its 1-dimensional linear subspaces drawn in:

(That’s just a sketch. See below for an accurate diagram by Greg Egan.)

In this post, I’ll explain what this vector space structure is and why everyone who’s ever taken a course on thermodynamics knows about it, at least partially, even if they don’t know they do.

Let’s begin with the most ordinary vector space of all, $\mathbb{R}^n$. (By “vector space” I’ll always mean vector space over $\mathbb{R}$.) There’s a bijection

$\mathbb{R} \leftrightarrow (0, \infty)$

between the real line and the positive half-line, given by exponential in one direction and log in the other. Doing this bijection in each coordinate gives a bijection

$\mathbb{R}^n \leftrightarrow (0, \infty)^n.$

So, if we transport the vector space structure of $\mathbb{R}^n$ along this bijection, we’ll produce a vector space structure on $(0, \infty)^n$. This new vector space $(0, \infty)^n$ is isomorphic to $\mathbb{R}^n$, by definition.

Explicitly, the “addition” of the vector space $(0, \infty)^n$ is coordinatewise multiplication, the “zero” vector is $(1, \ldots, 1)$, and “subtraction” is coordinatewise division. The scalar “multiplication” is given by powers: multiplying a vector $\mathbf{y} = (y_1, \ldots, y_n) \in (0, \infty)^n$ by a scalar $\lambda \in \mathbb{R}$ gives $(y_1^\lambda, \ldots, y_n^\lambda)$.
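These transported operations are easy to write down and sanity-check in code. Here is a minimal Python sketch (all function names are mine, purely for illustration):

```python
# Vector space structure on (0, inf)^n, transported from R^n via exp/log:
# "addition" is coordinatewise multiplication, the "zero" vector is
# (1, ..., 1), and scalar "multiplication" is coordinatewise powers.

def add(y, z):
    return tuple(a * b for a, b in zip(y, z))

def scale(lam, y):
    return tuple(a ** lam for a in y)

zero = (1.0, 1.0, 1.0)
y = (2.0, 0.5, 3.0)

# A few vector space axioms, checked numerically:
assert add(y, zero) == y                   # y "+" 0 = y
assert scale(1.0, y) == y                  # 1 . y = y
neg_y = scale(-1.0, y)                     # the additive inverse is 1/y
assert all(abs(c - 1.0) < 1e-12 for c in add(y, neg_y))
```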

Now, the ordinary vector space $\mathbb{R}^n$ has a linear subspace $U$ spanned by $(1, \ldots, 1)$. That is,

$U = \{(\lambda, \ldots, \lambda) \colon \lambda \in \mathbb{R} \}.$

Since the vector spaces $\mathbb{R}^n$ and $(0, \infty)^n$ are isomorphic, there’s a corresponding subspace $W$ of $(0, \infty)^n$, and it’s given by

$W = \{(e^\lambda, \ldots, e^\lambda) \colon \lambda \in \mathbb{R} \} = \{(\gamma, \ldots, \gamma) \colon \gamma \in (0, \infty)\}.$

But whenever we have a linear subspace of a vector space, we can form the quotient. Let’s do this with the subspace $W$ of $(0, \infty)^n$. What does the quotient $(0, \infty)^n/W$ look like?

Well, two vectors $\mathbf{y}, \mathbf{z} \in (0, \infty)^n$ represent the same element of $(0, \infty)^n/W$ if and only if their “difference” — in the vector space sense — belongs to $W$. Since “difference” or “subtraction” in the vector space $(0, \infty)^n$ is coordinatewise division, this just means that

$\frac{y_1}{z_1} = \frac{y_2}{z_2} = \cdots = \frac{y_n}{z_n}.$

So, the elements of $(0, \infty)^n/W$ are the equivalence classes of $n$-tuples of positive reals, with two tuples considered equivalent if they’re the same up to rescaling.
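In other words, two tuples are identified exactly when one is a positive rescaling of the other. That relation is a one-liner to test numerically; a quick Python sketch (the function name is just for illustration):

```python
def same_class(y, z, tol=1e-12):
    # y and z represent the same element of the quotient iff the
    # ratios y_i / z_i are all equal, i.e. y is a rescaling of z.
    ratios = [a / b for a, b in zip(y, z)]
    return all(abs(r - ratios[0]) < tol for r in ratios)

assert same_class((2.0, 4.0, 6.0), (1.0, 2.0, 3.0))       # y = 2 z
assert not same_class((2.0, 4.0, 6.0), (1.0, 2.0, 4.0))
```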

Now here’s the crucial part: it’s natural to normalize everything to sum to $1$. In other words, in each equivalence class, we single out the unique tuple $(y_1, \ldots, y_n)$ such that $y_1 + \cdots + y_n = 1$. This gives a bijection

$(0, \infty)^n/W \leftrightarrow \Delta_n^\circ$

where $\Delta_n^\circ$ is the interior of the $(n - 1)$-simplex:

$\Delta_n^\circ = \{(p_1, \ldots, p_n) \colon p_i \gt 0, \sum p_i = 1 \}.$

You can think of $\Delta_n^\circ$ as the set of probability distributions on an $n$-element set that satisfy Cromwell’s rule: zero probabilities are forbidden. (Or as Cromwell put it, “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.”)

Transporting the vector space structure of $(0, \infty)^n/W$ along this bijection gives a vector space structure to $\Delta_n^\circ$. And that’s the vector space structure on the simplex.

So what are these vector space operations on the simplex, in concrete terms? They’re given by the same operations in $(0, \infty)^n$, followed by normalization. So, the “sum” of two probability distributions $\mathbf{p}$ and $\mathbf{q}$ is

$\frac{(p_1 q_1, p_2 q_2, \ldots, p_n q_n)}{p_1 q_1 + p_2 q_2 + \cdots + p_n q_n},$

the “zero” vector is the uniform distribution

$\frac{(1, 1, \ldots, 1)}{1 + 1 + \cdots + 1} = (1/n, 1/n, \ldots, 1/n),$

and “multiplying” a probability distribution $\mathbf{p}$ by a scalar $\lambda \in \mathbb{R}$ gives

$\frac{(p_1^\lambda, p_2^\lambda, \ldots, p_n^\lambda)}{p_1^\lambda + p_2^\lambda + \cdots + p_n^\lambda}.$
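These operations fit in a few lines of Python. A minimal sketch (function names are my own):

```python
def normalize(v):
    s = sum(v)
    return tuple(c / s for c in v)

def simplex_add(p, q):
    # "sum" of two distributions: coordinatewise product, renormalized
    return normalize(tuple(a * b for a, b in zip(p, q)))

def simplex_scale(lam, p):
    # scalar "multiple": coordinatewise power, renormalized
    return normalize(tuple(a ** lam for a in p))

uniform = (1/3, 1/3, 1/3)   # the "zero" vector in the interior of the 2-simplex
p = (0.2, 0.3, 0.5)

# Adding the "zero" vector changes nothing (up to rounding):
assert all(abs(a - b) < 1e-12 for a, b in zip(simplex_add(p, uniform), p))

# The scalar 1 acts as the identity (up to rounding):
assert all(abs(a - b) < 1e-12 for a, b in zip(simplex_scale(1.0, p), p))
```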

For instance, let’s think about the scalar “multiples” of

$\mathbf{p} = (0.2, 0.3, 0.5) \in \Delta_3^\circ.$

“Multiplying” $\mathbf{p}$ by $\lambda \in \mathbb{R}$ gives

$\frac{(0.2^\lambda, 0.3^\lambda, 0.5^\lambda)}{0.2^\lambda + 0.3^\lambda + 0.5^\lambda}$

which I’ll call $\mathbf{p}^{(\lambda)}$, to avoid the confusion that would be created by calling it $\lambda\mathbf{p}$.

When $\lambda = 0$, $\mathbf{p}^{(\lambda)}$ is just the uniform distribution $(1/3, 1/3, 1/3)$ — which of course it has to be, since multiplying any vector by the scalar $0$ has to give the zero vector.

For equally obvious reasons, $\mathbf{p}^{(1)}$ has to be just $\mathbf{p}$.

When $\lambda$ is large and positive, the powers of $0.5$ dominate over the powers of the smaller numbers $0.2$ and $0.3$, so $\mathbf{p}^{(\lambda)} \to (0, 0, 1)$ as $\lambda \to \infty$.

For similar reasons, $\mathbf{p}^{(\lambda)} \to (1, 0, 0)$ as $\lambda \to -\infty$. This behaviour as $\lambda \to \pm\infty$ is the reason why, in the picture above, you see the curves curling in at the ends towards the triangle’s corners.
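The whole family $\mathbf{p}^{(\lambda)}$ is a one-line computation, and all of the behaviour just described is easy to confirm numerically. A Python sketch (the function name is mine):

```python
def escort(p, lam):
    # the scalar "multiple" p^(lam): coordinatewise powers, renormalized
    w = [pi ** lam for pi in p]
    s = sum(w)
    return [v / s for v in w]

p = [0.2, 0.3, 0.5]

assert escort(p, 0) == [1/3, 1/3, 1/3]          # scalar 0 gives the "zero" vector
assert all(abs(a - b) < 1e-12 for a, b in zip(escort(p, 1), p))
assert escort(p, 60)[2] > 0.999                 # lam -> +inf: mass piles onto 0.5
assert escort(p, -60)[0] > 0.999                # lam -> -inf: mass piles onto 0.2
```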

Some physicists refer to the distributions $\mathbf{p}^{(\lambda)}$ as the “escort
distributions” of $\mathbf{p}$. And in fact, the scalar multiplication of
the vector space structure on the simplex is a key part of the solution of
a very basic problem in thermodynamics — so basic that even *I* know
it.

The problem goes like this. First I’ll state it using the notation above, then afterwards I’ll translate it back into terms that physicists usually use.

Fix $\xi_1, \ldots, \xi_n, \xi \gt 0$. Among all probability distributions $(p_1, \ldots, p_n)$ satisfying the constraint

$\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n} = \xi,$

which one minimizes the quantity

$p_1^{p_1} p_2^{p_2} \cdots p_n^{p_n}?$

It makes no difference to this question if $\xi_1, \ldots, \xi_n, \xi$ are normalized so that $\xi_1 + \cdots + \xi_n = 1$ (since multiplying each of $\xi_1, \ldots, \xi_n, \xi$ by a constant doesn’t change the constraint). So, let’s assume this has been done.

Then the answer to the question turns out to be: the minimizing distribution $\mathbf{p}$ is a scalar multiple of $(\xi_1, \ldots, \xi_n)$ in the vector space structure on the simplex. In other words, it’s an escort distribution of $(\xi_1, \ldots, \xi_n)$. Or in other words still, it’s an element of the linear subspace of $\Delta_n^\circ$ spanned by $(\xi_1, \ldots, \xi_n)$. Which one? The unique one such that the constraint is satisfied.

Proving that this is the answer is a simple exercise in calculus, e.g. using Lagrange multipliers.
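For the record, here is a sketch of that Lagrange computation, after taking logs: minimizing $\prod_i p_i^{p_i}$ is the same as minimizing $\sum_i p_i \log p_i$, and the constraint becomes $\sum_i p_i \log \xi_i = \log \xi$.

```latex
% Minimize \sum_i p_i \log p_i subject to \sum_i p_i \log \xi_i = \log \xi
% and \sum_i p_i = 1, with multipliers \lambda and \mu:
\mathcal{L} = \sum_i p_i \log p_i
  - \lambda \Bigl( \sum_i p_i \log \xi_i - \log \xi \Bigr)
  - \mu \Bigl( \sum_i p_i - 1 \Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i}
  = \log p_i + 1 - \lambda \log \xi_i - \mu = 0
\;\Longrightarrow\;
p_i = e^{\mu - 1} \, \xi_i^{\lambda} \propto \xi_i^{\lambda}.
```

Normalizing $p_i \propto \xi_i^\lambda$ gives exactly the escort distribution $(\xi_1, \ldots, \xi_n)^{(\lambda)}$.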

For instance, take $(\xi_1, \xi_2, \xi_3) = (0.2, 0.3, 0.5)$ and $\xi = 0.4$. Among all distributions $(p_1, p_2, p_3)$ that satisfy the constraint

$0.2^{p_1} \times 0.3^{p_2} \times 0.5^{p_3} = 0.4,$

the one that minimizes $p_1^{p_1} p_2^{p_2} p_3^{p_3}$ is some escort distribution of $(0.2, 0.3, 0.5)$. Maybe one of the curves shown in the picture above is the 1-dimensional subspace spanned by $(0.2, 0.3, 0.5)$, and in that case, the $\mathbf{p}$ that minimizes is somewhere on that curve.

The location of $\mathbf{p}$ on that curve depends on the value of $\xi$, which here I chose to be $0.4$. If I changed it to $0.20001$ or $0.49999$ then $\mathbf{p}$ would be nearly at one end or the other of the curve: as $\lambda \to -\infty$, the escort distribution $(0.2, 0.3, 0.5)^{(\lambda)}$ converges to $(1, 0, 0)$ and the constrained quantity $0.2^{p_1} \times 0.3^{p_2} \times 0.5^{p_3}$ converges to $0.2$, while as $\lambda \to \infty$ they converge to $(0, 0, 1)$ and $0.5$ respectively.

**Aside** I’m glossing over the question of existence and uniqueness of solutions to the optimization question. Since $\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n}$ is a kind of average of $\xi_1, \xi_2, \ldots, \xi_n$ — a weighted geometric mean — there’s no solution at all unless $\min_i \xi_i \leq \xi \leq \max_i \xi_i$. As long as that inequality is satisfied, there’s a minimizing $\mathbf{p}$, although it’s not always unique: e.g. consider what happens when all the $\xi_i$s are equal.
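To pin down the minimizing distribution concretely, one can solve for the right scalar numerically. Here is a Python sketch for the worked example above (all function names are mine; it assumes $\min_i \xi_i < \xi < \max_i \xi_i$, and uses the fact that the weighted geometric mean increases monotonically in $\lambda$ along the escort curve):

```python
import math

def escort(xi, lam):
    # the scalar "multiple" (xi_1, ..., xi_n)^(lam) in the simplex
    w = [x ** lam for x in xi]
    s = sum(w)
    return [v / s for v in w]

def weighted_geo_mean(xi, p):
    # prod_i xi_i^{p_i}, computed in log form
    return math.exp(sum(pi * math.log(x) for pi, x in zip(p, xi)))

def solve_constraint(xi, target, lo=-60.0, hi=60.0):
    # weighted_geo_mean(xi, escort(xi, lam)) increases from min(xi)
    # to max(xi) as lam runs from -inf to +inf, so bisect on lam.
    for _ in range(200):
        mid = (lo + hi) / 2
        if weighted_geo_mean(xi, escort(xi, mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xi = (0.2, 0.3, 0.5)
lam = solve_constraint(xi, 0.4)
p = escort(xi, lam)                      # the minimizing distribution
assert abs(weighted_geo_mean(xi, p) - 0.4) < 1e-9
```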

Physicists prefer to do all this in logarithmic form. So, rather than start with $\xi_1, \ldots, \xi_n, \xi \gt 0$, they start with $x_1, \ldots, x_n, x \in \mathbb{R}$; think of this as substituting $x_i = -\log \xi_i$ and $x = -\log \xi$. So, the constraint

$\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n} = \xi$

becomes

$e^{-p_1 x_1} e^{-p_2 x_2} \cdots e^{-p_n x_n} = e^{-x}$

or equivalently

$p_1 x_1 + p_2 x_2 + \cdots + p_n x_n = x.$

We’re trying to minimize $p_1^{p_1} p_2^{p_2} \cdots p_n^{p_n}$ subject to that constraint, and again the physicists prefer the logarithmic form (with a change of sign): maximize

$-(p_1 \log p_1 + p_2 \log p_2 + \cdots + p_n \log p_n).$

That quantity is the **Shannon entropy** of the distribution $(p_1, \ldots,
p_n)$: so we’re looking for the maximum entropy solution to the constraint.
This is called the **Gibbs state**, and as we saw, it’s a
scalar multiple of $(\xi_1, \ldots, \xi_n)$ in the vector space structure
on the simplex. Equivalently, it’s

$\frac{(e^{-\lambda x_1}, e^{-\lambda x_2}, \ldots, e^{-\lambda x_n})}{e^{-\lambda x_1} + e^{-\lambda x_2} + \cdots + e^{-\lambda x_n}}$

for whichever value of $\lambda$ satisfies the constraint. The denominator
here is the famous **partition function**.
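In this logarithmic form, the Gibbs state and its partition function look like this in Python (names are mine; the final check confirms that, under the substitution $x_i = -\log \xi_i$, the Gibbs state is exactly the escort distribution of $(\xi_1, \ldots, \xi_n)$):

```python
import math

def gibbs(x, lam):
    # Gibbs state for "energies" x_1, ..., x_n at parameter lam;
    # the normalizing denominator Z is the partition function.
    weights = [math.exp(-lam * xi) for xi in x]
    Z = sum(weights)
    return [w / Z for w in weights], Z

# With x_i = -log(xi_i), we have exp(-lam * x_i) = xi_i^lam, so the
# Gibbs state coincides with the escort distribution (xi_i)^(lam).
xi = (0.2, 0.3, 0.5)
x = [-math.log(v) for v in xi]
p, Z = gibbs(x, 2.0)
s = sum(v ** 2.0 for v in xi)
q = [v ** 2.0 / s for v in xi]
assert all(abs(a - b) < 1e-9 for a, b in zip(p, q))
```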

So, that basic thermodynamic problem is (implicitly) solved by scalar multiplication in the vector space structure on the simplex. A question: does addition in the vector space structure on the simplex also have a role to play in physics?

## Re: How the Simplex is a Vector Space

One could wonder whether the vector space construction on simplices relates to Édouard Lucas’s observation that 1 and 24 are the only numbers satisfying the square pyramidal stacking of his Diophantine equation in the so-called ‘cannonball problem’. The norm-zero Weyl vector from this relation is used in a construction of the Leech lattice, as mentioned in Conway and Sloane’s ‘Lorentzian Forms for the Leech Lattice’: “We assert that the Leech lattice can be regarded as the set of all vectors of L orthogonal to w = (0,1,2,…23,24:70)”. The square pyramidal number $70^2 = 4900$ can also be expressed as 140 tetrahedral units having 35 elements each, or 35 square pyramidal units having 140 elements each.