

August 25, 2024

Stirling’s Formula from Statistical Mechanics

Posted by John Baez

Physicists like to study all sorts of simplified situations, but here’s one I haven’t seen them discuss. I call it an ‘energy particle’. It’s an imaginary thing with no qualities except energy, which can be any number $\ge 0$.

I hate it when on Star Trek someone says “I’m detecting an energy field” — as if energy could exist without any specific form. That makes no sense! Yet here I am, talking about energy particles.

Earlier on the n-Café, I once outlined a simple proof of Stirling’s formula using Laplace’s method. When I started thinking about statistical mechanics, I got interested in an alternative proof using the Central Limit Theorem, mentioned in a comment by Mark Meckes. Now I want to dramatize that proof using energy particles.

Stirling’s formula says

$$ N! \sim \sqrt{2 \pi N} \, \left(\frac{N}{e}\right)^N $$

where $\sim$ means that the ratio of the two quantities goes to $1$ as $N \to \infty$. Some proofs start with the observation that

$$ N! = \int_0^\infty x^N \, e^{-x} \, d x $$

This says that $N!$ is the Laplace transform of the function $x^N$. Laplace transforms are important in statistical mechanics. So what is this particular Laplace transform, and Stirling’s formula, telling us about statistical mechanics?

It turns out this Laplace transform shows up naturally when you consider a collection of energy particles!

Statistical mechanics says that at temperature $T$, the probability for an energy particle to have energy $E$ follows an exponential distribution: it’s proportional to $\exp(-E/k T)$, where $k$ is Boltzmann’s constant. From this you can show the expected energy of this particle is $k T$, and the standard deviation of its energy is also $k T$.

Next suppose you have $N$ energy particles at temperature $T$, not interacting with each other. Each one acts as above and they’re all independent. As $N \to \infty$, you can use the Central Limit Theorem to show the probability distribution of their total energy approaches a Gaussian with mean $N k T$ and standard deviation $\sqrt{N}\, k T$. But you can also compute the probability distribution exactly from first principles, and you get an explicit formula for it. Comparing this to the Gaussian, you get Stirling’s formula!

In particular, the $\sqrt{2 \pi}$ that you see in a Gaussian gives the $\sqrt{2 \pi}$ in Stirling’s formula.
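Before diving in, here’s a quick numerical sketch of that claim (my illustration, not part of the argument), working in units where $k T = 1$ so each particle’s energy is exponentially distributed with mean 1:

```python
import numpy as np

# Simulate many independent systems of N energy particles at temperature T,
# in units where k T = 1: each energy is exponential with mean 1.
rng = np.random.default_rng(0)
N = 100          # energy particles per system
trials = 50_000  # independent systems

totals = rng.exponential(scale=1.0, size=(trials, N)).sum(axis=1)

# The Central Limit Theorem predicts mean N k T and standard deviation
# sqrt(N) k T for the total energy.
print(totals.mean())  # close to 100
print(totals.std())   # close to 10
```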

The math behind this argument can be found in Aditya Ghosh’s article, without any talk of physics.

The only problem is that it contains a bunch of rather dry calculations. If we use energy particles, these calculations have a physical meaning!

The downside to using energy particles is that you need to know some physics. So let me teach you that. If you know statistical mechanics well, you can probably skip this next section. If you don’t, it’s probably more important than anything else I’ll tell you today.

Classical statistical mechanics

When we combine classical mechanics with probability theory, we can use the result to understand concepts like temperature and heat. This subject is called classical statistical mechanics. Here’s how I start explaining it to mathematicians.

A classical statistical mechanical system is a measure space $(X,\mu)$ equipped with a measurable function

$$ H \colon X \to [0,\infty) $$

We call $X$ the state space, call points in $X$ the states of our system, and call $H$ its Hamiltonian: this assigns a nonnegative number called the energy $H(x)$ to any state $x \in X$.

When our system is in thermal equilibrium at temperature $T$ we can ask: what’s the probability of the system’s state $x$ being in some subset of $X$? To answer this question we need a probability measure on $X$. We’ve already got a measure $d\mu(x)$, but Boltzmann told us the chance of a system being in a state of energy $E$ is proportional to

$$ e^{- \beta E} $$

where $\beta = 1/k T$. Here $T$ is temperature and $k$ is a physical constant called Boltzmann’s constant. So we should multiply $d\mu(x)$ by $e^{-\beta H(x)}$. But then we need to normalize the result to get a probability measure. We get this:

$$ \frac{ e^{-\beta H(x)} \; d\mu(x) }{ \int_X e^{-\beta H(x)} \; d\mu(x) } $$

This is the so-called Gibbs measure. The normalization factor on bottom is called the partition function:

$$ Z(\beta) = \int_X e^{-\beta H(x)} \; d\mu(x) $$

and it winds up being very important. (I’ll assume the integral here converges, though it doesn’t always.)
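If you like seeing such definitions in action, here’s a minimal numerical sketch (my illustration, using a toy Hamiltonian $H(x) = x^2$ on $X = \mathbb{R}$ with Lebesgue measure, where the exact answer is $\sqrt{\pi/\beta}$):

```python
import numpy as np
from scipy.integrate import quad

def partition_function(beta, H, lo, hi):
    """Z(beta) = integral over [lo, hi] of exp(-beta H(x)) dx."""
    Z, _ = quad(lambda x: np.exp(-beta * H(x)), lo, hi)
    return Z

# Toy system: X = R with Lebesgue measure and H(x) = x^2.
beta = 2.0
print(partition_function(beta, lambda x: x**2, -np.inf, np.inf))
print(np.sqrt(np.pi / beta))  # both ~ 1.2533
```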

In this setup we can figure out the probability distribution of energies that our system has at any temperature. I’ll just tell you the answer. For this it’s good to let

$$ \nu(E) = \mu\left(\{x \in X \;\vert\; H(x) \le E \}\right) $$

be the measure of the set of states with energy $\le E$. Very often

$$ d\nu(E) = g(E) \, d E $$

for some integrable function $g \colon \mathbb{R} \to \mathbb{R}$. In other words,

$$ g(E) = \frac{d \nu(E)}{d E} $$

where the expression at right is called a Radon–Nikodym derivative. Physicists call $g$ the density of states because if we integrate it over some interval $(E, E + \Delta E]$ we get ‘the number of states’ in that energy range. That’s how they say it, anyway. What we actually get is the measure of the set

$$ \{x \in X : \; E \lt H(x) \le E + \Delta E \} $$

We can express the partition function in terms of $g$, and we get this:

$$ Z(\beta) = \int_0^\infty e^{-\beta E} \, g(E) \; d E $$

So we say the partition function is the Laplace transform of the density of states.

We can also figure out the probability distribution of energies that our system has at any temperature, as promised. We get this function of $E$:

$$ \frac{e^{-\beta E} \, g(E)}{Z(\beta)} $$

This should make intuitive sense: we take the density of states and multiply it by $e^{-\beta E}$, following Boltzmann’s idea that the probability for the system to be in a state of energy $E$ decreases exponentially with $E$ in this way. To get a probability distribution, we then normalize this.

An energy particle

Now let’s apply statistical mechanics to a very simple system that I haven’t seen physicists discuss.

An energy particle is a hypothetical thing that only has energy, whose energy can be any nonnegative real number. So it’s a classical statistical mechanical system whose measure space is $[0,\infty)$, and whose Hamiltonian is the identity function:

$$ H(E) = E $$

What’s the measure on this measure space? It’s basically just Lebesgue measure. But the coordinate on this space, called $E$, has units of energy, and we’d like the measure we use to be dimensionless, because physicists want the partition function to be dimensionless. So we won’t use $d E$ as our measure; instead we’ll use

$$ \frac{d E}{w} $$

where $w$ is some arbitrary unit of energy. The choice of $w$ won’t affect the probability distribution of energies, but it will affect other calculations I’ll do with energy particles in some later article.

We can answer some questions about energy particles using the stuff I explained in the last section:

1) What’s the density of states of an energy particle? It’s $1/w$.

2) What’s the partition function of an energy particle? It’s the integral of the density of states times $e^{-\beta E}$, which is

$$ \int_0^\infty e^{-\beta E} \; \frac{d E}{w} = \frac{1}{\beta w} $$

3) What’s the probability distribution of energies of an energy particle? It’s the density of states times $e^{-\beta E}$ divided by the partition function, which gives the so-called exponential distribution:

$$ \beta e^{-\beta E} $$

Notice how the quantity $w$ has canceled out in this calculation.

4) What’s the mean energy of an energy particle? It’s the mean of the above probability distribution, which is

$$ \frac{1}{\beta} = k T $$

5) What’s the variance of the energy of an energy particle? It’s

$$ \frac{1}{\beta^2} = (k T)^2 $$

Nice! We’re in math heaven here, where everything is simple and beautiful. We completely understand a single energy particle.
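In case you want to double-check answers 2) through 5) symbolically, here’s a minimal sympy sketch of the integrals involved (my illustration):

```python
import sympy as sp

E, beta, w = sp.symbols('E beta w', positive=True)

# 2) Partition function: integrate exp(-beta E) against dE/w over [0, oo).
Z = sp.integrate(sp.exp(-beta * E) / w, (E, 0, sp.oo))
print(Z)  # 1/(beta*w)

# 3) Probability distribution of energies: note that w cancels out.
p = sp.simplify(sp.exp(-beta * E) / w / Z)
print(p)  # beta*exp(-E*beta)

# 4), 5) Mean and variance of the energy.
mean = sp.integrate(E * p, (E, 0, sp.oo))
var = sp.integrate((E - mean)**2 * p, (E, 0, sp.oo))
print(mean, var)  # 1/beta, beta**(-2)
```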

But I really want to understand a finite collection of energy particles. Luckily, there’s a symmetric monoidal category of classical statistical mechanical systems, so we can just tensor a bunch of individual energy particles. Whoops, that sounds like category theory! We wouldn’t want that — physicists might get scared. Let me try again.

The next section is well-known stuff, but also much more important than anything about ‘energy particles’.

Combining classical statistical mechanical systems

There’s a standard way to combine two classical statistical mechanical systems and get a new one. To do this, we take the product of their underlying measure spaces and add their Hamiltonians. More precisely, suppose our systems are $(X,\mu,H)$ and $(X', \mu', H')$. Then we form the product space $X \times X'$, give it the product measure $\mu \otimes \mu'$, and define the Hamiltonian $H \otimes H' \colon X \times X' \to [0,\infty)$ by

$$ (H \otimes H')(x,x') = H(x) + H'(x') $$

We get a new system $(X \times X', \mu \otimes \mu', H \otimes H')$.

When we combine systems in this way, a lot of nice things happen:

1) The density of states for the combined system is obtained by convolving the densities of states for the two separate systems.

2) The partition function of the combined system is the product of the partition functions of the two separate systems.

3) At any temperature, the probability distribution of energies of the combined system is obtained by convolving those of the two separate systems.

4) At any temperature, the mean energy of the combined system is the sum of the mean energies of the two separate systems.

5) At any temperature, the variance in energy of the combined system is the sum of the variances in energies of the two separate systems.

The last three of these follow from standard ideas in probability theory. For each value of $\beta$, the rules of statistical mechanics give this probability measure on the state space of the combined system:

$$ \frac{ e^{-\beta (H(x) + H'(x'))} \; d\mu(x) \otimes d\mu'(x') }{ \int_{X \times X'} e^{-\beta (H(x) + H'(x'))} \; d\mu(x) \, d\mu'(x')} $$

But this is a product measure: it’s clearly the same as

$$ \frac{ e^{-\beta H(x)} \; d\mu(x) }{ \int_{X} e^{-\beta H(x)} \; d\mu(x) } \otimes \frac{ e^{-\beta H'(x')} \; d\mu'(x') }{ \int_{X'} e^{-\beta H'(x')} \; d\mu'(x') } $$

Thus, in the language of probability theory, the energies of the two systems being combined are independent random variables. Whenever this happens, we convolve their probability distributions to get the probability distribution of their sum. The mean of their sum is the sum of their means. And the variance of their sum is the sum of their variances!

It’s also easy to see why the partition functions multiply. This just says

$$ \int_{X \times X'} e^{-\beta (H(x) + H'(x'))} \; d\mu(x) \, d\mu'(x') = \left(\int_{X} e^{-\beta H(x)} \; d\mu(x) \right) \left(\int_{X'} e^{-\beta H'(x')} \; d\mu'(x') \right) $$

Finally, since the partition functions multiply, and the partition function is the Laplace transform of the density of states, the densities of states must convolve: the Laplace transform sends convolution to multiplication.
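Here’s a small numerical illustration of fact 3) (a sketch of mine, with $\beta = 1$): convolving the energy distribution of one energy particle with itself reproduces the $N = 2$ gamma distribution $\beta^2 E \, e^{-\beta E}$ that we’ll meet in the next section.

```python
import numpy as np

beta, dE = 1.0, 0.01
E = np.arange(0.0, 20.0, dE)

# Energy distribution of a single energy particle at inverse temperature beta.
p = beta * np.exp(-beta * E)

# Discrete approximation to the convolution (p * p)(E).
p2 = np.convolve(p, p)[:len(E)] * dE

# Exact N = 2 distribution derived in the next section.
exact = beta**2 * E * np.exp(-beta * E)
print(np.max(np.abs(p2 - exact)))  # small (discretization error only)
```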

A system of $N$ energy particles

You can iterate the above arguments to understand what happens when you combine any number of classical statistical mechanical systems. For a system of $N$ energy particles the state space is $[0,\infty)^N$, with the $N$-fold product of our measure $d E/w$ as its measure. The energy is the sum of all their individual energies, so it’s

$$ H(E_1, \dots, E_N) = E_1 + \cdots + E_N $$

Let’s work out the density of states $g$. We could do this by convolution but I prefer to do it from scratch — it amounts to the same thing. The measure of the set of states with energy $\le E$ is

$$ \nu(E) = \mu\left(\{x \in [0,\infty)^N \;\vert\; H(x) \le E \}\right) $$

This is just the Lebesgue measure of the simplex

$$ \{ (E_1, \dots, E_N) \;\vert\; E_i \ge 0 \;\text{ and }\; E_1 + \cdots + E_N \le E \} $$

I hope you’re visualizing this simplex, for example when $N = 3$:

[Figure: the simplex of triples $(E_1, E_2, E_3)$ with $E_i \ge 0$ and $E_1 + E_2 + E_3 \le E$.]

Its volume is well known to be $1/N!$ times that of the hypercube $[0,E]^N$, but remember that our measure on the half-line is $d E/w$. So, we get

$$ \nu(E) = \frac{(E/w)^N}{N!} $$

Differentiating this we get the density of states:

$$ g(E) = \frac{d\nu(E)}{d E} = \frac{1}{w} \frac{(E/w)^{N-1}}{(N-1)!} $$

So, the partition function of a collection of $N$ energy particles is

$$ \int_0^\infty e^{-\beta E} \; g(E) \; d E = \int_0^\infty e^{-\beta E} \; \frac{(E/w)^{N-1}}{(N-1)!} \; \frac{d E}{w} = \frac{1}{(\beta w)^N} $$

In the last step I might have done the integral, or I might have used the fact that we already know the answer: it must be the $N$th power of the partition function of a single energy particle!

You may wonder why we’re getting these factors of $(N-1)!$ when studying $N$ energy particles. If you think about it, you’ll see why. The density of states is the derivative of the volume of this $N$-simplex as a function of $E$:

[Figure: the simplex again, with the $(N-1)$-simplex where $E_1 + \cdots + E_N = E$ shown in darker gray.]

But that’s the area of the $(N-1)$-simplex shown in darker gray, which is proportional to $1/(N-1)!$.
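If you don’t trust the simplex volume, here’s a quick Monte Carlo sketch (my illustration): the fraction of uniform random points of the hypercube $[0,E]^N$ that land in the simplex should be about $1/N!$.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
N, E, samples = 5, 1.0, 1_000_000

# Uniform points in the hypercube [0, E]^N; count how many land in the
# simplex E_1 + ... + E_N <= E.
pts = rng.uniform(0.0, E, size=(samples, N))
frac = np.mean(pts.sum(axis=1) <= E)

print(frac)                   # ~ 0.00833
print(1 / math.factorial(N))  # 1/120 ~ 0.00833
```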

Now we know everything we want:

1) What’s the density of states of $N$ energy particles?

$$ \frac{1}{w} \frac{(E/w)^{N-1}}{(N-1)!} $$

2) What’s the partition function of $N$ energy particles? It’s the integral of the density of states times $e^{-\beta E}$, which is

$$ \int_0^\infty e^{-\beta E} \, \frac{(E/w)^{N-1}}{(N-1)!} \, \frac{d E}{w} = \frac{1}{(\beta w)^N} $$

3) What’s the probability distribution of the total energy of $N$ energy particles? It’s the density of states times $e^{-\beta E}$ divided by the partition function, which gives the so-called gamma distribution:

$$ \beta^N \, e^{-\beta E} \, \frac{E^{N-1}}{(N-1)!} $$

4) What’s the mean total energy of $N$ energy particles? It’s

$$ \frac{N}{\beta} = N k T $$

5) What’s the variance of the total energy of $N$ energy particles? It’s

$$ \frac{N}{\beta^2} = N (k T)^2 $$
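Here’s a sampling sketch of answer 3) above (my illustration, with $\beta = 1$): draw the total energy of $N$ energy particles many times and compare a histogram with the gamma distribution.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
beta, N, trials = 1.0, 10, 200_000

# Total energy of N independent energy particles, sampled many times.
totals = rng.exponential(scale=1/beta, size=(trials, N)).sum(axis=1)

# Compare a normalized histogram with the gamma distribution above.
hist, edges = np.histogram(totals, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
gamma = (beta**N * np.exp(-beta * centers) * centers**(N - 1)
         / math.factorial(N - 1))

print(np.max(np.abs(hist - gamma)))  # small (sampling error only)
```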

Now that we’re adding independent and identically distributed random variables, you can tell we are getting close to the Central Limit Theorem, which says that a sum of a bunch of those approaches a Gaussian — at least when each one has finite mean and variance, as holds here.

Stirling’s formula

What is the probability distribution of energies of a system made of $N$ energy particles? We’ve used statistical mechanics to show that it’s

$$ \beta^N \, e^{-\beta E} \, \frac{E^{N-1}}{(N-1)!} $$

But the energy of each particle has mean $1/\beta$ and variance $1/\beta^2$, and these energies are independent random variables. So the Central Limit Theorem says their sum is asymptotic to a Gaussian with mean $N/\beta$ and variance $N/\beta^2$, namely

$$ \frac{1}{\sqrt{2 \pi N /\beta^2}}\, e^{-(E - N/\beta)^2/(2N / \beta^2)} $$

We obtain a complicated-looking asymptotic formula:

$$ \beta^N \, e^{-\beta E} \, \frac{E^{N-1}}{(N-1)!} \; \sim \; \frac{1}{\sqrt{2 \pi N /\beta^2}}\, e^{-(E - N/\beta)^2/(2N / \beta^2)} $$

But if we simplify this by taking $\beta = 1$, we get

$$ e^{- E} \, \frac{E^{N-1}}{(N-1)!} \sim \frac{1}{\sqrt{2 \pi N}}\, e^{-(E - N)^2/2N} $$

and if we then take $E = N$, we get

$$ e^{-N} \, \frac{N^{N-1}}{(N-1)!} \sim \frac{1}{\sqrt{2 \pi N}} $$

Fiddling around a bit (note $(N-1)! = N!/N$, so the left side equals $e^{-N} N^N/N!$), we get Stirling’s formula:

$$ N! \sim \sqrt{2 \pi N} \, \left(\frac{N}{e}\right)^N $$
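As a final sanity check (my illustration), the ratio of the two sides really does approach 1:

```python
import math

# The ratio N! / (sqrt(2 pi N) (N/e)^N) should approach 1 as N grows.
for N in (1, 5, 10, 50, 100):
    stirling = math.sqrt(2 * math.pi * N) * (N / math.e)**N
    print(N, math.factorial(N) / stirling)
# ratios: 1.084, 1.017, 1.008, 1.0017, 1.0008
```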

Unfortunately this argument isn’t rigorous yet: I’m acting like the Central Limit Theorem implies pointwise convergence of probability distributions to a Gaussian, but it’s not that strong. So we need to work a bit harder. For the details I refer you to Aditya Ghosh’s article.

But my goal here was not really to dot every i and cross every t. It was to show that Stirling’s formula emerges naturally from applying standard ideas in statistical mechanics and the Central Limit Theorem to large collections of identical systems of a particularly simple sort.

Posted at August 25, 2024 10:29 AM UTC


7 Comments & 2 Trackbacks

Re: Stirling’s Formula from Statistical Mechanics

Let $X$ and $Y$ be classical statistical mechanical systems, and let $X \otimes Y$ be the tensor of two such systems. Given a classical statistical mechanical system $A$, does the category of classical statistical mechanical systems have a terminal coalgebra of the endofunctor $X \mapsto A \otimes X$?

Posted by: Madeleine Birchfield on August 27, 2024 11:02 AM | Permalink | Reply to this

Re: Stirling’s Formula from Statistical Mechanics

I don’t know! And I don’t have much intuition for this. Since analysis makes things harder, maybe someone can answer an easier question. Suppose $R$ is an abelian group, make sets over $R$ into a category in the usual way (which doesn’t use the group structure), and make this category symmetric monoidal as hinted at in this post, where the tensor product of $H \colon X \to R$ and $H' \colon X' \to R$ is $H \otimes H' \colon X \times X' \to R$ where

$$ (H\otimes H')(x,x') = H(x) + H'(x') $$

Does tensoring with an object $H \colon X \to R$ have a final coalgebra, and if so, what is it? Ditto for initial algebra.

Posted by: John Baez on August 29, 2024 9:35 PM | Permalink | Reply to this

Re: Stirling’s Formula from Statistical Mechanics

In the high temperature limit the rotational partition function of a linear diatomic molecule should be amusing to you.

Posted by: Steve Huntsman on August 27, 2024 2:13 PM | Permalink | Reply to this

Re: Stirling’s Formula from Statistical Mechanics

Very nice! It’s pleasing to find a real-world interpretation to motivate otherwise obscure calculations.

Regarding the point at the end, about how the standard Central Limit Theorem doesn’t imply pointwise convergence of densities, the keyword for the kind of result you need is “local limit theorem”. (Here’s a brief introduction, though I haven’t checked whether the continuous local limit theorem stated there is strong enough for the purposes of this post.)

Posted by: Mark Meckes on August 27, 2024 2:43 PM | Permalink | Reply to this

Re: Stirling’s Formula from Statistical Mechanics

Thanks! It turns out I’ll be needing to improve my knowledge of such refined central limit theorems in order to study the statistical mechanics of large collections of identical systems for other reasons: these ‘energy particles’ are just an exactly solvable test case of some general questions I’m interested in. More on those soon, I hope! So I may come back begging for various specific results.

Posted by: John Baez on August 29, 2024 8:12 PM | Permalink | Reply to this

Re: Stirling’s Formula from Statistical Mechanics

I rewrote some of this post to make it dimensionally correct — that is, to get the units to work out. It didn’t affect any of the main results here, but I want to use the calculations here in “The space of physical frameworks (part 2)”, where dimensional analysis matters more. I decided I don’t want to have to redo all those calculations there!

The correction involves the measure I’m using on the space of states of an energy particle, $[0,\infty)$. I’d been using Lebesgue measure. But the coordinate on this space, called $E$, has units of energy, and we’d like the measure we use to be dimensionless, because physicists want the partition function to be dimensionless. So instead of using $d E$ as our measure, I want to use

$$ \frac{d E}{w} $$

where $w$ is some arbitrary unit of energy. The choice of $w$ won’t affect the probability distribution of energies, but it will affect other calculations later.

The appearance of this funny constant $w$ is very similar to how Planck’s constant $h$ shows up in the calculation of the entropy of a classical harmonic oscillator, which I explained starting on page 55 of my book What is Entropy?. The point there was that the measure $d p \, d q$ on phase space has dimensions of action, while $d p \, d q / h$ is dimensionless and thus better.

Posted by: John Baez on September 3, 2024 1:33 AM | Permalink | Reply to this
Read the post The Space of Physical Frameworks (Part 2)
Weblog: The n-Category Café
Excerpt: What's a thermostatic system, and what are some basic things we can do with them?
Tracked: September 8, 2024 12:22 AM
Read the post The Space of Physical Frameworks (Part 5)
Weblog: The n-Category Café
Excerpt: Let's think about how classical statistical mechanics reduces to thermodynamics in the limit where Boltzmann's constant \(k\) approaches zero, by looking at an example.
Tracked: October 2, 2024 4:15 AM

Re: Stirling’s Formula from Statistical Mechanics

I haven’t looked closely at it yet, but here’s a nice-looking (and strikingly titled) preprint you might find interesting in connection with all this: Probability Proofs for Stirling (and More): the Ubiquitous Role of $\sqrt{2\pi}$.

Posted by: Mark Meckes on October 28, 2024 4:59 PM | Permalink | Reply to this
