### A Semigroup Approach to Finite Markov Chains

#### Posted by Tom Leinster

*Guest post by
Benjamin
Steinberg*

*A finite Markov chain is a very simple kind of
transition system:*

*It can be encoded as a square matrix of nonnegative reals, each of
whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet
the fundamental results on finite Markov chains, proved in a way that
appears to be new. — TL*

I’m going to prove all the standard finite Markov chain convergence results using the theory of compact semigroups. This approach may be known, but I couldn’t find it. I don’t obtain any of the bounds on convergence, although I suppose one can obtain them via this method with a little work.

### Compact semigroups

By a compact semigroup we mean a non-empty semigroup $S$ with a compact Hausdorff topology and jointly continuous multiplication $S\times S\to S$. We shall assume for simplicity that the topology on $S$ is metric since our main examples have this property. The following simple exercise from topology will be used without comment in the sequel.

Lemma 1Let $(X,d)$ be a compact metric space and $\{x_n\}$ a sequence in $X$. Then $x_n$ converges to $x\in X$ if and only if every convergent subsequence of $\{x_n\}$ converges to $x$.

For example, it is easy to prove from this lemma that if $S$ is a compact semigroup which is algebraically a group, then the inverse is continuous. Indeed, if $s_n\to s$ and $\tilde{s}$ is a limit of some subsequence of $\{s_n^{-1}\}$, then $\tilde{s}s=1=s\tilde{s}$ and so $\tilde{s}=s^{-1}$. By the lemma, it follows that $s_n^{-1}\to s^{-1}$.

A compact semigroup is called **monothetic** if it is topologically
generated by a single element, that is, $S=\overline{\langle s\rangle}$ for
some
$s\in S$ where, as usual, $\langle s\rangle =\{s^n\mid n\geq 1\}$ is the
subsemigroup generated by $s$. One easily checks that $S$ is commutative. In
fact, the structure of $S$ is well known and we record it in the following
theorem, due independently to Koch and Numakura without the metric assumption.

Theorem 2 (Koch, Numakura)Let $S=\overline{\langle s\rangle}$ be a monothetic compact (metric) semigroup. Then $S$ has a unique minimal ideal $I$, namely the set of all limits of convergent subsequences of $\{s^n\}$. Moreover, $I$ is a compact abelian group. The identity $s^{\omega}$ of $I$ is the unique idempotent of $S$. Furthermore, $I=s^{\omega}S$ and the map $x\mapsto s^{\omega}x$ is a surjective continuous homomorphism $S\to I$ and so in particular $I$ is a monothetic compact abelian group.

Proof:The sets $s^n S$ with $n\geq 1$ form a decreasing chain of closed ideals and hence have non-empty intersection by compactness. Put $I=\bigcap_{n\geq 1} s^n S$; clearly $I$ is a closed ideal. We claim it consists precisely of all limits of convergent subsequences of $\{s^n\}$.Indeed, suppose that $x=\lim s^{n_k}$ and $n\geq 1$. Then by passing to a subsequence we may assume that $n_k > n$ for all $k$ and $\{s^{n_k-n}\}$ converges to an element $y\in S$. Then $x=s^n y$ by continuity of multiplication. We conclude that $x\in I$. For the converse, first observe that every element of $S\setminus \{s^k\mid k\geq 1\}$ is a limit of a subsequence of $\{s^n\}$. Thus it suffices to prove that if $s^m\in I$ for some $m\geq 1$, then $s^m$ is a limit of a convergent subsequence of $\{s^n\}$. But then we have $s^m=s^m y$ for some $y\in S$. If $y=\lim s^{n_k}$ with $n_k$ an increasing sequence, then $s^m=\lim s^{m+n_k}$. Otherwise, $s^m=s^{m+k}$ for some $k\geq 1$. But then $s^{m+n k}=s$ for all $n\geq 1$ and so $s^{m+n k}\to s^m$. Thus $I$ consists of all limits of convergent subsequences of $\{s^n\}$.

We next show that $I$ is algebraically a group. It will then follow that it is a compact abelian group. To do this, it suffice to show (by commutativity) that $a x=b$ has a solution $x\in I$ for all $a,b\in I$. Suppose $a=\lim s^{n_k}$ and $b=\lim s^{m_k}$. Passing to a subsequence, we may assume that $m_k > n_k$ for all $k$. Extracting further subsequences, we may assume $s^{m_k-n_k}$ converges to an element $x\in S$. Then $a x=\lim s^{n_k} s^{m_k-n_k} = \lim s^{m_k} = b.$ This completes the proof that $I$ is a group.

Let us show that $I$ is the unique minimal ideal. Let $J$ be an ideal of $S$. Then $\emptyset\neq I J\subseteq I\cap J\subseteq J$. But $I$ is a group and so has no proper ideals. Thus $I J=I$ and so $I\subseteq J$.

Suppose that $f\in S$ is an idempotent. From $f=\lim f^n$, it is immediate that $f$ is a limit of a subsequence of $\{s^n\}$ and hence belongs to $I$. But $I$ is a group and therefore contains a unique idempotent, its identity. Clearly $s^{\omega} S \subseteq I$ and so $s^{\omega}S=I$ by minimality. For $x,y\in S$, one has $s^{\omega} x s^{\omega} y = (s^{\omega})^2 x y = s^{\omega} x y$ and so $x\mapsto s^{\omega}x$ is a continuous surjective homomorphism $S\to s^{\omega}S=I$. This completes the proof.

Let $S=\overline{\langle s\rangle}$ be a compact monothetic semigroup (which we
continue to assume to be metric) and let $S\times X\to X$ be a faithful
continuous left action of $S$ on a compact Hausdorff space $X$. Then
$s^{\omega}X$ is a closed $S$-invariant subspace, known as the **eventual
range** of $s$, and $S$ acts on $s^{\omega}X$ by homeomorphisms. Moreover, the
action of $S$ on $s^{\omega}X$ factors through the homomorphism $S\to
s^{\omega}S$ from Theorem 2.

Notice that the fixed-point set of $s$ is contained in the eventual range. Indeed, $s x=x$ implies $s^n x=x$ for all $n\geq 1$ and so $s^{\omega}x=x$. It follows that if the eventual range of $s$ is a singleton, that is, if $s^{\omega}$ acts as a constant map, then $s$ has a unique fixed-point. Indeed, if $s^{\omega}X=\{x_0\}$, then since the eventual range of $s$ is $S$-invariant, clearly $s x_0=x_0$. But the above discussion shows that each fixed point of $s$ belongs to the eventual range and so $x_0$ is the unique fixed-point of $S$. In this case, $s^{\omega}S=\{s^{\omega}\}$ (by faithfulness of the action) and hence $s^n\to s^{\omega}$ in light of Theorem 2 and Lemma 1. We summarize this discussion in the following proposition.

Proposition 3Let $S=\overline {\langle s\rangle}$ be a monothetic compact semigroup acting faithfully on the left on a compact Hausdorff space $X$ and suppose that $s^{\omega}$ acts as a constant map with image $x_0$, i.e., $\{x_0\}$ is the eventual range of $s$. Then $x_0$ is the unique fixed-point of $s$ on $X$, $s^n\to s^{\omega}$ (uniformly) and $s^n x\to x_0$ for all $x\in X$. The dual statement holds for right actions.

As a warmup exercise let us give a quick proof of the
Banach
fixed point theorem
for compact metric spaces. Let $(X,d)$ be a compact metric space. A
mapping $f\colon X\to X$ is a **weak contraction** if $d(f(x),f(y))\leq
d(x,y)$ for all $x,y\in X$. If the inequality is strict for $x\neq y$, then
$f$ is said to be a (strict) contraction. The weak contractions form a
submonoid $WCtr(X)$ of the monoid of continuous self-maps of
$X$, which is easily verified to be closed in the compact-open topology (and
hence is metric via the sup norm). Since the weak contractions clearly form an
equicontinuous family, the monoid $WCtr(X)$ is in fact
compact by the
Arzelà–Ascoli
theorem. The weak contractions form an
ideal $J$ of $WCtr(X)$.

Let $f\colon X\to X$ be a weak contraction such that $f^N$ is a contraction for some $N\geq 1$. Then $S=\overline{\langle f\rangle}$ is a compact monothetic subsemigroup of $WCtr(X)$. Moreover, if $I$ is the minimal ideal of $S$, then $I\subseteq f^N S\subseteq J$ and hence consists of contractions. In particular, $f^{\omega}$ is a contraction. But the image of an idempotent is its fixed-point set and it is easily checked that a contraction has at most one fixed-point. Thus $f^{\omega}$ is a constant map to some point $x_0$. By Proposition 3, $x_0$ is the unique fixed-point of $f$ and $f^n(x)\to x_0$ for all $x\in X$. This is the Banach fixed-point theorem for compact metric spaces.

### Stochastic matrices

Let $\mathcal{P}_n$ be the space of all **probability distributions** on
$\{1,\ldots,n\}$. We may identify it with the subspace of (row) vectors in
$\mathbb{R}^n$ with non-negative entries summing to $1$. That is,
$\mathcal{P}_n$ is the standard $(n-1)$-simplex in $\mathbb{R}^n$ and hence is
compact. Let $\mathcal{S}_n$ be the set of all $n\times n$ **stochastic
matrices**, that is, the
set of all square matrices whose rows belong to $\mathcal{P}_n$. Then
$\mathcal{S}_n$
is a compact monoid (homeomorphic to $\mathcal{P}_n^n$) with respect to matrix
multiplication. Moreover, it is a partially ordered monoid with respect to the
usual ordering ($A\geq B$ if $A-B\geq 0$) on non-negative matrices.

The natural right action $\mathbb{R}^n\times \mathcal{S}_n\to \mathbb{R}^n$ is clearly continuous and $\mathcal{P}_n$ is a compact invariant subspace. Moreover, the action of $\mathcal{S}_n$ on $\mathcal{P}_n$ is faithful because $\mathcal{P}_n$ contains the standard unit vectors. Thus we can also view $\mathcal{S}_n$ as a compact monoid of transformations of both $\mathcal{P}_n$ and $\mathbb{R}^n$ with respect to the compact-open topology.

Let $J$ denote the matrix of all ones; note that $P J=J$ for any stochastic matrix $P$.

Lemma 4Let $\varepsilon \gt 0$. Then $I_{\varepsilon} = \{Q\in \mathcal{S}_n \mid Q \geq \varepsilon J \}$ is a closed left ideal of $\mathcal{S}_n$.

Proof:Clearly $I_{\varepsilon}$ is closed. If $P,Q\in \mathcal{S}_n$ with $Q\in I_{\varepsilon}$, then $P Q\geq P (\varepsilon J)=\varepsilon J$ and so $P Q\in I_{\varepsilon}$, as required.

Consequently, the positive stochastic matrices form a left ideal of
$\mathcal{S}_n$ (topologically open). A probability distribution $\pi\in
\mathcal{P}_n$ is **stationary** for a stochastic matrix $P$ if $\pi P=\pi$,
i.e., it is a fixed-point of $P$.

Lemma 5An idempotent positive stochastic matrix $Q$ has a unique stationary distribution $\pi$. Moreover, $\pi$ is strictly positive, each row of $Q$ is $\pi$, and all fixed vectors of $Q$ are multiples of $\pi$. Consequently, $Q$ has rank $1$ and acts as a constant map on $\mathcal{P}_n$ with image $\pi$.

Proof:Let $Q_{i j}$ be the largest entry in column $j$. Then $\sum_{k=1}^n Q_{i k}(Q_{i j}-Q_{k j}) = \sum_{k=1}^n Q_{i k} Q_{i j}-\sum_{k=1}^n Q_{i k}Q_{k j} = Q_{i j}-Q_{i j}=0$ and hence $Q_{i j}=Q_{k j}$, for $k=1,\ldots,n$, as each $Q_{i k} \gt 0$ and $Q_{i j}-Q_{k j}\geq 0$. Thus all rows of $Q$ are $\pi$ for some strictly positive probability distribution $\pi$. Idempotency immediately yields $\pi Q=\pi$. Moreover, since $Q$ is a rank $1$ idempotent, every fixed vector of $Q$ must be a multiple of $\pi$. Consequently, $\pi$ is the unique stationary distribution for $Q$. Since the image of an idempotent mapping is its fixed-point set, we conclude that $Q$ acts on $\mathcal{P}_n$ as a constant map with image $\pi$.

A stochastic matrix $P$ is said to be **ergodic** if $P^m \gt 0$ for some
$m$. Now we can prove one of the main theorems of finite Markov chain
theory. Recall that an eigenvalue of a matrix is **simple** if it has
multiplicity one as a root of the characteristic polynomial.

Theorem 6Let $P$ be an ergodic stochastic matrix. Then:

- $P$ has a unique stationary distribution $\pi$;
- $\pi$ is strictly positive;
- $P^n\to \Pi$ where $\Pi$ is the matrix all of whose rows are $\pi$;
- if $\mu$ is any probability distribution, then $\mu P^n\to \pi$;
- the eigenvalue $1$ of $P$ is simple and all other eigenvalues of $P$ have modulus strictly less than $1$.

Proof:Let $P\in \mathcal{S}_n$ with $P^m \gt 0$. Put $S=\overline{\langle P\rangle}$ and let $I$ be the minimal ideal of $S$. As $I\subseteq S P^m$, it follows from Lemma 4 that $I$ consists of positive elements. Thus $P^{\omega} \gt 0$ and so Lemma 5 implies that $P^{\omega}$ is a constant map on $\mathcal{P}_n$ with image $\pi \gt 0$ its stationary distribution. Proposition 3 then yields that $\pi$ is the unique stationary distribution for $P$, that $P^n\to P^{\omega}$ and that $\mu P^n\to \pi$ for all $\mu\in \mathcal{P}_n$. Moreover, $P^{\omega}=\Pi$ by Lemma 5.For the final statement, observe that $\mathbb{R}^n = \mathbb{R}^n P^{\omega} \oplus \ker P^{\omega} = \mathbb{R}\pi \oplus \ker P^{\omega}$ (the last equality by Lemma 5). Moreover, $P$ preserves this direct sum decomposition because $P$ and $P^{\omega}$ commute. Set $V=\ker P^{\omega}$. From the fact that $P^n\to P^{\omega}$, it follows that the powers of the restriction $P|_V$ of $P$ to $V$ converge to $0$ and hence $P|_V$ has spectral radius strictly less than $1$. Since the characteristic polynomial of $P$ is the product of $x - 1$ and the characteristic polynomial of $P|_V$, it follows that $1$ is a simple eigenvalue of $P$.

A stochastic matrix $P$ is called **irreducible** if, for
all $i,j$, there exists $k \gt 0$ such that $P^k_{i j} \gt 0$. It is easy to
verify that if $P$ is irreducible, then $Q=(1/2) I_n + (1/2) P$ is ergodic,
where $I_n$ is the $n\times n$ identity matrix. Observe that $v Q=v$ if and
only if $v=v P$ for any vector $v$. This leads to the existence and uniqueness
of stationary distributions for irreducible Markov chains.

Corollary 7Let $P$ be an irreducible stochastic matrix. Then $P$ has a unique stationary distribution $\pi$. Moreover, $\pi$ is strictly positive and every fixed vector of $P$ is a multiple of $\pi$.

## Re: A Semigroup Approach to Finite Markov Chains

This is very nice! Thanks!

I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.

(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)