A Semigroup Approach to Finite Markov Chains
Posted by Tom Leinster
Guest post by Benjamin Steinberg
A finite Markov chain is a very simple kind of transition system:
It can be encoded as a square matrix of nonnegative reals, each of whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet the fundamental results on finite Markov chains, proved in a way that appears to be new. — TL
I’m going to prove all the standard finite Markov chain convergence results using the theory of compact semigroups. This approach may be known, but I couldn’t find it. I don’t obtain any of the bounds on convergence, although I suppose one can obtain them via this method with a little work.
Compact semigroups
By a compact semigroup we mean a non-empty semigroup with a compact Hausdorff topology and jointly continuous multiplication . We shall assume for simplicity that the topology on is metric since our main examples have this property. The following simple exercise from topology will be used without comment in the sequel.
Lemma 1 Let be a compact metric space and a sequence in . Then converges to if and only if every convergent subsequence of converges to .
For example, it is easy to prove from this lemma that if is a compact semigroup which is algebraically a group, then the inverse is continuous. Indeed, if and is a limit of some subsequence of , then and so . By the lemma, it follows that .
A compact semigroup is called monothetic if it is topologically generated by a single element, that is, for some where, as usual, is the subsemigroup generated by . One easily checks that is commutative. In fact, the structure of is well known and we record it in the following theorem, due independently to Koch and Numakura without the metric assumption.
Theorem 2 (Koch, Numakura) Let be a monothetic compact (metric) semigroup. Then has a unique minimal ideal , namely the set of all limits of convergent subsequences of . Moreover, is a compact abelian group. The identity of is the unique idempotent of . Furthermore, and the map is a surjective continuous homomorphism and so in particular is a monothetic compact abelian group.
Proof: The sets with form a decreasing chain of closed ideals and hence have non-empty intersection by compactness. Put ; clearly is a closed ideal. We claim it consists precisely of all limits of convergent subsequences of .
Indeed, suppose that and . Then by passing to a subsequence we may assume that for all and converges to an element . Then by continuity of multiplication. We conclude that . For the converse, first observe that every element of is a limit of a subsequence of . Thus it suffices to prove that if for some , then is a limit of a convergent subsequence of . But then we have for some . If with an increasing sequence, then . Otherwise, for some . But then for all and so . Thus consists of all limits of convergent subsequences of .
We next show that is algebraically a group. It will then follow that it is a compact abelian group. To do this, it suffice to show (by commutativity) that has a solution for all . Suppose and . Passing to a subsequence, we may assume that for all . Extracting further subsequences, we may assume converges to an element . Then This completes the proof that is a group.
Let us show that is the unique minimal ideal. Let be an ideal of . Then . But is a group and so has no proper ideals. Thus and so .
Suppose that is an idempotent. From , it is immediate that is a limit of a subsequence of and hence belongs to . But is a group and therefore contains a unique idempotent, its identity. Clearly and so by minimality. For , one has and so is a continuous surjective homomorphism . This completes the proof.
Let be a compact monothetic semigroup (which we continue to assume to be metric) and let be a faithful continuous left action of on a compact Hausdorff space . Then is a closed -invariant subspace, known as the eventual range of , and acts on by homeomorphisms. Moreover, the action of on factors through the homomorphism from Theorem 2.
Notice that the fixed-point set of is contained in the eventual range. Indeed, implies for all and so . It follows that if the eventual range of is a singleton, that is, if acts as a constant map, then has a unique fixed-point. Indeed, if , then since the eventual range of is -invariant, clearly . But the above discussion shows that each fixed point of belongs to the eventual range and so is the unique fixed-point of . In this case, (by faithfulness of the action) and hence in light of Theorem 2 and Lemma 1. We summarize this discussion in the following proposition.
Proposition 3 Let be a monothetic compact semigroup acting faithfully on the left on a compact Hausdorff space and suppose that acts as a constant map with image , i.e., is the eventual range of . Then is the unique fixed-point of on , (uniformly) and for all . The dual statement holds for right actions.
As a warmup exercise let us give a quick proof of the Banach fixed point theorem for compact metric spaces. Let be a compact metric space. A mapping is a weak contraction if for all . If the inequality is strict for , then is said to be a (strict) contraction. The weak contractions form a submonoid of the monoid of continuous self-maps of , which is easily verified to be closed in the compact-open topology (and hence is metric via the sup norm). Since the weak contractions clearly form an equicontinuous family, the monoid is in fact compact by the Arzelà–Ascoli theorem. The weak contractions form an ideal of .
Let be a weak contraction such that is a contraction for some . Then is a compact monothetic subsemigroup of . Moreover, if is the minimal ideal of , then and hence consists of contractions. In particular, is a contraction. But the image of an idempotent is its fixed-point set and it is easily checked that a contraction has at most one fixed-point. Thus is a constant map to some point . By Proposition 3, is the unique fixed-point of and for all . This is the Banach fixed-point theorem for compact metric spaces.
Stochastic matrices
Let be the space of all probability distributions on . We may identify it with the subspace of (row) vectors in with non-negative entries summing to . That is, is the standard -simplex in and hence is compact. Let be the set of all stochastic matrices, that is, the set of all square matrices whose rows belong to . Then is a compact monoid (homeomorphic to ) with respect to matrix multiplication. Moreover, it is a partially ordered monoid with respect to the usual ordering ( if ) on non-negative matrices.
The natural right action is clearly continuous and is a compact invariant subspace. Moreover, the action of on is faithful because contains the standard unit vectors. Thus we can also view as a compact monoid of transformations of both and with respect to the compact-open topology.
Let denote the matrix of all ones; note that for any stochastic matrix .
Lemma 4 Let . Then is a closed left ideal of .
Proof: Clearly is closed. If with , then and so , as required.
Consequently, the positive stochastic matrices form a left ideal of (topologically open). A probability distribution is stationary for a stochastic matrix if , i.e., it is a fixed-point of .
Lemma 5 An idempotent positive stochastic matrix has a unique stationary distribution . Moreover, is strictly positive, each row of is , and all fixed vectors of are multiples of . Consequently, has rank and acts as a constant map on with image .
Proof: Let be the largest entry in column . Then and hence , for , as each and . Thus all rows of are for some strictly positive probability distribution . Idempotency immediately yields . Moreover, since is a rank idempotent, every fixed vector of must be a multiple of . Consequently, is the unique stationary distribution for . Since the image of an idempotent mapping is its fixed-point set, we conclude that acts on as a constant map with image .
A stochastic matrix is said to be ergodic if for some . Now we can prove one of the main theorems of finite Markov chain theory. Recall that an eigenvalue of a matrix is simple if it has multiplicity one as a root of the characteristic polynomial.
Theorem 6 Let be an ergodic stochastic matrix. Then:
- has a unique stationary distribution ;
- is strictly positive;
- where is the matrix all of whose rows are ;
- if is any probability distribution, then ;
- the eigenvalue of is simple and all other eigenvalues of have modulus strictly less than .
Proof: Let with . Put and let be the minimal ideal of . As , it follows from Lemma 4 that consists of positive elements. Thus and so Lemma 5 implies that is a constant map on with image its stationary distribution. Proposition 3 then yields that is the unique stationary distribution for , that and that for all . Moreover, by Lemma 5.
For the final statement, observe that (the last equality by Lemma 5). Moreover, preserves this direct sum decomposition because and commute. Set . From the fact that , it follows that the powers of the restriction of to converge to and hence has spectral radius strictly less than . Since the characteristic polynomial of is the product of and the characteristic polynomial of , it follows that is a simple eigenvalue of .
A stochastic matrix is called irreducible if, for all , there exists such that . It is easy to verify that if is irreducible, then is ergodic, where is the identity matrix. Observe that if and only if for any vector . This leads to the existence and uniqueness of stationary distributions for irreducible Markov chains.
Corollary 7 Let be an irreducible stochastic matrix. Then has a unique stationary distribution . Moreover, is strictly positive and every fixed vector of is a multiple of .
Re: A Semigroup Approach to Finite Markov Chains
This is very nice! Thanks!
I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.
(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)