Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

January 3, 2012

A Semigroup Approach to Finite Markov Chains

Posted by Tom Leinster

Guest post by Benjamin Steinberg

A finite Markov chain is a very simple kind of transition system:

It can be encoded as a square matrix of nonnegative reals, each of whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet the fundamental results on finite Markov chains, proved in a way that appears to be new. — TL

I’m going to prove all the standard finite Markov chain convergence results using the theory of compact semigroups. This approach may be known, but I couldn’t find it. I don’t obtain any of the bounds on convergence, although I suppose one can obtain them via this method with a little work.

Compact semigroups

By a compact semigroup we mean a non-empty semigroup S with a compact Hausdorff topology and jointly continuous multiplication S×SS. We shall assume for simplicity that the topology on S is metric since our main examples have this property. The following simple exercise from topology will be used without comment in the sequel.

Lemma 1  Let (X,d) be a compact metric space and {x n} a sequence in X. Then x n converges to xX if and only if every convergent subsequence of {x n} converges to x.

For example, it is easy to prove from this lemma that if S is a compact semigroup which is algebraically a group, then the inverse is continuous. Indeed, if s ns and s˜ is a limit of some subsequence of {s n 1}, then s˜s=1=ss˜ and so s˜=s 1. By the lemma, it follows that s n 1s 1.

A compact semigroup is called monothetic if it is topologically generated by a single element, that is, S=s¯ for some sS where, as usual, s={s nn1} is the subsemigroup generated by s. One easily checks that S is commutative. In fact, the structure of S is well known and we record it in the following theorem, due independently to Koch and Numakura without the metric assumption.

Theorem 2 (Koch, Numakura)  Let S=s¯ be a monothetic compact (metric) semigroup. Then S has a unique minimal ideal I, namely the set of all limits of convergent subsequences of {s n}. Moreover, I is a compact abelian group. The identity s ω of I is the unique idempotent of S. Furthermore, I=s ωS and the map xs ωx is a surjective continuous homomorphism SI and so in particular I is a monothetic compact abelian group.

Proof:  The sets s nS with n1 form a decreasing chain of closed ideals and hence have non-empty intersection by compactness. Put I= n1s nS; clearly I is a closed ideal. We claim it consists precisely of all limits of convergent subsequences of {s n}.

Indeed, suppose that x=lims n k and n1. Then by passing to a subsequence we may assume that n k>n for all k and {s n kn} converges to an element yS. Then x=s ny by continuity of multiplication. We conclude that xI. For the converse, first observe that every element of S{s kk1} is a limit of a subsequence of {s n}. Thus it suffices to prove that if s mI for some m1, then s m is a limit of a convergent subsequence of {s n}. But then we have s m=s my for some yS. If y=lims n k with n k an increasing sequence, then s m=lims m+n k. Otherwise, s m=s m+k for some k1. But then s m+nk=s for all n1 and so s m+nks m. Thus I consists of all limits of convergent subsequences of {s n}.

We next show that I is algebraically a group. It will then follow that it is a compact abelian group. To do this, it suffice to show (by commutativity) that ax=b has a solution xI for all a,bI. Suppose a=lims n k and b=lims m k. Passing to a subsequence, we may assume that m k>n k for all k. Extracting further subsequences, we may assume s m kn k converges to an element xS. Then ax=lims n ks m kn k=lims m k=b. This completes the proof that I is a group.

Let us show that I is the unique minimal ideal. Let J be an ideal of S. Then IJIJJ. But I is a group and so has no proper ideals. Thus IJ=I and so IJ.

Suppose that fS is an idempotent. From f=limf n, it is immediate that f is a limit of a subsequence of {s n} and hence belongs to I. But I is a group and therefore contains a unique idempotent, its identity. Clearly s ωSI and so s ωS=I by minimality. For x,yS, one has s ωxs ωy=(s ω) 2xy=s ωxy and so xs ωx is a continuous surjective homomorphism Ss ωS=I. This completes the proof.

Let S=s¯ be a compact monothetic semigroup (which we continue to assume to be metric) and let S×XX be a faithful continuous left action of S on a compact Hausdorff space X. Then s ωX is a closed S-invariant subspace, known as the eventual range of s, and S acts on s ωX by homeomorphisms. Moreover, the action of S on s ωX factors through the homomorphism Ss ωS from Theorem 2.

Notice that the fixed-point set of s is contained in the eventual range. Indeed, sx=x implies s nx=x for all n1 and so s ωx=x. It follows that if the eventual range of s is a singleton, that is, if s ω acts as a constant map, then s has a unique fixed-point. Indeed, if s ωX={x 0}, then since the eventual range of s is S-invariant, clearly sx 0=x 0. But the above discussion shows that each fixed point of s belongs to the eventual range and so x 0 is the unique fixed-point of S. In this case, s ωS={s ω} (by faithfulness of the action) and hence s ns ω in light of Theorem 2 and Lemma 1. We summarize this discussion in the following proposition.

Proposition 3  Let S=s¯ be a monothetic compact semigroup acting faithfully on the left on a compact Hausdorff space X and suppose that s ω acts as a constant map with image x 0, i.e., {x 0} is the eventual range of s. Then x 0 is the unique fixed-point of s on X, s ns ω (uniformly) and s nxx 0 for all xX. The dual statement holds for right actions.

As a warmup exercise let us give a quick proof of the Banach fixed point theorem for compact metric spaces. Let (X,d) be a compact metric space. A mapping f:XX is a weak contraction if d(f(x),f(y))d(x,y) for all x,yX. If the inequality is strict for xy, then f is said to be a (strict) contraction. The weak contractions form a submonoid WCtr(X) of the monoid of continuous self-maps of X, which is easily verified to be closed in the compact-open topology (and hence is metric via the sup norm). Since the weak contractions clearly form an equicontinuous family, the monoid WCtr(X) is in fact compact by the Arzelà–Ascoli theorem. The weak contractions form an ideal J of WCtr(X).

Let f:XX be a weak contraction such that f N is a contraction for some N1. Then S=f¯ is a compact monothetic subsemigroup of WCtr(X). Moreover, if I is the minimal ideal of S, then If NSJ and hence consists of contractions. In particular, f ω is a contraction. But the image of an idempotent is its fixed-point set and it is easily checked that a contraction has at most one fixed-point. Thus f ω is a constant map to some point x 0. By Proposition 3, x 0 is the unique fixed-point of f and f n(x)x 0 for all xX. This is the Banach fixed-point theorem for compact metric spaces.

Stochastic matrices

Let 𝒫 n be the space of all probability distributions on {1,,n}. We may identify it with the subspace of (row) vectors in n with non-negative entries summing to 1. That is, 𝒫 n is the standard (n1)-simplex in n and hence is compact. Let 𝒮 n be the set of all n×n stochastic matrices, that is, the set of all square matrices whose rows belong to 𝒫 n. Then 𝒮 n is a compact monoid (homeomorphic to 𝒫 n n) with respect to matrix multiplication. Moreover, it is a partially ordered monoid with respect to the usual ordering (AB if AB0) on non-negative matrices.

The natural right action n×𝒮 n n is clearly continuous and 𝒫 n is a compact invariant subspace. Moreover, the action of 𝒮 n on 𝒫 n is faithful because 𝒫 n contains the standard unit vectors. Thus we can also view 𝒮 n as a compact monoid of transformations of both 𝒫 n and n with respect to the compact-open topology.

Let J denote the matrix of all ones; note that PJ=J for any stochastic matrix P.

Lemma 4  Let ε>0. Then I ε={Q𝒮 nQεJ} is a closed left ideal of 𝒮 n.

Proof:  Clearly I ε is closed. If P,Q𝒮 n with QI ε, then PQP(εJ)=εJ and so PQI ε, as required.

Consequently, the positive stochastic matrices form a left ideal of 𝒮 n (topologically open). A probability distribution π𝒫 n is stationary for a stochastic matrix P if πP=π, i.e., it is a fixed-point of P.

Lemma 5  An idempotent positive stochastic matrix Q has a unique stationary distribution π. Moreover, π is strictly positive, each row of Q is π, and all fixed vectors of Q are multiples of π. Consequently, Q has rank 1 and acts as a constant map on 𝒫 n with image π.

Proof:  Let Q ij be the largest entry in column j. Then k=1 nQ ik(Q ijQ kj)= k=1 nQ ikQ ij k=1 nQ ikQ kj=Q ijQ ij=0 and hence Q ij=Q kj, for k=1,,n, as each Q ik>0 and Q ijQ kj0. Thus all rows of Q are π for some strictly positive probability distribution π. Idempotency immediately yields πQ=π. Moreover, since Q is a rank 1 idempotent, every fixed vector of Q must be a multiple of π. Consequently, π is the unique stationary distribution for Q. Since the image of an idempotent mapping is its fixed-point set, we conclude that Q acts on 𝒫 n as a constant map with image π.

A stochastic matrix P is said to be ergodic if P m>0 for some m. Now we can prove one of the main theorems of finite Markov chain theory. Recall that an eigenvalue of a matrix is simple if it has multiplicity one as a root of the characteristic polynomial.

Theorem 6  Let P be an ergodic stochastic matrix. Then:

  1. P has a unique stationary distribution π;
  2. π is strictly positive;
  3. P nΠ where Π is the matrix all of whose rows are π;
  4. if μ is any probability distribution, then μP nπ;
  5. the eigenvalue 1 of P is simple and all other eigenvalues of P have modulus strictly less than 1.

Proof:  Let P𝒮 n with P m>0. Put S=P¯ and let I be the minimal ideal of S. As ISP m, it follows from Lemma 4 that I consists of positive elements. Thus P ω>0 and so Lemma 5 implies that P ω is a constant map on 𝒫 n with image π>0 its stationary distribution. Proposition 3 then yields that π is the unique stationary distribution for P, that P nP ω and that μP nπ for all μ𝒫 n. Moreover, P ω=Π by Lemma 5.

For the final statement, observe that n= nP ωkerP ω=πkerP ω (the last equality by Lemma 5). Moreover, P preserves this direct sum decomposition because P and P ω commute. Set V=kerP ω. From the fact that P nP ω, it follows that the powers of the restriction P V of P to V converge to 0 and hence P V has spectral radius strictly less than 1. Since the characteristic polynomial of P is the product of x1 and the characteristic polynomial of P V, it follows that 1 is a simple eigenvalue of P.

A stochastic matrix P is called irreducible if, for all i,j, there exists k>0 such that P ij k>0. It is easy to verify that if P is irreducible, then Q=(1/2)I n+(1/2)P is ergodic, where I n is the n×n identity matrix. Observe that vQ=v if and only if v=vP for any vector v. This leads to the existence and uniqueness of stationary distributions for irreducible Markov chains.

Corollary 7  Let P be an irreducible stochastic matrix. Then P has a unique stationary distribution π. Moreover, π is strictly positive and every fixed vector of P is a multiple of π.

Posted at January 3, 2012 10:31 PM UTC

TrackBack URL for this Entry:   http://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2484

5 Comments & 0 Trackbacks

Re: A Semigroup Approach to Finite Markov Chains

This is very nice! Thanks!

I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.

(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)

Posted by: Tom Leinster on January 3, 2012 11:07 PM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Can the methods used here be related to the metric coinduction of Kozen and Ruozzi?

Posted by: David Corfield on January 4, 2012 9:32 AM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Good question. I have no idea.
Posted by: Benjamin Steinberg on January 5, 2012 12:52 AM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Looks cool, and though it is a bit beyond me, does it relate to earlier work on markov chains and semigroups?

http://u.math.biu.ac.il/~margolis/Representation%20Theory%20Seminar/Semigroups%20and%20Ring%20Theoretic%20Methods%20in%20Probability.pdf

Posted by: Trurl on January 4, 2012 6:41 PM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Yes and no. If S is any compact semigroup then the probability measures on S form a compact semigroup under convolution. The stochastic matrices essentially form a special case. Some of the convergence results of Diaconis and Brown can be understood from this viewpoint but you need semigroup representation theory for the bounds on rate of convergence.
Posted by: Benjamin Steinberg on January 5, 2012 12:58 AM | Permalink | Reply to this

Post a New Comment