
August 9, 2016

In Praise of the Gershgorin Disc Theorem

Posted by Tom Leinster

I’m revising the notes for the introductory linear algebra class that I teach, and wondering whether I can find a way to fit in the wonderful but curiously unpromoted Gershgorin disc theorem.

The Gershgorin disc theorem is an elementary result that allows you to make very fast deductions about the locations of eigenvalues. For instance, it lets you look at the matrix

$$\begin{pmatrix} 3 & i & 1 \\ -1 & 4+5i & 2 \\ 2 & 1 & -1 \end{pmatrix}$$

and see, with only the most trivial mental arithmetic, that the real parts of its eigenvalues must all lie between $-4$ and $7$ and the imaginary parts must lie between $-3$ and $8$.

I wasn’t taught this theorem as an undergraduate, and ever since I learned it a few years ago, I have wondered why not. I feel ever so slightly resentful about it. The theorem is so useful, and the proof is a pushover. Was it just me? Did you get taught the Gershgorin disc theorem as an undergraduate?

Here’s the statement:

Theorem (Gershgorin)  Let $A = (a_{ij})$ be a square complex matrix. Then every eigenvalue of $A$ lies in one of the Gershgorin discs

$$\{ z \in \mathbb{C} : |z - a_{ii}| \leq r_i \}$$

where $r_i = \sum_{j \neq i} |a_{ij}|$.

For example, if

$$A = \begin{pmatrix} 3 & i & 1 \\ -1 & 4+5i & 2 \\ 2 & 1 & -1 \end{pmatrix}$$

(as above) then the three Gershgorin discs have:

  • centre $3$ and radius $|i| + |1| = 2$,
  • centre $4 + 5i$ and radius $|-1| + |2| = 3$,
  • centre $-1$ and radius $|2| + |1| = 3$.

[Figure: the three Gershgorin discs in the complex plane]

Gershgorin’s theorem says that every eigenvalue lies in the union of these three discs. My statement about real and imaginary parts follows immediately.

Even the proof is pathetically simple. Let $\lambda$ be an eigenvalue of $A$. Choose a $\lambda$-eigenvector $x$, and choose $i$ so that $|x_i|$ is maximized. Taking the $i$th coordinate of the equation $A x = \lambda x$ gives

$$(\lambda - a_{ii}) x_i = \sum_{j \neq i} a_{ij} x_j.$$

Now take the modulus of each side:

$$|\lambda - a_{ii}|\,|x_i| = \Bigl| \sum_{j \neq i} a_{ij} x_j \Bigr| \leq \sum_{j \neq i} |a_{ij}|\,|x_j| \leq \Bigl( \sum_{j \neq i} |a_{ij}| \Bigr) |x_i| = r_i |x_i|$$

where to get the inequalities, we used the triangle inequality and then the maximal property of $|x_i|$. Since $x$ is nonzero and $|x_i|$ is maximal, we have $|x_i| > 0$, so cancelling $|x_i|$ gives $|\lambda - a_{ii}| \leq r_i$. And that’s it!
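
For readers who like to see a theorem run, here is a minimal NumPy sketch (purely illustrative, not part of the argument above): it computes the disc centres and radii for the example matrix and confirms that every eigenvalue lands in at least one disc.

    # Minimal NumPy sketch: Gershgorin discs for the example matrix.
    import numpy as np

    A = np.array([[ 3,      1j, 1],
                  [-1, 4 + 5j,  2],
                  [ 2,      1, -1]], dtype=complex)

    centres = np.diag(A)                                  # a_ii
    radii = np.sum(np.abs(A), axis=1) - np.abs(centres)   # r_i = sum_{j != i} |a_ij|

    for lam in np.linalg.eigvals(A):
        in_disc = np.abs(lam - centres) <= radii          # which discs contain lambda?
        assert in_disc.any()                              # Gershgorin: at least one does
        print(f"{complex(lam):.3f} lies in disc(s) {np.nonzero(in_disc)[0]}")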

The theorem is often stated with a supplementary part that gives further information about the location of the eigenvalues: if the union of $k$ of the discs forms a connected component of the union of all of them, then exactly $k$ eigenvalues lie within it. In the example shown, this tells us that there’s exactly one eigenvalue in the blue disc at the top right and exactly two eigenvalues in the union of the red and green discs. (But the theorem says nothing about where those two eigenvalues are within that union.) That’s harder to prove, so I can understand why it wouldn’t be taught in a first course.

But the main part is entirely elementary in both its statement and its proof, as well as being immediately useful. As far as that main part is concerned, I’m curious to know: when did you first meet Gershgorin’s disc theorem?

Posted at August 9, 2016 5:21 PM UTC


26 Comments & 0 Trackbacks

Re: In Praise of the Gershgorin Disc Theorem

Don’t think anyone’s taught me this theorem. Nice proof, though! Do you know any nice applications of it?

Posted by: Qiaochu Yuan on August 9, 2016 7:21 PM

Re: In Praise of the Gershgorin Disc Theorem

To be honest, I don’t. Maybe that’s not surprising: I guess Gershgorin’s theorem is most often used (and taught) in the context of numerical linear algebra, and I know very little about that subject. But I have a biologist friend who’s got a real thing about Gershgorin discs — I should ask him why!

But speaking generally, here are some uses of the theorem:

  • Any square matrix $A$ whose diagonal entries are big enough relative to the rest of the row (precisely: $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for all $i$) is invertible.

  • Any real symmetric matrix such that $a_{ii} > \sum_{j \neq i} |a_{ij}|$ for all $i$ is positive definite.

  • For any square matrix, we get an upper bound on the absolute values of the eigenvalues: they can be no greater than the largest of the absolute row-sums $\sum_j |a_{ij}|$. Since the dominant eigenvalue is very important in many dynamics problems, often governing rates of growth, that’s potentially quite useful. (A numerical illustration follows this list.)
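
To make the first and third bullets concrete, here is a small NumPy sketch; the matrix is made up for illustration, and the checks are just the inequalities above.

    # Illustrative sketch of the first and third uses, with a made-up matrix.
    import numpy as np

    A = np.array([[ 5.0, 1.0, -2.0],
                  [ 0.5, 4.0,  1.0],
                  [-1.0, 2.0,  6.0]])

    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))

    # Strict diagonal dominance: no Gershgorin disc contains 0, so A is invertible.
    assert np.all(np.abs(np.diag(A)) > radii)

    # Every |eigenvalue| is at most the largest absolute row sum.
    bound = np.max(np.sum(np.abs(A), axis=1))
    assert np.all(np.abs(np.linalg.eigvals(A)) <= bound)
    print("spectral radius", np.max(np.abs(np.linalg.eigvals(A))), "<=", bound)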

I think the only time I’ve seriously used Gershgorin’s theorem in my own research was in Proposition 10.8 of this. I can’t claim that this is a particularly compelling application, but here’s what we (or rather my coauthor Mark Meckes) did:

  • We started with a real symmetric matrix $Z$ with $1$s all the way down the diagonal and $\sum_{j \neq i} |Z_{ij}| < 1$ for all $i$.

  • By Gershgorin, every eigenvalue is in the interval $(0, 2)$.

  • It follows that every eigenvalue of $I - Z$ has absolute value strictly less than $1$.

  • And from this it follows that the infinite sum $\sum_{k \geq 0} (I - Z)^k$ converges — to $Z^{-1}$, of course.

So in this case, it helped us get a series expansion for the inverse of a matrix.
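
A quick numerical sketch of that expansion, with a hypothetical $Z$ satisfying the hypotheses above (ones on the diagonal, small symmetric off-diagonal entries):

    # Neumann series for Z^{-1}: partial sums of sum_{k>=0} (I - Z)^k.
    import numpy as np

    Z = np.array([[ 1.0, 0.3, -0.2],
                  [ 0.3, 1.0,  0.4],
                  [-0.2, 0.4,  1.0]])    # symmetric, off-diagonal row sums < 1

    I = np.eye(3)
    term, total = I.copy(), I.copy()
    for _ in range(100):
        term = term @ (I - Z)            # (I - Z)^k
        total += term

    print(np.allclose(total, np.linalg.inv(Z)))   # True: the series converges to Z^{-1}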

Posted by: Tom Leinster on August 9, 2016 7:46 PM

Re: In Praise of the Gershgorin Disc Theorem

Sometimes called “Gerschgorin’s Circle Theorem,” a proof of it is given near the beginning of Eugene Isaacson and Herbert Bishop Keller, Analysis of Numerical Methods (Wiley, 1966), Chapter 4, “Computation of Eigenvalues and Eigenvectors.” In doing reliability engineering in the early 1970s, birth-and-death processes were sometimes used to model systems which had associated tri-diagonal matrices. The theorem facilitated calculation of the eigenvalues.

Posted by: David Austin on August 10, 2016 1:10 AM

Re: In Praise of the Gershgorin Disc Theorem

Sometimes called “Gerschgorin’s Circle Theorem”

Right — lots of people do call it that. But mathematicians settled long ago on using “circle” for the hollow shape and “disc” for the solid shape, and the actual circles play no particular role in Gershgorin’s theorem; it’s the discs that matter. So I prefer the name “Gershgorin disc theorem”. The canonical text Matrix Analysis of Horn and Johnson also calls it that.

Posted by: Tom Leinster on August 10, 2016 1:59 AM

Re: In Praise of the Gershgorin Disc Theorem

I was not taught this theorem as an undergraduate, but came across it in an economics paper soon after. The first “use” of the theorem that you wrote about in a comment is actually equivalent to this theorem, and is the form in which I encountered it (namely, that a strictly diagonally dominant matrix is invertible). I wrote a blog post about this: here.

Apparently this result has been independently rediscovered several dozen times; a paper by Olga Taussky[-Todd] titled A Recurring Theorem on Determinants in 1949 gives 25 references and has in its first sentence the remark “the theorem is not as well known as it deserves to be”!

Posted by: ShreevatsaR on August 10, 2016 4:10 AM

Re: In Praise of the Gershgorin Disc Theorem

Aha, I hadn’t realized that equivalence — that’s nice. Let me repeat what you wrote for anyone too lazy to click.

We’re talking about two theorems here:

  • the Gershgorin disc theorem

  • what’s sometimes called the Levy–Desplanques theorem: any square matrix $A$ satisfying $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for all $i$ is invertible. (Such matrices are called “strictly diagonally dominant”.)

I observed that Levy–Desplanques is an easy consequence of Gershgorin (since if $A$ satisfies the hypotheses of the L-D theorem then none of the discs contains $0$).

But you’re observing the converse: Gershgorin is also an easy consequence of Levy–Desplanques! Indeed, assume L-D and let $\lambda$ be an eigenvalue of a matrix $A$. Then $A - \lambda I$ is not invertible, so by the contrapositive of L-D, there is some $i$ such that $|a_{ii} - \lambda| \leq \sum_{j \neq i} |a_{ij}|$. Done!

Posted by: Tom Leinster on August 10, 2016 1:49 PM

Re: In Praise of the Gershgorin Disc Theorem

I seem to have hit on something important. There’s a book called Geršgorin and his Circles by the distinguished numerical linear algebraist Richard Varga, and in the introduction he writes:

There are two main recurring themes which the reader will see in this book. The first recurring theme is that a nonsingularity theorem for matrices gives rise to an equivalent eigenvalue inclusion set in the complex plane for matrices, and conversely. Though common knowledge today, this was not widely recognized until many years after Geršgorin’s paper appeared [in 1931].

He gives an example later on: the Levy–Desplanques theorem (which is a nonsingularity theorem) and the Gershgorin theorem (which gives an “eigenvalue inclusion set” — the union of the Gershgorin discs). But I don’t know any other examples. I’m viewing the book via Amazon’s preview, and it doesn’t let me see any more. I wonder what other examples there are.

Posted by: Tom Leinster on August 11, 2016 2:13 PM

Re: In Praise of the Gershgorin Disc Theorem

Interesting! Here is another example treated in that book (the first two pages of Chapter 2).

It concerns a matrix $A = [a_{i,j}] \in \mathbb{C}^{n \times n}$, $n \ge 2$. Let $r_i(A) = \sum_{j \neq i} |a_{i,j}|$.

Theorem (Ostrowski 1937). If $|a_{i,i}| \cdot |a_{j,j}| > r_i(A) \cdot r_j(A)$ for all $i \neq j$, then $A$ is nonsingular.

This is a nonsingularity theorem, and according to the “first recurring theme”, it is equivalent to an eigenvalue inclusion set:

Theorem (Brauer 1947). For each pair of distinct indices $(i, j)$, let the $(i,j)$-th “Brauer–Cassini oval” be the region $$K_{i,j}(A) = \{ z \in \mathbb{C} : |z - a_{i,i}| \cdot |z - a_{j,j}| \le r_i(A) \cdot r_j(A) \}.$$ Then for any eigenvalue $\lambda$ of $A$, there exists a pair of distinct indices $(i, j)$ such that $\lambda \in K_{i,j}(A)$.
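
As a sanity check, here is a short NumPy sketch (illustrative only) verifying Brauer’s theorem on the matrix from the original post: every eigenvalue should land in at least one of the three ovals.

    # Check Brauer's theorem: each eigenvalue lies in some Cassini oval K_{i,j}.
    import numpy as np
    from itertools import combinations

    A = np.array([[ 3,      1j, 1],
                  [-1, 4 + 5j,  2],
                  [ 2,      1, -1]], dtype=complex)

    d = np.diag(A)
    r = np.sum(np.abs(A), axis=1) - np.abs(d)    # r_i(A)

    for lam in np.linalg.eigvals(A):
        ovals = [(i, j) for i, j in combinations(range(3), 2)
                 if abs(lam - d[i]) * abs(lam - d[j]) <= r[i] * r[j]]
        assert ovals                              # Brauer: the list is never empty
        print(f"{complex(lam):.3f} is in oval(s) {ovals}")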

There is an online demonstration on this page (enter the matrix and click on the “Plot” button), showing both Gershgorin disks and Brauer’s ovals of Cassini for $3 \times 3$ complex matrices.

Posted by: ShreevatsaR on August 13, 2016 2:36 AM

Re: In Praise of the Gershgorin Disc Theorem

Nice — thanks a lot! It’s easy to see how Ostrowski $\iff$ Brauer, but I should have a try at proving one or other of them. I see from the page you linked to that Brauer’s theorem is always at least as good as Gershgorin’s, in the sense that it confines the eigenvalues to a smaller subset of the complex plane.

I had a play with that excellent online tool, and there’s something I don’t understand. If I put in the matrix of my post, I get this picture:

[Image: Gershgorin discs and Brauer ovals for the example matrix]

The Gershgorin discs look as I’d expect. However, what you wrote suggests that for an $n \times n$ matrix, there should be $n(n-1)/2$ Brauer–Cassini ovals, and the text on that web page confirms it. In this case, $n = 3$, so there should be $3(3-1)/2 = 3$ ovals. On the other hand, the picture shows five non-circular brown regions. What’s going on?

Posted by: Tom Leinster on August 13, 2016 3:12 AM

Re: In Praise of the Gershgorin Disc Theorem

You can click on the label “Gerschgorin discs” to turn off the circles, and it results in this image:

[Image: the Brauer–Cassini ovals alone, all shown in brown]

This was still hard for me to understand what was going on, so I got the code from the github repository and hacked it a bit to display the three ovals in different colors (as is being done to the disks right now), and it resulted in this:

[Image: the three ovals displayed in blue, green, and yellow]

What’s going on, I think, is that the green region is a single connected “dog bone” (in the terms of the MathWorld page on Cassini ovals), while the blue and yellow regions each consist of two loops, each overlapping with the green one and with each other.

Posted by: ShreevatsaR on August 13, 2016 6:00 PM

Re: In Praise of the Gershgorin Disc Theorem

I’ve put up the minor modification at https://shreevatsa.github.io/cassini/ so that one can see the ovals more clearly. After entering the matrix and clicking on the Plot button, clicking on “Brauer’s ovals of Cassini” will make them first appear separately in random colors before appearing together.

Posted by: ShreevatsaR on August 13, 2016 9:58 PM

Re: In Praise of the Gershgorin Disc Theorem

Huge thanks for this! I spent a weekend mostly unplugged from the internet, and plugged back in to find that people had written all sorts of wonderful things in reply to my posts.

What you’ve written explains everything. The crucial point: I hadn’t realized that Cassini ovals could be disconnected.

As far as I can remember, this is the first precise mathematical usage of the word “oval” that I’ve come across. The diagrams on the original web tool (and the Wikipedia page) had already made me understand that Cassini ovals can be dog-bone-shaped, which was a bit of a surprise to me given what I intuitively picture an “oval” to be. But I hadn’t read up enough to realize that they could have two connected components.

It’s worth quoting the beginning of that Wikipedia page:

A Cassini oval is a quartic plane curve defined as the set (or locus) of points in the plane such that the product of the distances to two fixed points is constant. This may be contrasted to an ellipse, for which the sum of the distances is constant, rather than the product.

There’s a nice illustration here. Cassini ovals can look like what I’d call an oval, or a dog-bone, or a figure of eight, or a union of two reflectionally-symmetric ovals.

(Here I’ve succumbed to modernity and said “Cassini ovals” rather than the more picturesque and poetic “ovals of Cassini”. But I slightly regret it. The term “ovals of Cassini” gives me happy long-ago memories of learning Arabic grammar, where one meets Inna and the sisters of Inna — in my memory, always said that way, never anything so vulgar as “Inna and her sisters”.)

Posted by: Tom Leinster on August 16, 2016 12:56 AM

Re: In Praise of the Gershgorin Disc Theorem

I think there are twice that number of ovals present, because we can choose to sum over either rows or columns. If you turn off the ovals, you can observe six Gerschgorin discs too, pairs of which are concentric!

Posted by: Phalguni on July 31, 2017 5:41 PM

Re: In Praise of the Gershgorin Disc Theorem

Yes — I learnt it in Alan Pryde’s very beautiful course on Matrix Analysis, in second or third year, I think.

Posted by: Julie on August 10, 2016 5:22 AM

Re: In Praise of the Gershgorin Disc Theorem

Lucky you! I never took a course on matrix analysis, and that kind of thing didn’t appeal to me at all when I was an undergraduate.

The course I teach is for students entering directly into the second year of our degree programme. It’s called “Accelerated Algebra”, and it’s very accelerated because they have to catch up with all the first year linear algebra in less than half the usual time. So it’s really very difficult for me to fit in anything non-essential. Nevertheless, I now plan to set a couple of exercises on Gershgorin’s theorem: one to guide the students through a proof, and another to get them to apply it to some specific matrices.

Posted by: Tom Leinster on August 10, 2016 1:55 PM

Re: In Praise of the Gershgorin Disc Theorem

If I remember correctly, I was taught this theorem in the second year during the basic numerical analysis course. Maybe some applications in numerical analysis were given, but to be sure I would have to dig up my old notebook.

(Bachelors mathematics program at the Charles University in Prague.)

Posted by: Vít Tuček on August 10, 2016 12:24 PM

Re: In Praise of the Gershgorin Disc Theorem

Thanks!

Evelyn Lamb on Twitter also asked who’d been taught this, and got lots of responses. Well: perhaps it’s predictable that those who’d seen it would be more likely to respond than those who hadn’t. But I liked her ultimate verdict.

Posted by: Tom Leinster on August 12, 2016 1:49 AM

Re: In Praise of the Gershgorin Disc Theorem

I used the theorem many (many) years ago to prove that nuclear reactor oscillations could not be caused by just neutron coupling of the various core regions.

Posted by: Wes Harker on August 11, 2016 11:30 PM

Re: In Praise of the Gershgorin Disc Theorem

Wowee! That’s a nice companion to Andrej Bauer’s recent story about nuclear safety relying on the convergence of the harmonic series. (I exaggerate… at least a little!)

Posted by: Tom Leinster on August 12, 2016 1:42 AM

Re: In Praise of the Gershgorin Disc Theorem

I definitely never learned the Gershgorin theorem as an undergraduate, or in any class. I don’t recall exactly when I learned it, or where. I did a lot of reading about matrix analysis when I was a grad student and postdoc.

Although it’s a beautiful theorem with some nice applications, as you point out there’s a lot of stuff that needs to go into an undergraduate linear algebra course. Personally I never mention it in undergraduate linear algebra, but I do teach it in a graduate Matrix Analysis course. I think setting some problems about it is a nice idea, partly because it makes a valuable point — that you can extract some nontrivial information about invariant quantities (eigenvalues) associated to a matrix quite directly from the highly non-invariant matrix entries.

Incidentally, since we wrote the paper you linked to above, I noticed an argument for our theorem that bypasses Gershgorin. Write $\|\cdot\|_\infty$ for the induced $\ell^\infty$ norm on a matrix (i.e., the operator norm when you put the $\ell^\infty$ norm on $\mathbb{R}^n$), which turns out to be the “maximum row sum”. Our hypothesis implies that $\|Z - I\|_\infty$ is smaller than $1$; this immediately implies the convergence of the series expansion for $Z^{-1}$. (There’s also an argument based on Perron–Frobenius theory, but that requires more overhead than Gershgorin, not less.)

(Exercise for the reader: Adapt the argument above to prove the Levy–Desplanques theorem, and hence the Gershgorin theorem.)

Posted by: Mark Meckes on August 19, 2016 2:33 PM

Re: In Praise of the Gershgorin Disc Theorem

Ah yes, that’s a good alternative.

It moves me to point out a different connection between the operator norm $\|\cdot\|_\infty$ and the Gershgorin disc theorem. Well, I say “point out”, but not because I think you don’t know it, Mark — more because it’s the kind of thing I’m likely to forget unless I write it down.

Gershgorin swiftly implies that for any eigenvalue $\lambda$ of a matrix $A = (a_{ij})$,

$$|\lambda| \leq \max_i \sum_j |a_{ij}|.$$

But as you point out, the right-hand side here is an explicit expression for $\|A\|_\infty$. So,

$$|\lambda| \leq \|A\|_\infty.$$

On the other hand, that’s very easy to see without going anywhere near Gershgorin’s theorem: simply choose a $\lambda$-eigenvector $x$, note that

$$|\lambda| \, \|x\|_\infty = \|\lambda x\|_\infty = \|A x\|_\infty \leq \|A\|_\infty \|x\|_\infty,$$

and cancel $\|x\|_\infty$. (Of course, the same argument shows that all the other norms $\|A\|_p$ are also upper bounds for the absolute eigenvalues.)
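
A one-line check of that last remark, assuming NumPy (whose matrix norms with ord $1$, $2$, $\infty$ are exactly the induced operator norms):

    # Each induced p-norm bounds the spectral radius (random test matrix).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))

    rho = np.max(np.abs(np.linalg.eigvals(A)))    # spectral radius
    for p in (1, 2, np.inf):
        assert rho <= np.linalg.norm(A, p)        # |lambda| <= ||A||_p
    print("spectral radius", rho, "is below all three operator norms")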

Posted by: Tom Leinster on August 21, 2016 4:15 AM

Re: In Praise of the Gershgorin Disc Theorem

It’s a cute fact that the same is true not only for operator norms, but for arbitrary submultiplicative norms. If $x$ is a $\lambda$-eigenvector for $A$, let $X$ be the square matrix whose columns are all equal to $x$. Then $$|\lambda| \|X\| = \|\lambda X\| = \|A X\| \le \|A\| \|X\|.$$

But this isn’t actually a substantial generalization: you can use the same $x \mapsto X$ trick to show that given any submultiplicative norm on $n \times n$ matrices, there is an operator norm which is always less than or equal to it.

Posted by: Mark Meckes on August 22, 2016 3:43 PM

Re: In Praise of the Gershgorin Disc Theorem

Why the emphasis on dominance in each row? Surely looking at the columns as well (valid because transposes have the same eigenvalues) would sometimes help. Indeed, for the given example, don’t we reduce the radius of the disc with center $4 + 5i$ to $2$?

Posted by: Aaron Denney on September 16, 2016 12:06 AM

Re: In Praise of the Gershgorin Disc Theorem

You’re absolutely right: you could use columns instead. It’s a trade-off, though! When you switch from rows to columns, some discs may get smaller… but then others must get bigger. It’s never the case that they all get smaller.

Take the example matrix I gave in the post:

$$\begin{pmatrix} 3 & i & 1 \\ -1 & 4+5i & 2 \\ 2 & 1 & -1 \end{pmatrix}.$$

Using rows gives you discs with:

  • centre $3$, radius $2$;
  • centre $4 + 5i$, radius $3$;
  • centre $-1$, radius $3$.

Write $G$ for the union of those three discs.

On the other hand, using columns gives you discs with:

  • centre $3$, radius $3$;
  • centre $4 + 5i$, radius $2$;
  • centre $-1$, radius $3$.

Write $G'$ for the union of these three discs.

When you switch from rows to columns, the first disc grows, the second shrinks, and the third stays the same. It’s easy to see why they can’t all shrink: for whether you use rows or columns, the sum of the three radii is the same, $\sum_{i \neq j} |a_{ij}|$. In this case, both sums are $8$.

Nevertheless, since the eigenvalues all lie in $G$ and all lie in $G'$, they all lie in $G \cap G'$. This is a smaller set than $G$ (or $G'$), so you do gain something by considering columns as well as rows.

If I was more energetic I’d insert a picture here.
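
In lieu of a picture, here is a rough numerical stand-in (illustrative only): it samples a grid of points in the complex plane and estimates how much of the plane $G$, $G'$, and $G \cap G'$ each cover.

    # Grid estimate of the sizes of G (rows), G' (columns) and their intersection.
    import numpy as np

    A = np.array([[ 3,      1j, 1],
                  [-1, 4 + 5j,  2],
                  [ 2,      1, -1]], dtype=complex)

    d = np.diag(A)
    row_r = np.sum(np.abs(A), axis=1) - np.abs(d)   # radii from rows
    col_r = np.sum(np.abs(A), axis=0) - np.abs(d)   # radii from columns

    x, y = np.meshgrid(np.linspace(-6, 9, 500), np.linspace(-5, 9, 500))
    z = x + 1j * y
    in_G  = np.any([np.abs(z - d[i]) <= row_r[i] for i in range(3)], axis=0)
    in_Gp = np.any([np.abs(z - d[i]) <= col_r[i] for i in range(3)], axis=0)

    # Fraction of the grid in each region: the intersection is strictly smaller.
    print(in_G.mean(), in_Gp.mean(), (in_G & in_Gp).mean())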

Posted by: Tom Leinster on September 16, 2016 12:41 AM

Re: In Praise of the Gershgorin Disc Theorem

There are related theorems that look at the rows and columns simultaneously. For example, a theorem of Ostrowski (different from the one already mentioned above) states that for each $\alpha \in [0, 1]$, every eigenvalue of $A$ lies in one of the discs $$\{ z : |z - a_{ii}| \le r_i^\alpha c_i^{1-\alpha} \},$$ where $r_i = \sum_{j \neq i} |a_{ij}|$ and $c_i = \sum_{j \neq i} |a_{ji}|$.

Posted by: Mark Meckes on September 16, 2016 5:46 PM

Re: In Praise of the Gershgorin Disc Theorem

That’s a satisfying result. Gershgorin gives you the result with $r_i$, and Gershgorin on the transpose gives you the result with $c_i$. You might wonder if there’s a continuous family of results interpolating between them. And first of all you might guess that the bounds are the weighted arithmetic means $\alpha r_i + (1 - \alpha) c_i$ ($\alpha \in [0, 1]$). But Ostrowski’s result is better than that, because it uses the weighted geometric means $r_i^\alpha c_i^{1 - \alpha}$, which are smaller (i.e. tighter bounds) than the arithmetic means. In fact, they’re smaller than all the other power means $(\alpha r_i^t + (1 - \alpha) c_i^t)^{1/t}$ for all $t > 0$. So it’s a really good bound.
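
A short sketch of the interpolated bound on the running example, assuming NumPy; the theorem predicts that for every $\alpha$, each eigenvalue lies in at least one of the $\alpha$-discs.

    # Ostrowski's interpolated discs: radii r_i^alpha * c_i^(1 - alpha).
    import numpy as np

    A = np.array([[ 3,      1j, 1],
                  [-1, 4 + 5j,  2],
                  [ 2,      1, -1]], dtype=complex)

    d = np.diag(A)
    r = np.sum(np.abs(A), axis=1) - np.abs(d)   # row radii
    c = np.sum(np.abs(A), axis=0) - np.abs(d)   # column radii

    eigs = np.linalg.eigvals(A)
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        radii = r**alpha * c**(1 - alpha)       # weighted geometric means
        assert all(np.any(np.abs(lam - d) <= radii) for lam in eigs)
    print("every eigenvalue lies in an alpha-disc for each alpha tested")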

Posted by: Tom Leinster on September 16, 2016 10:04 PM
