

August 18, 2013

The Shannon Capacity of a Graph, 2

Posted by Tom Leinster

Graph theorists have many measures of how ‘big’ a graph is: order, size, diameter, radius, circumference, girth, chromatic number, and so on. Last time I told you about two of them: independence number and Shannon capacity. This time I’ll bring in two other measures of size that I’ve written about previously in contexts other than graph theory: maximum diversity and magnitude.

The post will end not with a bang, but a whimper. Although there’s something simple and clean to say about the connection between maximum diversity and standard graph invariants, the relationship with magnitude is much more mysterious, and I don’t have much more to say than ‘what’s going on?’. So there will be no satisfying conclusion. Consider yourself warned!

A crash course in diversity measurement

Imagine an ecosystem consisting of a number of species. We know what proportions the various species are present in, and we know how similar the species are to one another. How can we quantify, in a single number, the ‘diversity’ of the ecosystem?

Over the years, many people have answered this question in many different ways. The answer I’m about to explain is the one given in a paper of mine with Christina Cobbold, called Measuring diversity: the importance of species similarity. I also posted about it here and on John’s blog. But I won’t assume you’ve read any of that.

Let’s say we have $n$ species, in respective proportions $p_1, \ldots, p_n$. Thus, $p = (p_1, \ldots, p_n)$ is a probability vector: a finite sequence of nonnegative reals summing to $1$. And let’s say that the similarity of one species to another is measured on a scale of $0$ to $1$, with $0$ meaning ‘completely different’ and $1$ meaning ‘identical’. These similarities are the entries of an $n \times n$ matrix $Z$, the similarity matrix, which I’ll assume is symmetric and has $1$s down the diagonal (because any species is identical to itself).

The expected similarity between an individual from the $i$th species and a randomly-chosen individual is $\sum_j Z_{ij} p_j$. This measures how ‘ordinary’ the $i$th species is. We can express this formula more neatly if we regard $p$ as a column vector: then the ordinariness of the $i$th species is simply $(Z p)_i$, the $i$th entry of the column vector $Z p$.

A community in which most species are very ordinary is not very diverse; conversely, a community in which most species are highly distinctive is very diverse. So the average ordinariness, across all species, measures the ecosystem’s lack of diversity. The average ordinariness is

$$\sum_i p_i (Z p)_i.$$

To get a measure of diversity itself, we take the reciprocal:

$$D_2^Z(p) := 1 \Big/ \sum_i p_i (Z p)_i.$$
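As a quick sanity check of this definition, here's a minimal sketch in pure Python. The two-species similarity matrix is an invented example, not from the post:

```python
def diversity_2(Z, p):
    """Diversity of order 2: D_2^Z(p) = 1 / sum_i p_i (Zp)_i."""
    # ordinariness of species i is the i-th entry of Z p
    Zp = [sum(Z[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    return 1 / sum(pi * o for pi, o in zip(p, Zp))

# Hypothetical example: two species with similarity 0.5, equal proportions
Z = [[1.0, 0.5],
     [0.5, 1.0]]
print(diversity_2(Z, [0.5, 0.5]))                 # 4/3: between 1 and 2
print(diversity_2([[1, 0], [0, 1]], [0.5, 0.5]))  # 2.0: totally dissimilar species
```

When the species are totally dissimilar and equally abundant, the ‘effective number of species’ comes out as exactly $2$, as it should.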

What’s that $2$ doing on the left-hand side? Well, there’s a certain $2$-ness to the right-hand side, since the denominator is the quadratic form $p^{\text{T}} Z p$. And in fact, this diversity measure is just one of a one-parameter family of measures.

The other members of the family come about by varying our interpretation of ‘average’ in the phrase ‘average ordinariness’. Just now, we interpreted it as ‘arithmetic mean’. But the arithmetic mean is only one member of a larger family: the power means. Let $t \in [-\infty, \infty]$. The power mean of order $t$ of positive reals $x_1, \ldots, x_n$, weighted by a probability vector $p = (p_1, \ldots, p_n)$, is

$$M_t(p, x) := \Bigl( \sum_{i : p_i > 0} p_i x_i^t \Bigr)^{1/t}.$$

In other words: transform each $x_i$ by taking the $t$th power, form the arithmetic mean, then apply the inverse transformation. Clearly the formula doesn’t make sense for $t = 0$ and $t = \pm\infty$, but we can fill in those gaps by taking the limit as $t \to 0$ or $t \to \pm\infty$. This gives

$$M_0(p, x) = \prod_i x_i^{p_i}, \qquad M_\infty(p, x) = \max_{i : p_i > 0} x_i, \qquad M_{-\infty}(p, x) = \min_{i : p_i > 0} x_i.$$

In this notation, the formula for diversity given above is

$$D_2^Z(p) = 1/M_1(p, Z p).$$

Generally, for any $q \in [0, \infty]$, the diversity of order $q$ of the ecosystem is

$$D_q^Z(p) = 1/M_{q-1}(p, Z p).$$

This turns out to be a quantity between $1$ and $n$, and can be thought of as the ‘effective number of species’. It takes its minimum value of $1$ when only one species is actually present, that is, $p_i = 1$ for some $i$. It takes its maximum value of $n$ when the $n$ species are all completely dissimilar (that is, $Z$ is the identity matrix) and they’re present in equal proportions.
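The whole one-parameter family is short enough to sketch in code. This is my own rendering of the formulas above (function names are mine), handling the limiting cases of the power mean explicitly:

```python
import math

def power_mean(t, p, x):
    """Power mean M_t(p, x), including the limiting cases t = 0 and t = ±inf."""
    pairs = [(pi, xi) for pi, xi in zip(p, x) if pi > 0]
    if t == 0:
        return math.prod(xi ** pi for pi, xi in pairs)
    if t == math.inf:
        return max(xi for _, xi in pairs)
    if t == -math.inf:
        return min(xi for _, xi in pairs)
    return sum(pi * xi ** t for pi, xi in pairs) ** (1 / t)

def diversity(q, Z, p):
    """Diversity of order q: D_q^Z(p) = 1 / M_{q-1}(p, Zp)."""
    Zp = [sum(Z[i][j] * p[j] for j in range(len(p))) for i in range(len(p))]
    return 1 / power_mean(q - 1, p, Zp)

# With Z = I and equal proportions, every order of diversity gives n:
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
p = [1/3, 1/3, 1/3]
print(diversity(2, I3, p), diversity(0, I3, p), diversity(math.inf, I3, p))
```

With three totally dissimilar, equally abundant species, each of the three printed diversities is $3$, in line with the ‘effective number of species’ interpretation.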

Different values of $q$ attach different amounts of significance to rare species, in a way that I won’t explain. But it’s important to recognize that the diversities of different orders really do measure different things. For example, it often happens that ecosystem A has greater diversity of order $1$ than ecosystem B, but less diversity of order $2$.

If you know some information theory, you can recognize among these diversity measures some classical measures of entropy. For example, when $Z = I$ (different species are deemed to have nothing whatsoever in common), we get the exponential of Rényi entropy:

$$D_q^I(p) = \Bigl( \sum_i p_i^q \Bigr)^{1/(1-q)}.$$

The even more special case $Z = I$, $q = 1$ gives the exponential of Shannon entropy.
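A quick numeric check of this special case, using an arbitrary made-up distribution: when $Z = I$ the ordinariness of species $i$ is just $p_i$, so the power-mean formula collapses to the exponential of Rényi entropy, and at $q = 1$ to the exponential of Shannon entropy.

```python
import math

p = [0.5, 0.3, 0.2]   # arbitrary example distribution

q = 2.0
# D_q^I(p) computed from the power-mean definition 1 / M_{q-1}(p, p) ...
via_mean = 1 / sum(pi * pi ** (q - 1) for pi in p) ** (1 / (q - 1))
# ... and from the exponential-of-Renyi-entropy formula
via_renyi = sum(pi ** q for pi in p) ** (1 / (1 - q))
print(via_mean, via_renyi)   # equal

# The q = 1 limit: exponential of Shannon entropy
exp_shannon = math.exp(-sum(pi * math.log(pi) for pi in p))
print(exp_shannon)
```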

Maximum diversity

Now suppose we have a list of $n$ species of known similarities, and also suppose that we have control over their proportions or frequencies. In other words, we’ve fixed $n$ and $Z$ and we can vary $p$. How can we choose $p$ to maximize the diversity? And what is the value of that maximum diversity?

Well, these questions don’t entirely make sense. Remember that we have not just one diversity measure, but a one-parameter family of them. Perhaps the $p$ that maximizes diversity of order $1$ comes nowhere near maximizing diversity of order $2$. And perhaps $\sup_p D_1^Z(p)$, the maximum diversity of order $1$, is different from $\sup_p D_2^Z(p)$. After all, different values of $q \in [0, \infty]$ give genuinely different measures.

So it comes as a surprise that, in fact, there’s a ‘best of all possible worlds’: a probability vector that maximizes diversity in all these uncountably many senses at once. And it’s a double surprise: all these different diversity measures have the same maximum value. In other words:

Theorem   Let $n \geq 1$ and let $Z$ be an $n \times n$ similarity matrix. Then:

  • There is a probability vector $p_{max}$ such that for all $q \in [0, \infty]$, we have $D_q^Z(p_{max}) = \sup_p D_q^Z(p)$;

  • $\sup_p D_q^Z(p)$ is independent of $q \in [0, \infty]$.

In particular, associated to any similarity matrix $Z$ is a real number, its maximum diversity, defined as

$$D_{max}(Z) = \sup_p D_q^Z(p)$$

for any $q \in [0, \infty]$. Which $q$ makes no difference: that’s the second part of the theorem.

How graphs come in

Now consider the special case where all inter-species similarities are either $0$ or $1$: each pair of species is deemed to be either totally dissimilar or totally identical. The matrix $Z$ of similarities is then simply a symmetric matrix of $0$s and $1$s, with $1$s down the diagonal.

A matrix with these properties is the same thing as a finite, reflexive, undirected graph — which in these posts I’m just referring to as graphs. That’s because a graph $G$ can be seen in terms of its adjacency matrix $A_G$.

So, given a graph $G$, we can ask: what is the maximum diversity of the matrix $A_G$? The official notation for this is $D_{max}(A_G)$, but I’ll write it as $D_{max}(G)$ instead, and call it the maximum diversity of $G$. What is it, explicitly?

It turns out that it’s exactly the independence number of $G$. In Part 1 I mentioned that a set of vertices in a graph is said to be independent if no two elements have an edge between them, and that the independence number $\alpha(G)$ is the largest cardinality of an independent set in $G$. So, the result is that

$$D_{max}(G) = \alpha(G)$$

for all graphs $G$. Or in words:

Maximum diversity is independence number.

There are uncountably many proofs. Because $D_{max}(G)$ is equal to $\sup_p D_q^{A_G}(p)$ for any $q \in [0, \infty]$, you can choose whichever $q$ is most convenient and maximize the diversity of that order. If you take $q = \infty$, it becomes pretty simple — I reckon that’s the easiest proof. (In Example 4.2 of my paper, I did it with $q = 2$, but that was probably less efficient.)

Back to Shannon capacity

Last time, I gave you the definition of the Shannon capacity of a graph $G$: it’s

$$\Theta(G) = \sup_{k \geq 1} \alpha\bigl(G^{\boxtimes k}\bigr)^{1/k} = \lim_{k \to \infty} \alpha\bigl(G^{\boxtimes k}\bigr)^{1/k},$$

where $\boxtimes$ is the ‘strong product’ of graphs: the product in the category of (reflexive, as always) graphs. If we want, we can write the definition of $\Theta$ with $D_{max}$ in place of $\alpha$, since they’re the same thing.

I mentioned that computing Shannon capacity is really hard: no one even knows the Shannon capacity of the 7-cycle! So people have put a lot of effort into finding quantities that are similar to $\Theta$, but easier to compute.

For example, suppose we can find a tight but easy-to-compute upper bound on $\Theta(G)$. We already have plenty of lower bounds, namely, $\alpha\bigl(G^{\boxtimes k}\bigr)^{1/k}$ for any $k$ we like. So we’d then have sandwiched $\Theta(G)$ between two easy-to-compute bounds. For some graphs, the upper and lower bounds might even be equal, giving us the exact value of $\Theta(G)$.

The best-known substitute for the Shannon capacity is called the Lovász number, $\vartheta(G)$. I confess, I don’t have any feel for the definition (which we won’t need here), but here are some of its good properties:

  • It’s much more efficient to compute than the Shannon capacity.

  • It’s an upper bound: $\Theta(G) \leq \vartheta(G)$. (Yup, the quantity denoted by a small letter is bigger than the quantity denoted by a big letter.)

  • It’s multiplicative: $\vartheta(G \boxtimes H) = \vartheta(G) \cdot \vartheta(H)$. This formula helps us to compute the Lovász number of any graph of the form $G \boxtimes H$. Neither the independence number $\alpha$ nor the Shannon capacity $\Theta$ is multiplicative.

As Omar Antolín Camarena pointed out last time, even the Shannon capacity of the 5-cycle $C_5$ was unknown until Lovász defined his number. Here’s how Lovász did it. Having proved that $\vartheta$ is an upper bound for $\Theta$, he then showed that $\vartheta(C_5) = \sqrt{5}$. But

$$\alpha\bigl( C_5^{\boxtimes 2} \bigr)^{1/2} \leq \Theta(C_5) \leq \vartheta(C_5) = \sqrt{5},$$

and we calculated last time that $\alpha\bigl( C_5^{\boxtimes 2} \bigr)^{1/2} = \sqrt{5}$. Hence $\Theta(C_5)$ can only be $\sqrt{5}$.
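The lower-bound half of that calculation is easy to verify by machine. The sketch below brute-forces $\alpha(C_5) = 2$ and then checks that $\{(i, 2i \bmod 5)\}$ — one standard choice of a size-5 independent set — really is independent in $C_5 \boxtimes C_5$. Since our graphs are reflexive, two distinct vertices of the strong product are adjacent exactly when both coordinates are equal or adjacent:

```python
from itertools import combinations

def c5_adjacent(a, b):
    # reflexive 5-cycle: a vertex is adjacent to itself and its two neighbours
    return a == b or (a - b) % 5 in (1, 4)

def independent(vertices, adjacent):
    return all(not adjacent(u, v) for u, v in combinations(vertices, 2))

# alpha(C_5) = 2, by brute force over all vertex subsets
alpha_c5 = max(len(s) for r in range(6) for s in combinations(range(5), r)
               if independent(s, c5_adjacent))
print(alpha_c5)  # 2

# A size-5 independent set in the strong product C_5 ⊠ C_5
def strong_adjacent(u, v):
    return c5_adjacent(u[0], v[0]) and c5_adjacent(u[1], v[1])

S = [(i, (2 * i) % 5) for i in range(5)]
print(independent(S, strong_adjacent))  # True, so alpha(C_5 ⊠ C_5) >= 5
```

Together these give $\alpha\bigl(C_5^{\boxtimes 2}\bigr)^{1/2} \geq \sqrt{5}$, which is the lower bound used in the sandwich above.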

(There’s at least one other useful upper bound for $\Theta(G)$, defined by Haemers and pointed out by Tobias last time, but I won’t say anything about it.)

Maximum diversity and magnitude

We do know one nice quantity to which maximum diversity is closely related, and that’s magnitude.

I’ve written a lot about magnitude of metric spaces and other structures, but at its most concrete, magnitude is simply an invariant of matrices. A weighting on a matrix $A$ is a column vector $w$ such that each entry of the column vector $A w$ is $1$, and it’s an easy lemma that if both $A$ and its transpose have at least one weighting then $\sum_i w_i$ is independent of the weighting $w$ chosen. This quantity $\sum_i w_i$ is the magnitude $|A|$ of $A$. If $A$ happens to be invertible then it has a unique weighting, and $|A|$ is simply the sum of all the entries of $A^{-1}$.

As a trivial example, if $A$ is the $n \times n$ identity matrix then $|A| = n$.
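For an invertible matrix, computing the magnitude is a one-liner: solve $A w = \mathbf{1}$ and sum the weighting. A sketch using numpy, checked on the identity matrix and on the reflexive 5-cycle (whose weighting, as we’ll see below, gives each vertex weight $1/3$):

```python
import numpy as np

def magnitude(A):
    # Solve A w = (1, ..., 1); the magnitude is the sum of the weighting.
    w = np.linalg.solve(np.asarray(A, dtype=float), np.ones(len(A)))
    return w.sum()

print(magnitude(np.eye(4)))  # 4.0: the n x n identity matrix has magnitude n

# Reflexive 5-cycle: 1s on the diagonal and between cyclic neighbours
A_C5 = np.array([[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)]
                 for i in range(5)])
print(magnitude(A_C5))  # 5/3
```

This only covers the invertible case; a singular matrix may still admit a weighting, which `np.linalg.solve` won’t find.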

There are a couple of possible meanings for ‘the magnitude of a graph’, because there are (at least!) a couple of ways of turning a graph into a matrix. Not so long ago, I wrote about one of them, in which the matrix encodes the shortest-path metric on the graph. But today, I’ll do something different: when I say ‘the magnitude of $G$’, I’ll mean the magnitude of the adjacency matrix of $G$. That is, by definition,

$$|G| := |A_G| \in \mathbb{Q}$$

for today. (I don’t know whether $|G|$ is always defined, that is, whether $A_G$ always admits a weighting.)

It’s easy to show that magnitude has one of the good properties satisfied by the Lovász number but not the Shannon capacity: it’s multiplicative. That is,

$$|G \boxtimes H| = |G| \cdot |H|$$

for all graphs $G$ and $H$. So if we were to replace maximum diversity by magnitude in the definition

$$\Theta(G) = \sup_{k \geq 1} D_{max}\bigl(G^{\boxtimes k}\bigr)^{1/k}$$

of Shannon capacity, we’d just end up with magnitude again.
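Multiplicativity is easy to see numerically. For reflexive graphs, the adjacency matrix of $G \boxtimes H$ is the Kronecker product $A_G \otimes A_H$, and if $w$, $v$ are weightings then $w \otimes v$ is a weighting on the product. A sketch (numpy, checked on two small cycles of my choosing):

```python
import numpy as np

def reflexive_cycle(n):
    # adjacency matrix of the reflexive n-cycle
    return np.array([[1 if (i - j) % n in (0, 1, n - 1) else 0
                      for j in range(n)] for i in range(n)], dtype=float)

def magnitude(A):
    return np.linalg.solve(A, np.ones(len(A))).sum()

G, H = reflexive_cycle(5), reflexive_cycle(7)
lhs = magnitude(np.kron(G, H))      # |G ⊠ H|, via A_{G⊠H} = A_G ⊗ A_H
rhs = magnitude(G) * magnitude(H)   # |G| · |H|
print(lhs, rhs)                     # equal
```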

How is maximum diversity related to magnitude? The first fact, which I won’t prove here, is as follows: given a similarity matrix $Z$,

the maximum diversity of $Z$ is the magnitude of some principal submatrix of $Z$.

(Recall that a ‘principal submatrix’ of an $n \times n$ matrix is one obtained by choosing a subset $S$ of $\{1, \ldots, n\}$, then selecting just those rows and columns indexed by $S$.) In particular, given a graph $G$,

the maximum diversity of $G$ is the magnitude of some induced subgraph of $G$.

(Recall that an ‘induced subgraph’ of a graph is one consisting of a subset of the vertices and all edges between them.) We can take that subgraph to be any independent set of maximal size: for this subgraph has $\alpha(G)$ vertices and no nontrivial edges, so by the ‘trivial example’ above, its magnitude is $\alpha(G) = D_{max}(G)$.

Digression

If we had to compute the maximum diversity of a similarity matrix, how could we do it? For the present post I don’t think it matters, but the answer’s short enough that I might as well tell you. Given an $n \times n$ similarity matrix $Z$, let’s say that $S \subseteq \{1, \ldots, n\}$ is ‘good’ if the resulting $S \times S$ submatrix $Z_S$ admits at least one weighting all of whose entries are nonnegative. Then

$$D_{max}(Z) = \max \{ |Z_S| : S \text{ is good} \}.$$

This gives an algorithm for computing maximum diversity. It’s slow for large $n$, since we have to run through all $2^n$ subsets of $\{1, \ldots, n\}$, but at least each step is fast.
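The algorithm just described is short enough to sketch directly (numpy and itertools). One simplification in this sketch: subsets whose submatrix is singular are skipped rather than tested for weightings, which is fine when, as in the example below, the maximum is attained on an invertible submatrix:

```python
import numpy as np
from itertools import combinations

def max_diversity(Z):
    """Maximum diversity via the 'good subset' formula:
    D_max(Z) = max { |Z_S| : S good }, where S is good when Z_S
    admits a nonnegative weighting."""
    n = len(Z)
    Z = np.asarray(Z, dtype=float)
    best = 0.0
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            ZS = Z[np.ix_(S, S)]
            try:
                w = np.linalg.solve(ZS, np.ones(r))
            except np.linalg.LinAlgError:
                continue  # singular submatrix: skipped in this sketch
            if (w >= -1e-12).all():          # good: nonnegative weighting
                best = max(best, w.sum())    # w.sum() is |Z_S|
    return best

# For a graph, maximum diversity should equal the independence number:
A_C5 = np.array([[1 if (i - j) % 5 in (0, 1, 4) else 0 for j in range(5)]
                 for i in range(5)])
print(max_diversity(A_C5))  # 2.0 = alpha(C_5)
```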

In some significant cases, maximum diversity is equal to magnitude. For example, suppose we decide to give two species a similarity of $1$ if they’re the same, $1/2$ if they’re different but belong to the same genus, $1/4$ if they’re in different genera but the same family, and so on up the taxonomic hierarchy. Then $D_{max}(Z) = |Z|$. The numbers $1, 1/2, 1/4, \ldots$ are unimportant here; any other descending sequence would do. In general, given any finite ultrametric space — one satisfying the strengthened triangle inequality

$$d(x, z) \leq \max \{ d(x, y), d(y, z) \}$$

— the matrix $Z$ with entries $Z_{xy} = e^{-d(x, y)}$ has maximum diversity equal to its magnitude.

Let’s have an example: once more, the 5-cycle $C_5$.

  • A weighting on the adjacency matrix $A_G$ of a graph $G$ is a way of assigning to each vertex $x$ a real number $w(x)$, such that $\sum_{y : y \text{ adjacent to } x} w(y) = 1$ for each vertex $x$. There’s a unique weighting on $C_5$, giving weight $1/3$ to each vertex. The magnitude is the sum of the weights, so $|C_5| = 5/3 = 1.66\ldots$.

  • The maximum diversity is the independence number: $D_{max}(C_5) = \alpha(C_5) = 2$.

  • The Shannon capacity is $\Theta(C_5) = \sqrt{5} = 2.23\ldots$.

  • The Lovász number is the same: $\vartheta(C_5) = \sqrt{5} = 2.23\ldots$.

It’s immediately clear that unlike the Lovász number, magnitude is not a helpful upper bound for the Shannon capacity — it’s not an upper bound at all. Indeed, it’s not even an upper bound for the independence number.

So for $G = C_5$,

$$|G| \leq D_{max}(G) = \alpha(G) \leq \Theta(G) \leq \vartheta(G).$$

The equality and the second and third inequalities are true for all $G$. I don’t know whether the first inequality is true in general — that is, whether the magnitude of a graph is always less than or equal to its maximum diversity. Or if not, is it at least true that $|G| \leq \Theta(G)$ for all $G$? Again, I don’t know.

Nor do I know where to go from here. Most of my understanding of magnitude and maximum diversity comes from the context of metric spaces. (For a finite metric space $X$, one uses the similarity matrix $Z$ given by $Z_{xy} = e^{-d(x, y)}$.) For subspaces of Euclidean space, maximum diversity is always less than or equal to magnitude — which is the opposite way round from what we just saw. That’s because in Euclidean space, passing to a subspace always decreases the magnitude, and by the principle I mentioned earlier, the maximum diversity of a space is equal to the magnitude of some subspace.

Mark Meckes has done a great deal to illuminate the concept of maximum diversity and its relationship to magnitude, putting it into a satisfying analytical setting, not just for finite spaces but also for compact subspaces of Euclidean space (and more generally, so-called spaces of negative type). You can read about this in Positive definite metric spaces and a new paper — Edit: out now! Magnitude, diversity, capacities, and dimensions of metric spaces. But all that’s in a geometrical context quite unlike the graph-theoretic world.

I don’t know whether these ideas lead anywhere interesting. The root of the problem is that I don’t understand much about the magnitude of (the adjacency matrix of) a graph. If you have any insights or observations, I’d very much like to hear them.

Posted at August 18, 2013 1:52 PM UTC

16 Comments & 0 Trackbacks

Calculating the independence number

Although it doesn’t matter for this post, computing the independence number for an arbitrary “real world” graph in practice is a very interesting problem if you like that kind of thing. It also involves these inequalities, so if you’ll excuse the digression…

Most of the literature on the computing side discusses finding a maximum clique rather than a maximum independent set, so I’ll stick to that to reduce my chances of getting an inequality backwards. We denote this $\omega$, because it’s the ‘opposite’ of $\alpha$.

The best known way of finding a maximum clique in an arbitrary graph $G$ goes something like this: let $v$ be some vertex. Then any clique in $G$ either does not contain $v$, or contains $v$ and possibly some more vertices adjacent to $v$. We use this to grow a clique, which is conventionally called $C$. Initially $C$ is empty. We also have a set of vertices which could potentially be added to $C$. Conventionally this set is called $P$, and initially it contains every vertex in the graph. Now, while $P$ isn’t empty, we pick a vertex $v$ from $P$. Then we recursively solve the problem with $v$ added to $C$ and $P$ filtered to include only vertices adjacent to $v$ (and hence to everything else in $C$), and then with $v$ in neither $C$ nor $P$.

Here’s where the inequalities come in. We keep track of the biggest clique found so far, which is usually called the incumbent. At each step in the search, we want a quick way to know whether it’s worth continuing, i.e. whether we might be able to find a clique big enough to unseat the incumbent. One way of doing that is to look at $|C| + |P|$. But $|P|$ isn’t a very good bound.

The best known algorithms right now use a graph colouring as a bound instead: if we can colour the vertices of $P$ using $k$ colours, $P$ cannot contain a clique with more than $k$ vertices, so if $|C| + k$ isn’t larger than the size of the incumbent then search at the current point can be abandoned. Calculating the chromatic number $\chi$ of a graph is also hard, but the number of colours $\chi^*$ used by a greedy colouring can be computed very quickly.

There are other considerations (which vertex to select from $P$, which greedy colouring to use, some clever tricks involving bitset encodings, a way of reducing the number of colourings that need to be done, parallelisation, discrepancy based search, …) which aren’t of interest here. But the use of a greedy colouring as a bound works because

$$\omega \le \chi \le \chi^* \le |P|.$$
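The scheme described above translates into a compact sketch. This is my own simplified rendition of the Tomita-style algorithm, not production code (no bitsets, no vertex-ordering heuristics):

```python
def max_clique(adj):
    """Branch and bound for a maximum clique.

    adj: dict mapping each vertex to the set of its neighbours (irreflexive).
    Uses the number of colours in a greedy colouring of P as the bound.
    """
    best = []

    def greedy_colour_bound(P):
        # colour classes are sets of pairwise non-adjacent vertices
        colours = []
        for v in P:
            for cls in colours:
                if not (adj[v] & cls):
                    cls.add(v)
                    break
            else:
                colours.append({v})
        return len(colours)  # a clique in P has at most this many vertices

    def expand(C, P):
        nonlocal best
        if len(C) > len(best):
            best = list(C)              # new incumbent
        if not P or len(C) + greedy_colour_bound(P) <= len(best):
            return                      # bound: cannot unseat the incumbent
        v = next(iter(P))
        expand(C + [v], P & adj[v])     # branch: v in the clique
        expand(C, P - {v})              # branch: v excluded

    expand([], set(adj))
    return best

# Example: a 5-cycle plus the chord (0, 2) has clique number 3
adj = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {0, 3}}
print(len(max_clique(adj)))  # 3
```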

The Lovász theta function (of the complement graph) is interesting from an algorithmic perspective because

$$\omega \le \overline{\theta} \le \chi \le \chi^* \le |P|.$$

Also, unlike $\chi$, $\overline{\theta}$ can theoretically be computed to within $\epsilon$ in polynomial time. This would suggest that using a nearest-integer approximation of $\overline{\theta}$ as a bound would be better than a greedy colouring. Unfortunately calculating the theta function for an arbitrary graph in practice is an open problem…

It doesn’t help that all of these approximations can be arbitrarily bad, and if $P \ne NP$ then there’s no good polynomial time approximation for either $\omega$ or $\chi$ at all.

Still, it’s not all bad news: the smallest graph I’ve found where it’s not possible to compute the clique number exactly has 500 vertices and was designed specifically to screw up these kinds of algorithms. For reasonably sparse graphs we can take it to millions of vertices. So it’s not as horrific in practice as some hard graph problems.

Posted by: Ciaran McCreesh on August 18, 2013 8:48 PM | Permalink | Reply to this

Re: Calculating the independence number

Thanks for all that, Ciaran.

Although it doesn’t matter for this post, computing the independence number for an arbitrary “real world” graph in practice is a very interesting problem if you like that kind of thing.

Actually, I’m more interested in that than you might guess. In my post, I pointed out in passing that if one uses the most obvious algorithm for calculating the maximum diversity of an $n \times n$ matrix, it takes $2^n$ steps. Obviously, it would be good to find a faster way.

Since calculating the independence number is a special case of calculating the maximum diversity (namely, the case where the matrix consists of just 0s and 1s), knowledge about the former problem is likely to be useful in attacking the latter problem. And fortunately, as your post makes clear, the problem of calculating the independence number is well studied.

(For anyone reading who doesn’t see the connection between the clique number and the independence number, here it is. Every graph $G$ has a complement $\overline{G}$: the vertices are the same, but two vertices are adjacent in $\overline{G}$ exactly when they’re not adjacent in $G$. A clique in a graph is a set of vertices each of which is adjacent to each of the others, and the clique number $\omega(H)$ of a graph $H$ is the largest cardinality of a clique in it. It’s immediate from the definition that $\alpha(G) = \omega(\overline{G})$: the independence number of a graph is the clique number of its complement.)
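That relationship is a one-liner to check by brute force on a small example. A sketch using the 5-cycle, which happens to be self-complementary, so both numbers are 2:

```python
from itertools import combinations

n = 5
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}                  # C_5
comp_edges = {frozenset(e) for e in combinations(range(n), 2)} - edges   # complement

def alpha(vertices, E):   # largest vertex set with no edge inside
    return max(len(S) for r in range(len(vertices) + 1)
               for S in combinations(vertices, r)
               if not any(frozenset(p) in E for p in combinations(S, 2)))

def omega(vertices, E):   # largest vertex set with all edges inside
    return max(len(S) for r in range(len(vertices) + 1)
               for S in combinations(vertices, r)
               if all(frozenset(p) in E for p in combinations(S, 2)))

# alpha(G) = omega(complement of G)
print(alpha(range(n), edges), omega(range(n), comp_edges))  # 2 2
```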

Christina Cobbold implemented the maximum diversity algorithm without any attempt at optimizing either the code or the algorithm itself, and on bog-standard hardware. I think it was done in Matlab. For 20-something species, it took only a couple of minutes to calculate the maximum diversity: for while 2 to the power of 20-something is large, each step is fast. Since no attempt at optimization was made, I’m sure this could be improved — and maybe you know how to.

But inevitably, there’s a limit to how much it can be improved while the algorithm still takes an exponential number of steps. That’s why I’m interested in finding a better algorithm. Can you see one?

Posted by: Tom Leinster on August 19, 2013 2:50 PM | Permalink | Reply to this

Re: Calculating the independence number

The bad news is, in the worst case, it’s likely that there is no polynomial time algorithm for the problem. Here’s why: computing the independence number is NP-hard, so it has a polynomial time algorithm iff $P = NP$. But we don’t know of a polynomial time algorithm for any NP-hard problem, and it’s generally believed that $P \ne NP$. Since a polynomial time algorithm for maximum diversity could compute the independence number in polynomial time, maximum diversity is also at least NP-hard.

There’s a small chance there’s a loophole. It could be that all interesting similarity matrices have some special extra property which simplifies the problem. For example, independence number is easy if you know that a graph consists of the disjoint union of some number of cliques: you construct a maximum independent set by picking a vertex, eliminating any adjacent vertices, and repeating until there’s nothing left.
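The easy case just mentioned can be sketched in a few lines (a hypothetical example of mine: three disjoint cliques, so the independence number is 3):

```python
def alpha_cluster_graph(adj):
    """Independence number of a disjoint union of cliques.

    Pick any vertex, discard its neighbours, repeat until nothing is left;
    each pick removes exactly one clique, so the count is the number of
    cliques, which is the independence number for this class of graphs.
    """
    remaining = set(adj)
    count = 0
    while remaining:
        v = remaining.pop()
        remaining -= adj[v]   # eliminate the rest of v's clique
        count += 1
    return count

# Three disjoint cliques: a triangle, an edge, and an isolated vertex
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}, 5: set()}
print(alpha_cluster_graph(adj))  # 3
```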

If there’s not a loophole, the good news is that for some algorithms with exponential worst case behaviour, the worst case rarely happens, so wanting to compute maximum diversity for large matrices isn’t necessarily a lost cause.

The best algorithms for computing independence number in practice are branch and bound algorithms. The state of the art begins with three algorithms due to Tomita: 1, 2, 3. Then there’s a clever bitset encoding due to San Segundo: 1, 2. Prosser did a comparison of all of these, and published code. Finally, if you’ve got some really tricky graphs and access to expensive hardware, I’ve got code and a paper under review for a parallel version of these algorithms. It’s worth noting that these all use (the complement of) a greedy colouring as a bound, rather than the (complement of the) Lovász theta function.

As to whether these can be adapted to maximum diversity: maybe… San Segundo’s bitset encodings definitely can’t be, but branch and bound is worth considering. The key properties of independence number that make branch and bound work are:

  • There’s a link between independence number and independent sets. Suppose I claim that a graph has independence number at least $a$. Then I can always prove it to you by showing you an independent set with $a$ elements, and you can verify this proof quickly. (Incidentally, if I claim a graph has independence number less than $a$, there may sometimes not be a short proof of this fact, if certain widely held assumptions about computational complexity are correct.) We need this because branch and bound assumes there’s a well-behaved decision problem (does graph $G$ contain an independent set of size $a$?) associated with the optimisation problem (what is the largest independent set of $G$?).

  • I can incrementally construct every independent set, and given an independent set I can easily tell you what the independence number would be if such an independent set is maximum (it’s just the cardinality of the independent set).

  • If I know the independence number of an induced subgraph, adding in an extra vertex from the original graph cannot decrease the independence number.

  • Suppose I have a set of independent vertices. Then I can easily select which of the remaining vertices in the graph could individually be added to this set (vertices which are not individually adjacent to anything in the set), and I can get a usually-good-in-practice upper bound on how much this could help.

  • The independence number of a graph doesn’t change if the vertices are permuted. (Not strictly necessary, but very convenient.)

If these properties, or something similar, also hold for maximum diversity then it’s likely there’s an algorithm which won’t behave exponentially in practice for most cases. As a first guess: is what you call a “good” subset of $\{1, \ldots, n\}$ in some ways like an independent set? In other words, are all subsets of a “good” subset also “good”? And does the magnitude of a “good” subset behave like the cardinality of an independent set?

Posted by: Ciaran McCreesh on August 20, 2013 6:23 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Here’s a couple of comments.

Firstly, when you’re using the adjacency matrix for your similarities, the notion of being totally identical is not transitive. So $A$ and $B$ can be totally identical (i.e. adjacent in some graph) and $B$ and $C$ can be totally identical, without $A$ and $C$ being totally identical, right?

Secondly, I seem to remember some slogan about Euler characteristic being preserved by left adjoint functors but not by right adjoint functors. So the Euler characteristic of a finite set, considered as a discrete topological space, is its cardinality, but the cardinality of a topological space considered as a set is not equal to its Euler characteristic. Free functors are generally (by definition?) left adjoints. The usual metric space structure on a combinatorial graph can be thought of as the free $[0,\infty]$-category on the underlying $[0,\infty]$-graph (*). The underlying $[0,\infty]$-graph is basically the adjacency matrix. I realise this could be random word association, but is there any obvious relation between the magnitude of the adjacency matrix and the magnitude of the metric space?

(*) A $[0,\infty]$-graph is a set $X$ with a function $X \times X \to [0,\infty]$.

Posted by: Simon Willerton on August 20, 2013 9:22 AM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Firstly, when you’re using the adjacency matrix for your similarities, the notion of being totally identical is not transitive. So $A$ and $B$ can be totally identical (i.e. adjacent in some graph) and $B$ and $C$ can be totally identical, without $A$ and $C$ being totally identical, right?

Thanks — I was going to say something about that in my post, but decided against in order not to strain the reader’s attention too much. So I was secretly hoping that someone might bring it up.

Yes, right, being totally identical isn’t transitive. This isn’t as implausible as it sounds.

For instance, one of the many conflicting definitions of “species” is this: two individuals are of the same species if they can interbreed. (I guess we’re talking about sexual organisms here, and some caveats need to be added in order to avoid stupidities about infertile individuals or individuals of the same sex.) There’s no obvious reason why this should be an equivalence relation: perhaps gruffaloes can breed with buffaloes, and buffaloes with bison, but not gruffaloes with bison. In principle, you could have a ring of interbreeding organisms encircling the globe, with “holonomy”, if you see what I mean. Perhaps there are genetic reasons why this wouldn’t happen, but it doesn’t seem impossible from the definition alone.

Perhaps more convincingly, suppose we’re looking at microbial communities, where (as you know) one often has nowhere near a complete taxonomic classification of all the microbes present, and anyway it’s practically impossible to examine the microbes one by one. Instead, you extract their DNA and look at the variation. One might declare two individual microbes to be of the same species if they have a genetic similarity of more than 99%, say. And this isn’t an equivalence relation.

(Microbial biologists are well aware of this, and have a bunch of tricks to make it an equivalence relation. This leads to the notion of an operational taxonomic unit.)

Secondly, I seem to remember some slogan about Euler characteristic being preserved by left adjoint functors but not by right adjoint functors […]

Interesting question. I don’t know, but it seems to have a lot to do with David’s ideas here.

My first thought is that passing from the adjacency matrix $A_G$ to the similarity matrix $Z_G$ isn’t quite a free-type construction, because the metric doesn’t tell you how many paths of a given length there are between two given points — just the length of the shortest one.

That seems to suggest that David’s $P_G$ is the thing to consider. The formula for it, $P_G(q) = \sum_{d = 0}^\infty A_G^d q^d$, even looks like the formula for a free monoid. Its magnitude at $q = 1$ is the Euler characteristic of the graph $G$, if we interpret the undirected graph $G$ as a directed graph with each undirected edge replaced by $\stackrel{\longrightarrow}{\leftarrow}$. (That’s very close to the little lemma in “The Euler characteristic of a category” saying that the EC of the free category on a directed graph is equal to the EC of that graph.)

On the other hand, $A_G q$ and $P_G(q)$ simply don’t have the same magnitude.

Posted by: Tom Leinster on August 20, 2013 11:21 AM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

In principle, you could have a ring of interbreeding organisms encircling the globe, with “holonomy”, if you see what I mean. Perhaps there are genetic reasons why this wouldn’t happen, but it doesn’t seem impossible from the definition alone.

This happens in a few ring species of birds that live around geographical barriers like the North Pole, Himalayas, or the Sierra Nevada.

Posted by: Stuart Presnell on August 22, 2013 8:17 AM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Thanks! From the Wikipedia article you link to:

Formally, the issue is that interfertility (ability to interbreed) is not a transitive relation

That’s exactly what I meant. The article also includes an interesting quote from Richard Dawkins: ring species “are only showing us in the spatial dimension something that must always happen in the time dimension.”

Posted by: Tom Leinster on August 22, 2013 12:03 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

The real world, as always, is even more complicated, because “ability to interbreed” is not $\{0,1\}$-valued. My understanding of ring species and the like is that, roughly, the farther apart two (fertile, opposite-sex) individuals live, the less successfully they can interbreed. For example, for animals in which the female lays large numbers of eggs which are then fertilized by the male, the number of eggs which are successfully fertilized decreases with distance, such that near neighbors have a very high fertilization rate, but widely separated individuals, should they somehow happen to mate, have a fertilization rate of 0.

Posted by: Mark Meckes on August 23, 2013 4:23 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Hmm, the second half of my last comment was over-hasty: I wasn’t distinguishing properly between free categories and free $[0, \infty]$-categories.

Here’s a basic question. Consider finite graphs, directed or not as you prefer. What’s the simplest way of associating a matrix $M_G$ to each graph $G$, in such a way that the magnitude of $M_G$ is always equal to the Euler characteristic (vertices minus edges) of $G$?

We already know one way, at least for directed graphs: take $M_G$ to be the matrix of cardinalities of hom-sets of the free category on $G$. But I’m wondering whether there’s some simpler way of doing it, not involving a free construction.
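To make that known construction concrete on the smallest interesting example, here is a stdlib-only Python sketch (mine, not from the comment): for the directed path $a \to b \to c$, the free category has hom-set cardinality matrix $M_G$, and its magnitude (the sum of the entries of $M_G^{-1}$, valid when $M_G$ is invertible) comes out as $3 - 2 = 1$:

```python
# Sketch (not part of the original comment): for the directed path
# a -> b -> c, the matrix M counts morphisms in the free category,
# and its magnitude (sum of the entries of M^{-1}) should equal
# the Euler characteristic V - E = 3 - 2 = 1.
from fractions import Fraction as F

M = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]  # M[i][j] = number of paths (morphisms) from vertex i to j
n = len(M)

# Invert M exactly over the rationals by Gauss-Jordan elimination.
aug = [[F(M[i][j]) for j in range(n)] + [F(int(i == j)) for j in range(n)]
       for i in range(n)]
for col in range(n):
    piv = next(r for r in range(col, n) if aug[r][col] != 0)
    aug[col], aug[piv] = aug[piv], aug[col]
    p = aug[col][col]
    aug[col] = [x / p for x in aug[col]]
    for r in range(n):
        if r != col and aug[r][col] != 0:
            f = aug[r][col]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]

magnitude = sum(aug[i][n + j] for i in range(n) for j in range(n))
print(magnitude)  # 1, the Euler characteristic of a -> b -> c
```

Of course this is just the free construction again; the question of a simpler, non-free recipe stands.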

Posted by: Tom Leinster on August 20, 2013 4:30 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

If “Euler characteristic” means the trace of the identity map of a dualizable object in a symmetric monoidal category, then it is preserved by any strong monoidal functor. For some reason, though, left adjoints seem to have more of a tendency to be strong monoidal than right ones: strong/lax doctrinal adjunctions seem more common than oplax/strong ones.

Posted by: Mike Shulman on August 21, 2013 5:15 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Here’s a rather trivial observation, which may or may not be useful. The set of $n \times n$ similarity matrices is a compact convex set, and the adjacency matrices of graphs (in the sense of this post) with $n$ vertices are precisely its extreme points.

Posted by: Mark Meckes on August 24, 2013 10:45 AM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Interesting point.

Posted by: Tom Leinster on August 24, 2013 1:32 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

In my post, I said Mark had been working a lot on the relationship between magnitude and maximum diversity. He’s just posted a new paper about this (and several other things!), illuminating that relationship in the context of suitable compact metric spaces.

Mark Meckes, Magnitude, diversity, capacities, and dimensions of metric spaces. arXiv:1308.5407, 2013.

Posted by: Tom Leinster on August 27, 2013 2:11 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

I don’t know what you have learned about this topic over the last two years, but I have a connection between the magnitude of the adjacency matrix of a graph and the independence number.

A set of vertices in a graph is dominating if every vertex in the graph is either a member of the set or adjacent to a member. Any maximal independent set is a dominating set, so the minimum size of a dominating set in a graph is a lower bound for the maximum size of an independent set. The size of the smallest dominating set can be expressed as an integer linear program: $\min_{x \in \mathbb{Z}^n} \mathbf{1}^T x$ such that $x \geq \mathbf{0}$ and $A x \geq \mathbf{1}$. The fractional dominating set number is the fractional relaxation of that program: $\min_{x \in \mathbb{R}^n} \mathbf{1}^T x$ such that $x \geq \mathbf{0}$ and $A x \geq \mathbf{1}$. It is also equal to the value of the dual program $\max_{x \in \mathbb{R}^n} \mathbf{1}^T x$ such that $x \geq \mathbf{0}$ and $A x \leq \mathbf{1}$. If there is some vector $w \geq \mathbf{0}$ such that $A w = \mathbf{1}$, then $w$ is feasible in both the primal and dual linear programs and $\mathbf{1}^T w$ is the fractional dominating set number. For $d$-regular graphs, $A \mathbf{1} = (d+1)\mathbf{1}$, so both the fractional dominating set number and the magnitude are equal to $1/(d+1)$.
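These two programs are easy to run on a small example. Here is a numerical sketch (my own illustration, not from the comment, and assuming `scipy` is installed) using `scipy.optimize.linprog` on the 5-cycle $C_5$, with $A$ the adjacency matrix with 1s on the diagonal, as elsewhere in this thread:

```python
# Sketch (my own example, not from the comment): the primal and dual LPs
# for the fractional dominating set number, run on the 5-cycle C_5.
# Here A is the adjacency matrix with 1s on the diagonal.
import numpy as np
from scipy.optimize import linprog

n = 5
A = np.eye(n)
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0

ones = np.ones(n)
# primal: min 1^T x  such that  A x >= 1, x >= 0  (rewritten as -A x <= -1)
primal = linprog(c=ones, A_ub=-A, b_ub=-ones)
# dual:   max 1^T y  such that  A y <= 1, y >= 0  (run as min of -1^T y)
dual = linprog(c=-ones, A_ub=A, b_ub=ones)
print(primal.fun, -dual.fun)  # the two values agree, as duality promises
```

(`linprog` defaults to the bounds $x \geq 0$, so only the $A$-constraints need to be supplied.)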

The magnitude can differ from the fractional domination number. Consider the following seven-vertex tree. The central vertex has degree two. Both of its neighbors have degree three. The final four vertices are leaves. The vector that is equal to $-1/3$ on the central vertex, $2/3$ on its neighbors, and $1/3$ on the leaves is a weighting, so the magnitude is $7/3$. The dominating set number and fractional dominating set number are both $2$.
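For anyone who wants to check this example mechanically, here is a short stdlib-only Python verification (my own sketch; the vertex numbering is arbitrary) that the stated vector is a weighting and that the magnitude is $7/3$:

```python
# Exact check of the seven-vertex tree example, using fractions.Fraction.
# Vertices: 0 = centre, 1 and 2 = its neighbours, 3-4 leaves of 1,
# 5-6 leaves of 2.
from fractions import Fraction as F

edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
n = 7
Z = [[F(int(i == j)) for j in range(n)] for i in range(n)]
for i, j in edges:
    Z[i][j] = Z[j][i] = F(1)  # adjacency matrix with 1s on the diagonal

# The claimed weighting: -1/3 at the centre, 2/3 at its neighbours,
# 1/3 at each leaf.
w = [F(-1, 3), F(2, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]
assert all(sum(Z[i][j] * w[j] for j in range(n)) == 1 for i in range(n))
print(sum(w))  # 7/3, the magnitude
```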

Posted by: Daniel Cullina on April 14, 2015 8:16 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Thanks for these interesting comments!

It is also equal to the value of the dual program $\max_{x \in \mathbb{R}^n} \mathbf{1}^T x$ such that $x \geq \mathbf{0}$ and $A x \leq \mathbf{1}$.

Because I’ve never done a course on linear programming, I didn’t know this, but Wikipedia is telling me that this is an instance of “strong duality” — and that in principle, the two occurrences of $\mathbf{1}$ in the original problem have switched positions on passing to the dual problem.

For $d$-regular graphs, $A \mathbf{1} = (d+1)\mathbf{1}$ so both the fractional dominating set number and the magnitude are equal to $1/(d+1)$.

That must be a typo for $n/(d + 1)$, where $n$ is the number of vertices.
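A quick exact check of the corrected formula (my own example, not from the thread): for the 3-regular complete graph $K_4$, the uniform vector with entries $1/(d+1)$ is a weighting, so the magnitude is $n/(d+1) = 4/4 = 1$.

```python
# Sketch: for K_4 (3-regular), with A the adjacency matrix with 1s on
# the diagonal, the uniform vector with entries 1/(d+1) satisfies
# A w = 1, so the magnitude is n/(d+1).
from fractions import Fraction as F

n, d = 4, 3  # the complete graph K_4 is 3-regular
A = [[F(1)] * n for _ in range(n)]  # adjacency of K_4 plus the diagonal
w = [F(1, d + 1)] * n
assert all(sum(A[i][j] * w[j] for j in range(n)) == 1 for i in range(n))
print(sum(w))  # 1 = n/(d+1)
```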

Here’s a picture of your example:

Tree with weighting shown

Posted by: Tom Leinster on April 15, 2015 6:03 PM | Permalink | Reply to this

Re: The Shannon Capacity of a Graph, 2

Yes, I meant to write $n/(d+1)$.

Not only have the two occurrences of $\mathbf{1}$ switched places; one program uses $A$ and the other uses $A^T$. In general, the primal and dual polytopes live in different spaces and the $A$ matrix is not square, let alone symmetric. Fractional dominating set has an unusually symmetric linear program.

Posted by: Daniel Cullina on April 15, 2015 6:53 PM | Permalink | Reply to this

Post a New Comment