Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

January 22, 2025

Comagnitude 1

Posted by Tom Leinster

Next: Part 2

In this post and the next, I want to try out a new idea and see where it leads. It goes back to where magnitude began, which was the desire to unify elementary counting formulas like the inclusion-exclusion principle and the simple formula for the number of orbits in a free action of a group on a finite set.

To prepare the ground for comagnitude, I need to present magnitude itself in a slightly different way from usual. I won’t assume you know anything about magnitude, but if you do, watch out for something new: a connection between magnitude and entropy (ordinary, relative and conditional) that I don’t think has quite been articulated before.

The inclusion-exclusion formula tells us the cardinality of a union, which in categorical terms is a pushout. The set of orbits of a free action of a group GG on a finite set is the colimit of the associated functor BGFinSetB G \to FinSet. (Here BGB G is the one-object category whose maps are the elements of GG.) Generally, we’re going to think about the cardinality of the colimit of a functor X:AFinSetX: A \to FinSet, where AA is a finite category.

In both the examples I just mentioned, the cardinality |colimX||colim X| of the colimit is a \mathbb{Q}-linear combination of the cardinalities |X(a)||X(a)| of the individual sets X(a)X(a), taken over all objects aAa \in A. For inclusion-exclusion, we take

A=(012) A = (0 \leftarrow 1 \rightarrow 2)

and get

|colim(X)|=|X(0)||X(1)|+|X(2)|. |colim(X)| = |X(0)| - |X(1)| + |X(2)|.

For a free action of a group GG, we take A=BGA = B G. If the single object of AA is called aa, then

|colim(X)|=|X(a)|/o(G) |colim(X)| = |X(a)|/o(G)

— the cardinality of the set X(a)X(a) being acted on, divided by the order of GG.

Generally, there’s no hope of computing the cardinality of the colimit of a functor X:AFinSetX: A \to FinSet in terms of the cardinalities of the sets X(a)X(a) alone. The colimit usually depends on what the functor XX does to morphisms, as well as objects. So in order to pull this off, we’re going to have to confine ourselves to only those functors XX that are especially convenient. The hope is that for any AA, there are rational numbers w A(a)w_A(a) (aAa \in A) such that for all “convenient” functors X:AFinSetX: A \to FinSet,

|colim(X)|= aAw A(a)|X(a)|. |colim(X)| = \sum_{a \in A} w_A(a) |X(a)|.

Here’s how to simultaneously figure out what the coefficients w A(a)w_A(a) in this weighted sum must be and what “convenient” should mean.

The starting thought is that most notions of “nice enough” for set-valued functors include the representables. Since the colimit of any representable is the one-element set, for the last equation to hold for all representables A(b,)A(b, -) means that

1= aAw A(a)|A(b,a)| 1 = \sum_{a \in A} w_A(a) |A(b, a)|

for all bAb \in A.

And now we’re getting somewhere! This is a system of equations with the same number of equations as unknowns w A(a)w_A(a), namely, the number of objects of AA. And in this situation, there’s typically a unique solution, at least if we work over the field \mathbb{Q}.

A family (w A(a)) aA(w_A(a))_{a \in A} of rational numbers satisfying

aA|A(b,a)|w A(a)=1 \sum_{a \in A} |A(b, a)| w_A(a) = 1

for all bAb \in A is called a weighting on AA, and w A(a)w_A(a) is called the weight of aa. For the purposes of this post, I’ll assume there’s always exactly one weighting on AA. That’s not entirely justified: there are examples of finite categories with no weighting or finitely many. (See Examples 1.11 of my paper The Euler characteristic of a category, where all the stuff I’m talking about right now is worked out.) But it’s safe enough.

So, take the weighting w Aw_A on AA. By defintion, the class of functors X:AFinSetX: A \to FinSet satisfying

|colim(X)|= aAw A(a)|X(a)| |colim(X)| = \sum_{a \in A} w_A(a) |X(a)|

contains the representables. But it’s easy to see that it’s also closed under coproducts. So, this equation — which I’ll call the colimit formula — holds for all coproducts of representables!

Which functors ASetA \to Set are coproducts of representables? They’re sometimes called the “familially representable” functors, the idea being that a coproduct of representables iIA(a i,)\sum_{i \in I} A(a_i, -) is represented by a family of objects (a i) iI(a_i)_{i \in I}. But in my last post, I called them the “free functors”. The way I put it there was that the forgetful functor

Set ASet obA Set^A \to Set^{ob\ A}

has both a left and a right adjoint, and a functor XSet AX \in Set^A is free if and only if it’s in the image of the left adjoint. That means

X aAS(a)×A(a,) X \cong \sum_{a \in A} S(a) \times A(a, -)

for some family of sets (S(a)) aA(S(a))_{a \in A}.

Examples

  • When AA is the discrete two-object category {0,1}\{0, 1\}, both weights are 11, and the general colimit formula

    |colim(X)|= aw A(a)|X(a)| |colim(X)| = \sum_a w_A(a) |X(a)|

    reduces to something very basic:

    |X(0)+X(1)|=|X(0)|+|X(1)| |X(0) + X(1)| = |X(0)| + |X(1)|

    — the cardinality of a coproduct is the sum of the individual cardinalities. When AA is a discrete category, all functors ASetA \to Set are coproducts of representables.

  • When AA is the shape (012)(0 \leftarrow 1 \rightarrow 2) for pushouts, a functor X:AFinSetX: A \to FinSet is a diagram

    (X(0)X(1)X(2)) (X(0) \leftarrow X(1) \rightarrow X(2))

    of finite sets, and XX is a coproduct of representables if and only if both these maps are injections. Assume XX is a coproduct of representables. The weighting on AA is (1,1,1)(1, -1, 1), so the colimit formula says that the cardinality of a pushout is

    |X(0)||X(1)|+|X(2)| |X(0)| - |X(1)| + |X(2)|

    — the inclusion-exclusion formula.

  • For a group GG (or more generally, a monoid), the weight of the unique object of BGB G is 1/o(G)1/o(G), the reciprocal of the order. Given an action of GG on a a finite set, the corresponding functor BGFinSetB G \to FinSet is a coproduct of representables if and only if the action is free. In that case, the colimit formula says that the number of orbits is the cardinality of the set divided by o(G)o(G).

  • Finally, when A=(01)A = (0 \rightrightarrows 1), the colimit formula says that the coequalizer of X(0)X(1)X(0) \rightrightarrows X(1) has |X(1)||X(0)||X(1)| - |X(0)| elements, provided that the two functions from X(0)X(0) to X(1)X(1) are injective with disjoint images.

Aside for category theorists   Under finiteness conditions, and assuming that AA is Cauchy complete, a functor X:ASetX: A \to Set is a coproduct of representables if and only if it is flat with respect to the class of finite connected limits. This might seem more abstract than the original condition, but it actually gives a concrete, testable condition for XX to be a coproduct of representables. (See Lemma 5.2 and Definition 3.2 here.) This result enables one to see rather quickly that in all the examples above, the functors arising as coproducts of representables are exactly what I said they are.

 

Summary so far  We’ve seen that a finite category AA typically has a unique weighting w Aw_A — meaning a family (w A(a)) aA(w_A(a))_{a \in A} of rational numbers satisfying

a|A(a,b)|w A(a)=1 \sum_a |A(a, b)| w_A(a) = 1

for all bAb \in A. And we’ve seen that when X:AFinSetX: A \to FinSet is a coproduct of representables, the “colimit formula” holds:

|colim(X)|= aw A(a)|X(a)|. |colim(X)| = \sum_a w_A(a) |X(a)|.

Now, what if XX isn’t a coproduct of representables? The colimit formula won’t usually hold. But the right-hand side of this non-equation still calculates something. What is it?

Let me define the magnitude |X||X| of a functor X:AFinSetX: A \to FinSet by

|X|= aAw A(a)|X(a)|. |X| = \sum_{a \in A} w_A(a) |X(a)| \in \mathbb{Q}.

This is equal to |colim(X)||colim(X)| if XX is a coproduct of representables, but not in general.

For example, if we take a monoid MM acting on a finite set, the magnitude of the corresponding functor BMFinSetB M \to FinSet is the cardinality of the set divided by the order of MM. Unless the action is free, this isn’t usually the number of orbits, and it might not even be an integer.

The functor Δ1:AFinSet\Delta 1: A \to FinSet sending everything in AA to the one-element set has magnitude

|Δ1|= aAw A(a), |\Delta 1| = \sum_{a \in A} w_A(a),

which is called the magnitude or Euler characteristic of AA. (Under finiteness hypothesis, it’s equal to the topological Euler characteristic of the classifying space in AA.) Hence the magnitude of a category is a special case of the magnitude of a set-valued functor. Conversely, one can show that the magnitude of a functor X:AFinSetX: A \to FinSet is equal to the magnitude of its category of elements.

So, the concepts of magnitude of a category (already studied extensively) and the magnitude of a set-valued functor (a term I think I’m introducing for the first time here) are each special cases of the other. But in these posts, unlike everything I’ve written about magnitude before, I’m going to give prime position to magnitude of set-valued functors rather than of categories.

 

Does this definition of the magnitude of a set-valued functor have sensible properties? It does!

First, if X,Y:AFinSetX, Y: A \to FinSet are isomorphic then certainly |X|=|Y||X| = |Y|. Second, and a bit less trivially, magnitude is invariant under equivalence in the following sense: if we compose X:AFinSetX: A \to FinSet with an equivalence of categories G:BAG: B \to A then

|XG|=|X|. |X \circ G| = |X|.

In fact, this holds just as long as GG has a left adjoint.

The term “magnitude” suggests it should behave in a cardinality-like way, and it does that too. For example, given functors X,Y:AFinSetX, Y: A \to FinSet,

|X+Y|=|X|+|Y|, |X + Y| = |X| + |Y|,

where the left-hand side is the magnitude of the coproduct of our two functors. A bit more generally, suppose we have a diagram of functors and natural transformations

XYZ X \leftarrow Y \rightarrow Z

such that for each aAa \in A, the functions

X(a)Y(a)Z(a) X(a) \leftarrow Y(a) \rightarrow Z(a)

are injective with disjoint images (or more abstractly, the corresponding functor from (012)(0 \leftarrow 1 \rightarrow 2) to FinSetFinSet is a coproduct of representables). Then one can show that the magnitude of the pushout X+ YZX +_Y Z is given by

|X+ YZ|=|X||Y|+|Z|. |X +_Y Z| = |X| - |Y| + |Z|.

So magnitude in this sense obeys the inclusion-exclusion formula. More generally still, there’s a similar result for any shape of colimit, but I won’t write it out here. It’s exactly what the pushout case suggests.

 

I hope I’ve driven home the message that although the colimit formula

|colim(X)|=|X| |colim(X)| = |X|

for functors X:AFinSetX: A \to FinSet holds when XX is a coproduct of representables, it doesn’t hold for arbitrary XX. So perhaps it’s interesting to see how much the two sides of the equation differ when XX isn’t a coproduct of representables. In other words, let’s look at the difference

|colim(X)||X|=|colim(X)| aw A(a)|X(a)|. |colim(X)| - |X| = |colim(X)| - \sum_a w_A(a) |X(a)| \in \mathbb{Q}.

It seems to me that this quantity could be interesting. However, I don’t have a strong feel for what it means yet, so for now I’ll give it a bland name: the discrepancy of XX.

Example  The discrepancy of a coequalizer

X 0X 1C X_0 \rightrightarrows X_1 \to C

is

|C|(|X 1||X 0|)=|X 0||X 1|+|C|. |C| - (|X_1| - |X_0|) = |X_0| - |X_1| + |C|.

We’ve already seen that this is 00 if our two functions f,g:X 0X 1f, g: X_0 \rightrightarrows X_1 are injective with disjoint images. But in general, it looks rather like an Euler characteristic. In fact, from our coequalizer we can form a chain complex of free abelian groups

0X 0gfX 1hC0 0 \to \mathbb{Z} X_0 \stackrel{\mathbb{Z}g - \mathbb{Z}f}{\to} \mathbb{Z} X_1 \stackrel{\mathbb{Z}h}{\to} \mathbb{Z} C \to 0

where hh is the original function X 1CX_1 \to C. Then the discrepancy of our coequalizer is equal to the Euler characteristic of this complex.

This complex is exact except maybe at X 0\mathbb{Z} X_0. Let’s do a little check: if ff and gg are injective with disjoint images then gf\mathbb{Z}g - \mathbb{Z}f is injective too, so the complex is exact everywhere, which implies that the Euler characteristic is 00. Since the discrepancy is also 00 in this case, we’ve just confirmed one case of the result that the discrepancy of the coequalizer is the Euler characteristic of the resulting complex.

Example  For a pushout square

X 1 X 0 X 2 P, \begin{array}{ccc} X_1 &\rightarrow &X_0 \\ \downarrow & &\downarrow \\ X_2 &\rightarrow &P, \end{array}

the discrepancy is

|P|(|X 0||X 1|+|X 2|)=|X 1||X 0||X 2|+|P|. |P| - (|X_0| - |X_1| + |X_2|) = |X_1| - |X_0| - |X_2| + |P|.

I don’t know what to do with this formula, but it has a certain symmetry. Has anyone come across it before?

 

I’ll finish by explaining the entropy connection I promised at the start. Here we go!

 

Ordinary Shannon entropy  For this, we need to think about the Shannon entropy not just of probability measures — the usual setting — but of arbitrary measures. The sets we’re considering measures on will always be finite.

A probability measure pp on a finite set {1,,n}\{1, \ldots, n\} is just an nn-tuple (p 1,,p n)(p_1, \ldots, p_n) of nonnegative reals summing to 11, and its entropy H(p)H(p) is defined by

H(p)= ip ilogp i, H(p) = -\sum_i p_i \log p_i,

with the convention that 0log0=00\ \log\ 0 = 0. A measure a=(a 1,,a n)a = (a_1, \ldots, a_n) on {1,,n}\{1, \ldots, n\} is any nn-tuple of nonnegative reals, and the definition of entropy is extended from probability measures to arbitrary measures by homogeneity. That is, writing a=a i\|a\| = \sum a_i, the tuple a/aa/\|a\| is a probability measure, and we define

H(a)=aH(a/a) H(a) = \|a\| H(a/\|a\|)

(or H(a)=0H(a) = 0 if a=0\|a\| = 0). This is the unique way of extending the definition so that

H(λa)=λH(a) H(\lambda a) = \lambda H(a)

for all measures aa and real λ0\lambda \geq 0.

Let’s now consider measures a=(a 1,,a n)a = (a_1, \ldots, a_n) where each a ia_i is an integer. Such a thing amounts to a map

AπI A \stackrel{\pi}{\to} I

of finite sets. To see this, take I={1,,n}I = \{1, \ldots, n\}, and take AπIA \stackrel{\pi}{\to} I to be a function into II whose ii-fibre has cardinality a ia_i.

This categorification manoeuvre — upgrading from numbers to sets — brings maps into play. In particular, we get the monoid End(A)End(A) of all functions AAA \to A, and its submonoid End I(A)End_I(A) consisting of the endomorphisms of AA over II. That is, the elements of End I(A)End_I(A) are the functions f:AAf: A \to A such that πf=π\pi \circ f = \pi.

The inclusion of monoids

End I(A)End(A), End_I(A) \hookrightarrow End(A),

like any other monoid homomorphism, induces an action of the domain on the underlying set of the codomain: applying an element fEnd I(A)f \in End_I(A) to an element gEnd(A)g \in End(A) gives a new element fgEnd(A)f \circ g \in End(A). And this monoid action corresponds to a functor

End I(A)FinSet. End_I(A) \to FinSet.

(Or “BEnd I(A)FinSetB End_I(A) \to FinSet” if you want to be fussy, but I’ll drop the BBs.)

Question: what’s the magnitude of this functor?

Answer: it’s

|End(A)||End I(A)|=a a ia i a i= i(aa i) a i. \frac{|End(A)|}{|End_I(A)|} = \frac{\|a\|^{\|a\|}}{\prod_i a_i^{a_i}} = \prod_i \biggl( \frac{\|a\|}{a_i} \biggr)^{a_i}.

This is nothing but exp(H(a))\exp(H(a)), the exponential of the entropy of our measure! So:

The exponential of entropy is a special case of magnitude.

(Sometimes it seems that the exponential of entropy is a more fundamental quantity than the entropy itself. Apart from anything else, it doesn’t depend on a choice of base. I like to call the exponential of entropy the diversity.)

At least, that’s true when the measure of each point is an integer. But from knowing the entropy of such measures, we can easily obtain the entropy of those where the entropy of each point is rational, assuming the homogeneity property H(λa)=λH(a)H(\lambda a) = \lambda H(a). And then one can get the completely general case of real measures by extending by continuity.

The basic calculation above was described in an earlier post of mine, but at that time I didn’t see clearly what it meant.

 

Relative entropy  Given two probability measures pp and rr on the same finite set, you can calculate their relative entropy. It’s a measure of how surprised you’d be to observe a distribution of pp in a sample drawn from a population with distribution rr. (I think of rr as the “reference” distribution.) The formula for H(pr)H(p\|r), the entropy of pp relative to rr, is

H(pr)= ip ilog(p i/r i)[0,]. H(p\|r) = \sum_i p_i \log(p_i/r_i) \in [0, \infty].

Just as for ordinary entropy, we can extend it from probability measures to arbitrary measures. We’re now dealing with two measures aa and bb on the same finite set, and I’ll assume they have the same total mass: a=b\|a\| = \|b\|. Again, we extend homogeneously, so that

H(λaλb)=λH(ab) H(\lambda a \| \lambda b) = \lambda H(a\|b)

whenever λ0\lambda \geq 0.

And as before, we’ll focus on measures taking integer values. A pair of measures on a finite set II, both taking integer values and with the same total mass, amounts to a diagram

AI A \rightrightarrows I

in FinSetFinSet. The idea is that if we take I={1,,n}I = \{1, \ldots, n\} and write a ia_i and b ib_i for the cardinalities of the fibres of the ii-fibres of these two maps, then one of our measures is (a 1,,a n)(a_1, \ldots, a_n) and the other is (b 1,,b n)(b_1, \ldots, b_n). They have the same total mass because

a 1++a n=|A|=b 1++b n. a_1 + \cdots + a_n = |A| = b_1 + \cdots + b_n.

Calling our two maps π,ρ:AI\pi, \rho: A \to I, we can consider the set

Hom I(AπI,AρI) Hom_I(A \stackrel{\pi}{\to} I, A \stackrel{\rho}{\to} I)

of functions g:AAg : A \to A such that ρg=π\rho \circ g = \pi. This set is acted on by the monoid

End I(AπI) End_I(A \stackrel{\pi}{\to} I)

of functions f:AAf : A \to A such that πf=π\pi \circ f = \pi, by composition.

As we keep seeing, a monoid action corresponds to a functor from that monoid to SetSet, whose magnitude we can compute. In this case, the magnitude is

|Hom I(AπI,AρI)||End I(AπI)|=b i a ia i=(a ib i) a i, \frac{|Hom_I(A \stackrel{\pi}{\to} I, A \stackrel{\rho}{\to} I)|}{|End_I(A \stackrel{\pi}{\to} I)|} = \frac{\prod b_i^{a_i}}{\prod a_i} = \prod \biggl( \frac{a_i}{b_i} \biggr)^{a_i},

which is equal to

exp(H(ab)) \exp(-H(a \| b))

—the negative exponential of relative entropy.

(We’re never going to be able to get rid of that negative and get exp(H(ab))\exp(H(a \| b)) itself as a magnitude, since H(ab)H(a\|b) can be infinite but that the magnitude of a functor can’t be, at least in the context at hand.)

As for ordinary entropy, if we know relative entropy for integer-valued measures then it’s only two short steps to the definition for arbitrary measures.

 

Conditional entropy  Whereas relative entropy takes as its input a pair of measures on the same set, conditional entropy takes as its input two measures on different sets.

It’s usually presented in terms of random variables: we have random variables XX and YY taking values in sets II and JJ respectively (which for us will be finite), and the conditional entropy H(X|Y)H(X | Y) is defined by

H(X|Y)= iI,jJP(X=i,Y=j)log1P(X=i|Y=j). H(X | Y) = \sum_{i \in I, j \in J} P(X = i, Y = j) \log \frac{1}{P(X = i | Y = j)}.

If we write (p ij) iI,jJ(p_{i j})_{i \in I, j \in J} for the joint distribution — so that the p ijp_{i j} are nonnegative reals summing to 11 — then

H(X|Y)= i,jp ijlog(p jp ij). H(X | Y) = \sum_{i, j} p_{i j} \log \biggl( \frac{p_j}{p_{i j}} \biggr).

Here pp is a probability measure on I×JI \times J. But just as for ordinary and relative entropy, this definition extends to arbitrary measures aa on I×JI \times J, by scaling homogeneously.

To see how conditional entropy is a special case of magnitude, we follow a path that should be familiar by now.

An integer-valued measure on I×JI \times J amounts to a map AI×JA \to I \times J of finite sets, or equivalently a diagram

IAJ I \leftarrow A \rightarrow J

in FinSetFinSet. In the monoid of all endomorphisms of AA, we can consider the endomorphisms over II, or over JJ, or most restrictively of all, over I×JI \times J. In particular, there is an inclusion of monoids

End I×J(A)End J(A). End_{I \times J}(A) \hookrightarrow End_J(A).

As for ordinary entropy, this homomorphism induces an action of the domain on the underlying set of the codomain, giving a functor End I×J(A)FinSetEnd_{I \times J}(A) \to FinSet whose magnitude we can calculate. It’s

|End J(A)||End I×J(A)|= ja j a j i,ja ij a ij= i,j(a ja ij) a ij, \frac{|End_J(A)|}{|End_{I \times J}(A)|} = \frac{\prod_j a_j^{a_j}}{\prod_{i, j} a_{i j}^{a_{i j}}} = \prod_{i, j} \biggl( \frac{a_j}{a_{i j}} \biggr)^{a_{i j}},

where I’m using a ija_{i j} to mean the cardinality of the (i,j)(i, j)-fibre of AI×JA \to I \times J and a ja_j to mean the cardinality of the jj-fibre of AJA \to J. This is exactly the exponential of the conditional entropy.

 

Conclusion   The exponentials of

  • ordinary entropy
  • the negative of relative entropy
  • conditional entropy

all arise as special cases of magnitude — at least for integer-valued measures, but it’s routine to derive the definition for general measures from that special case.

 

If you know lots about entropy, you might be wondering now: what about mutual information? Is that the magnitude of some set-valued functor too?

I don’t know! Given AI×JA \to I \times J, the quantity we’d like to obtain as a magnitude is

|End(A)||End I×J(A)||End I(A)||End J(A)|. \frac{|End(A)| |End_{I \times J}(A)|}{|End_I(A)| |End_J(A)|}.

I can’t see how to do that in any natural way.

Maybe I’ll come back to this point next time, when I’ll start on the topic that’s actually the title of these posts: comagnitude.

Posted at January 22, 2025 9:45 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3585

22 Comments & 1 Trackback

Re: Comagnitude 1

This last formula for mutual information you’d like to obtain as a magnitude looks a lot like the exponential of the discrepancy of a pushout square — but you’ve probably noticed that already?

Posted by: Quentin Aristote on January 23, 2025 10:09 PM | Permalink | Reply to this

Re: Comagnitude 1

Absolutely! Mutual information can be seen as the codiscrepancy of a pullback square, which is what I was intending to mention next time.

I haven’t defined “codiscrepancy” yet, but it’s going to be something multiplicative. Whereas the discrepancy of a pushout square is an alternating sum of the four cardinalities, the codiscrepancy is an alternating product.

In terms of this kind of thing, discrepancy is more like Euler characteristic and codiscrepancy is more like homotopy cardinality.

I’ll get on to this properly next time. It’s a nice observation, but I’m still quite puzzled by the whole situation with mutual information and magnitude.

Posted by: Tom Leinster on January 23, 2025 10:24 PM | Permalink | Reply to this

Re: Comagnitude 1

In our paper The linearity of traces in monoidal categories and bicategories, Kate Ponto and I showed (Theorem 7.14) that if AA is a finite category with no nonidentity endomorphisms, then for any functor X:AFinSetX:A\to FinSet, then what you call the magnitude |X||X| is always equal to the Euler characteristic of the homotopy colimit hocolim(X)hocolim(X). So in this case, the discrepancy is measuring the difference between the colimit and the homotopy colimit, kind of like a lim 1lim^1.

Another way of saying this is that your observation about a chain complex in the case of coequalizers is generic. For any finite category AA with no nonidentity endomorphisms and any X:AFinSetX:A\to FinSet, there is a natural chain complex YY, namely the homotopy colimit in chain complexes of the diagram of free abelian groups X a\mathbb{Z}X_a (each regarded as a chain complex concentrated in degree 0), or equivalently the cellular chain complex of the homotopy colimit of XX in spaces, whose Euler characteristic is the magnitude |X||X|. Moreover, if CC is the actual colimit of XX, there is the canonical map YCY\to \mathbb{Z}C from the homotopy colimit to the colimit, which we can regard as augmenting YY to a larger chain complex, and the Euler characteristic of this larger chain complex (which I think is the “cone” of the map YCY\to \mathbb{Z}C) will be the discrepancy.

If AA has nonidentity endomorphisms, there is another contribution to the discrepancy. Suppose first A=BGA=B G for a finite group GG. Then the homotopy colimit of a finite GG-set XX in spaces, or of X\mathbb{Z}X in chain complexes over \mathbb{Z}, doesn’t have a finite Euler characteristic. But if we instead consider chain complexes over \mathbb{Q}, then the homotopy colimit of X\mathbb{Q}X is in fact equal to C\mathbb{Q}C, where C=X/GC = X/G is the ordinary colimit of XX in FinSetFinSet. Thus, the Euler characteristic of hocolim(X)hocolim(X) is equal to the cardinality |colim(X)||colim(X)|.

But this is still different from the magnitude of XX, which is |X(a)||G|\frac{|X(a)|}{|G|}, where aa is the unique object of BGB G so that X(a)X(a) is the underlying set with a GG-action. (In this case writing |X||X| is a bit ambiguous as to whether it means the cardinality of the GG-set XX or the magnitude of the functor X:BGFinSetX:B G \to FinSet.) In this case we have the orbit-counting theorem (a.k.a. Burnside’s lemma or the Cauchy-Frobenius lemma)

|colim(X)|= gG|X g||G| |colim(X)| \;=\; \sum_{g\in G} \frac{|X^g|}{|G|}

where X gX^g is the set of fixed points of gg acting on XX. Now note that if eGe\in G is the identity element, then X eX^e is the underlying set X(a)X(a). Thus, in this case the discrepancy is

|colim(X)||X(a)||G|= gG,ge|X g||G|. |colim(X)| - \frac{|X(a)|}{|G|} \;=\; \sum_{g\in G, g\neq e} \frac{|X^g|}{|G|}.

These two classes of examples can be combined into a theorem about EI-categories. If AA is a finite EI-category such that each group A(b,b)A(b,b) acts freely by composition on each hom-set A(a,b)A(a,b), then:

  1. There is a weighting on the category End(A)End(A) who objects are endomorphisms in AA (which are automorphisms), and whose morphism are commutative squares in AA,
  2. When restricted to the identity endomorphisms, this weighting becomes a weighting on AA, and
  3. For any X:AFinSetX : A \to FinSet, the Euler characteristic of hocolim(X)hocolim(X) is equal to gEnd(A)w(g)|X(a) g|. \sum_{g\in End(A)} w(g) \,{|X(a)^g|}.

This is Proposition 7.18 and Corollary 7.19 in our paper, while Theorem 9.1 includes a formula for this weighting. Therefore, in this case the discrepancy is

|colim(X)||X| =(|colim(X)||hocolim(X)|)+(|hocolim(X)||X|) =(|colim(X)||hocolim(X)|)+( gEnd(A)w(g)|X(a) g| aAw(id a)|X(a)|) =(|colim(X)||hocolim(X)|)+ gEnd(A),gidw(g)|X(a) g|. \begin{aligned} {|colim(X)|} - {|X|} &= \Big({|colim(X)|} - {|hocolim(X)|}\Big) + \Big({|hocolim(X)|} - {|X|}\Big)\\ &= \Big({|colim(X)|} - {|hocolim(X)|}\Big) + \Big(\sum_{g\in End(A)} w(g) \,{|X(a)^g|} - \sum_{a\in A} w(id_a) \,{|X(a)|} \Big)\\ &= \Big({|colim(X)|} - {|hocolim(X)|}\Big) + \sum_{g\in End(A), g\neq id} w(g) \,{|X(a)^g|}. \end{aligned}

That is, it combines the “homotopical discrepancy” |colim(X)||hocolim(X)|{|colim(X)|} - {|hocolim(X)|} with the “automorphism discrepancy” gEnd(A),gidw(g)|X(a) g|\sum_{g\in End(A), g\neq id} w(g) \,{|X(a)^g|}.

I don’t know how to generalize this point of view beyond EI-categories, though.

Posted by: Mike Shulman on January 24, 2025 4:37 AM | Permalink | Reply to this

Re: Comagnitude 1

On that last point, our paper does have formulas that go somewhat beyond the EI case, but it’s not clear to me whether they can be related to magnitude in a similar way. For instance, if AA is the free-living idempotent, then our formula says that the Euler characteristic of hocolim(X)hocolim(X) (which is just colim(X)colim(X) in this case) is the cardinality of the set of fixed-points of the idempotent (which is obvious, since the colimit is the set of fixed-points, but that’s what our formula specializes to). But in this case the magnitude of XX is just 12\frac{1}{2} of its cardinality.

Posted by: Mike Shulman on January 24, 2025 5:23 AM | Permalink | Reply to this

Re: Comagnitude 1

Ah, that’s very helpful. Thanks, Mike. I literally have it in my notes to look into that paper of yours and Kate’s, because I know from previous conversations that there’s lots of good and relevant stuff in there. But obviously I haven’t got round to it (sorry!). Actually, there have been several occasions in the last decade or so when I’ve wanted to read it, but I think what happens is that I open it, see “page 1 of 97”, gulp, and find something else to do. I should persevere! And the summary in your comment will really help.

Posted by: Tom Leinster on January 24, 2025 8:32 AM | Permalink | Reply to this

Re: Comagnitude 1

Well, that’s a perfectly normal reaction. Anyone who didn’t believe me about the value of writing short papers, take note.

I did also write a blog post and slides about that paper, which are shorter.

Posted by: Mike Shulman on January 24, 2025 7:43 PM | Permalink | Reply to this

Re: Comagnitude 1

Thanks for the links. I think you also explained some of this story to me in person. But I’m often quite slow to absorb things, as is apparent :-)

Posted by: Tom Leinster on January 24, 2025 7:52 PM | Permalink | Reply to this

Re: Comagnitude 1

I’m trying to understand the relationship between two theorems. On the one hand, there’s the theorem of yours and Ponto’s that you mention:

if AA is a finite category with no nonidentity endomorphisms, then for any functor X:AFinSetX: A \to FinSet, then what you call the magnitude |X||X| is always equal to the Euler characteristic of the homotopy colimit hocolim(X)hocolim(X).

On the other, there’s Proposition 2.18 of The Euler characteristic of a category, which states in part:

if AA is any finite category, then for any functor X:AFinSetX: A \to FinSet, the magnitude |X||X| is always equal to the magnitude of the category of elements (a.k.a. Grothendieck construction) 𝔼(X)\mathbb{E}(X).

(I’m silently assuming that all the relevant (co)weightings exist.)

Since the category of elements is a colax colimit, there should be a close relationship between these two results. And it seems to me that each implies the other, at least up to some difference in hypotheses.

The bridge between the two results is formed of two parts. Proposition 2.11 of the same paper says:

if AA is a finite category containing no endomorphisms or isomorphisms apart from identities, then χ(NA)=|A|\chi(N A) = |A|.

(The hypotheses on endomorphisms and isomorphisms guarantee that the Euler characteristic of the simplicial set NAN A is well-defined. And I’m consistently using bars to mean magnitude, not geometric realization.)

Also, Thomason’s homotopy colimit theorem (Theorem 1.2 of “Homotopy colimits in the category of small categories”) implies:

if AA is a small category, then for any functor X:ASetX: A \to Set, the simplicial sets hocolim(X)hocolim(X) and N(𝔼(X))N(\mathbb{E}(X)) are homotopy equivalent.

(Thomason’s result, your result with Ponto, and my result that |X|=|𝔼(X)||X| = |\mathbb{E}(X)| all hold for functors taking values in CatCat, not just SetSet. But I’m sticking to the simplest setting here.)

Since homotopy equivalent spaces have the same Euler characteristic, Thomason’s theorem implies that

χ(hocolim(X))=χ(N(𝔼(X))). \chi(hocolim(X)) = \chi(N(\mathbb{E}(X))).

If AA contains no endomorphisms or isomorphisms other than identities then neither does 𝔼(X)\mathbb{E}(X), so an equivalent statement is

χ(hocolim(X))=|𝔼(X)|. \chi(hocolim(X)) = |\mathbb{E}(X)|.

And this lets us see the equivalence between your and Ponto’s result that

|X|=χ(hocolim(X)) |X| = \chi(hocolim(X))

and my result that

|X|=|𝔼(X)|. |X| = |\mathbb{E}(X)|.

I imagine none of this is news to you, Mike: probably it’s all in your paper. And I see a different spin on this story in the last slide of the talk you linked to. But it helps me to write this out here.

Posted by: Tom Leinster on January 24, 2025 9:11 PM | Permalink | Reply to this

Re: Comagnitude 1

That’s nicely explained, thanks. We didn’t actually say anything about Grothendieck constructions in our paper, because we were focused more on enriched situations where they don’t make as much sense. And we only mentioned the magnitude of a category by-the-way; we were mainly interested in computing the Euler characteristics (and trace invariants) of homotopy colimits. But it’s neat that for ordinary finite categories, you can make this connection and point to your Proposition 2.11 as “what goes wrong” for finite categories that aren’t what we called “homotopy finite”.

For those who don’t want to click through to the slides, the “different spin” is sort of a converse: Tom’s Proposition 2.11 is also a special case of our theorem, since if X:AFinSetX:A\to FinSet is the terminal functor, then hocolim(X)=NAhocolim(X) = N A and |A|=|X|{|A|}={|X|}.

Posted by: Mike Shulman on January 28, 2025 8:53 AM | Permalink | Reply to this

Channel capacity is a coweighting

For the details, see section 4.3.1 of https://arxiv.org/abs/2304.08334.

Posted by: Steve Huntsman on January 24, 2025 2:33 PM | Permalink | Reply to this

Re: Comagnitude 1

It would definitely be nice to understand mutual information from this perspective. By following the inclusion-exclusion principle, one can define mutual information of higher orders, e.g., a tertiary mutual information among three random variables I(A;B;C)I(A;B;C) that rather resembles an Euler characteristic:

I(A;B;C)=H(A)+H(B)+H(C)H(A,B)H(B,C)H(C,A)+H(A,B,C). I(A;B;C) = H(A) + H(B) + H(C) - H(A,B) - H(B,C) - H(C,A) + H(A,B,C).

“Vertices” minus “edges” plus a “face”, so to speak.

This has unusual properties. Notably, it can go negative. Suppose that AA, BB and CC are binary variables whose values are set by picking a row of the XOR truth table at random. Then

H(A)=H(B)=H(C)=1, H(A) = H(B) = H(C) = 1,

whereas

I(A;B)=H(A)+H(B)H(A,B)=0, I(A;B) = H(A) + H(B) - H(A,B) = 0,

and so

I(A;B;C)=1. I(A;B;C) = -1.

One approach I pursued for a while was to say that each random variable is really a set of binary questions, so the entropy is the cardinality of that set; to have nonintegral values of the entropy, we’d say that the binary questions are interdependent in a way described by a groupoid, and then take the groupoid cardinality.

Even weirder stuff happens when the random variables take more than 2 values. There are multipartite correlations to which Shannon-style information measures are blind, but which can be teased out by introducing auxiliary systems.

Posted by: Blake Stacey on January 24, 2025 10:53 PM | Permalink | Reply to this

Re: Comagnitude 1

The alternating sum is the interaction information

https://en.wikipedia.org/wiki/Interaction_information

Posted by: Steve Huntsman on January 26, 2025 7:31 PM | Permalink | Reply to this

Re: Comagnitude 1

Really interesting post. I still need some time to fully process what you’ve said at the very end concerning entropy, but for the moment I wanted to say two things:

First, I wanted to remark that you can also see the discrepancy of your pushout square as an Euler characteristic; namely, of the complex 0X 1X 0X 2P0,0 \to \mathbb{Z} X_1 \to \mathbb{Z} X_0 \oplus \mathbb{Z} X_2 \to \mathbb{Z} P \to 0, where the map X 1X 0X 2\mathbb{Z} X_1 \to \mathbb{Z} X_0 \oplus \mathbb{Z} X_2 is the direct sum of the two maps with source XX in the square and the map X 0X 2P\mathbb{Z} X_0 \oplus \mathbb{Z} X_2 \to \mathbb{Z} P is the difference of the other two in the square. That said, Mike Shulman already made a much more general statement in this direction.

The second point concerns an alternative way of obtaining the entropy that I think has a more direct information-theoretic interpretation. Let a=(a 1,...,a n)a=(a_1,...,a_n) be a vector of integers and π:AI\pi:A\to I a map between the sets A={1,...,a}A=\{1,...,\lVert a\rVert\} and I={1,...,n}I=\{1,...,n\} such that π 1(i)=a i\pi^{-1}(i)=a_i. One can regard a choice of π\pi as a word of length a\lVert a \rVert with symbols in II: under this interpretation, the elements of AA are positional indices, and π(a)=i\pi(a)=i if ii is the symbol in position aa.

The group Aut(A)\operatorname{Aut}(A) gives you all the possible permutations of the indices; if AA is ordered, this really corresponds to all the ways of reordering the characters in the word while preserving the frequency of each symbol iIi\in I. One can also consider the group Aut I(A)\operatorname{Aut}_I(A): in this case, you only permute positions that are associated to the same symbol; from the perspective of “reading” ww these transformations are not noticeable. Since Aut I(A)\operatorname{Aut}_I(A) is a subgroup of Aut(A)\operatorname{Aut}(A), one can again build a functor X:BAut I(A)FinSetX: B \operatorname{Aut}_I(A) \to \operatorname{FinSet} and its magnitude is |X|=a! i=1 na i!|X| = \frac{\lVert a \rVert !}{\prod_{i=1}^n a_i !} i.e. a multinomial coefficient. In the language of information theory, this is the cardinality of the “type class” T a/a aT^{\lVert a\rVert}_{a/\lVert a\rVert}, which is the set of words wI aw\in I^{\lVert a \rVert} such that each iIi\in I appears with frequency a i/aa_i/\lVert a\rVert. In this “type theory”, entropy appears asymptotically, by taking a i/ap ia_i/\lVert a\rVert \to p_i\in \mathbb{Q} as a\lVert a \rVert \to \infty for each iIi\in I: then, because of Stirling’s formula, |X|exp(nH(p)+o(n))|X| \to \exp( n H(p) + o(n)). See, for instance, Chapter 2 in the book “Information theory: Coding theorems for discrete memoryless systems” by Körner and Csiszár.

For me, the nice thing about this story is that it relates entropy, a measure of uncertainty, to ambiguity in the group-theoretic sense. There are variations of it, too. Namely, in https://arxiv.org/abs/1807.05152 , I’ve considered the quotient of GL a(𝔽 q)\operatorname{GL}_{\lVert a \rVert}(\mathbb{F}_q) by the parabolic subgroup PP that stabilizes a given flag V 1V 2...V nV_1 \subset V_2 \subset ... \subset V_n such that dim(V i)= j=1 ia j\operatorname{dim}(V_i) = \sum_{j=1}^i a_j; the cardinality |G/P|=|G|/|P||G/P| = |G|/|P| is a qq-binomial coefficient, which asymptotically yields the Tsallis 22-entropy. I wonder if one can obtain this quotient as the magnitude of a functor too.

Posted by: Juan Pablo Vigneaux on January 25, 2025 5:22 AM | Permalink | Reply to this

Re: Comagnitude 1

Sorry, I realized there’s a typo, because my previous comment should say |π 1(i)|=a i|\pi^{-1}(i) |=a_i. But these a ia_i are not elements of AA, so the notation is a bit confusing.

It’d be better to denote by m=(m 1,...,m n)m=(m_1,...,m_n) the vector of integers. Then π:AI\pi:A\to I is any map such that |π 1(i)|=m i|\pi^{-1}(i)| = m_i, for each iIi\in I.

If AA is totally ordered, A={1,...,m}A= \{1,...,\lVert m\rVert\}\subset\mathbb{N}, then a possible choice of π\pi sends jAj\in A to iIi\in I iff i=1 j1m j+1j i=1 jm j\sum_{i=1}^{j-1} m_j +1 \leq j \leq \sum_{i=1}^{j} m_j. As a word, it consists of m 1m_1 ones, followed by m 2m_2 twos, etc. Any other word with the same frequency of each symbol in II (i.e. of the same type) can be obtained by acting on π\pi with an element of Aut(A)\operatorname{Aut}(A); since the stabilizer of π\pi is isomorphic to Aut I(A)\operatorname{Aut}_I(A), the size of the orbit (i.e. the type class) is the quotient |Aut(A)|/|Aut I(A)||\operatorname{Aut}(A)|/|\operatorname{Aut}_I(A)|.

Posted by: Juan Pablo Vigneaux on January 26, 2025 12:48 AM | Permalink | Reply to this

Re: Comagnitude 1

Sorry, coming late to this comment.

I like this way of looking at entropy via the type class formula (and I seem to remember Dan Piponi singing its praises somewhere on the internet).

It would be good if it was possible to explain the information-theoretic relevance of End I(A)End_I(A) in as convincing a way as the explanation you gave of Aut I(A)Aut_I(A).

Posted by: Tom Leinster on February 11, 2025 9:44 PM | Permalink | Reply to this

Re: Comagnitude 1

I can’t see how to do that in any natural way.

As you wrote, there is an action of End I×J(A)End_{I \times J}(A) on all of End I(A)End_{I}(A), End J(A)End_{J}(A) and End(A)End(A). Can one combine these actions in some way to obtain the desired magnitude? I don’t know what the magnitude of the (functor to FinSetFinSet coming from) the action of End I×J(A)End_{I \times J}(A) on End I(A)×End J(A)×End(A)End_{I}(A) \times End_{J}(A) \times End(A) is for example?

Posted by: product action on January 26, 2025 12:02 AM | Permalink | Reply to this

Re: Comagnitude 1

For any action of a monoid MM on a set SS (both finite), the magnitude of the resulting functor BMSetB M \to Set is |S|/|M||S|/|M|, the cardinality of SS divided by the cardinality of MM. So in this case, the action you mention would give a magnitude of

|End I(A)||End J(A)||End(A)||End I×J(A)|, \frac{{|End_I(A)|}{|End_J(A)|}{|End(A)|}}{{|End_{I \times J}(A)|}},

which isn’t quite what we want.

Posted by: Tom Leinster on January 26, 2025 10:04 PM | Permalink | Reply to this

Re: Comagnitude 1

Thank you for your patience! One could try to construct an action of End I(A)×End J(A)End_{I}(A) \times End_{J}(A) on End I×J(A)End_{I \times J}(A) (which seems to be the crucial point if one hopes for the same approach to work in a simple way) for which the application of (f,g)(f,g) to hh is something like ghfg \circ h \circ f. One has that the following diagram commutes, which seems vaguely encouraging, but it doesn’t seem to actually help much unless I=JI = J, or one has some other additional assumptions.

If one had some map A×AAA \times A \rightarrow A over I×JI \times J one might also be able to do something.

I guess that you already had tried all this though!

Posted by: product action on January 27, 2025 12:49 PM | Permalink | Reply to this

Re: Comagnitude 1

Did you maybe omit some assumptions in what you stated? I mean the idea I suggested does define an action of End I(A) op×End J(A)End_I(A)^op \times End_J(A) on End(A)End(A) (I was missing an op originally), and there is a trivial action of End I(A) op×End J(A)End_I(A)^op \times End_J(A) on End I×J(A)End_{I \times J}(A), so combining these one gets an action of End I(A) op×End J(A)End_I(A)^op \times End_J(A) on End(A)×End I×J(A)End(A) \times End_{I \times J}(A). According to what you stated, the magnitude of the corresponding functor would be what you’re looking for. But this surely can’t be right?

Posted by: product action on January 28, 2025 9:43 PM | Permalink | Reply to this

Re: Comagnitude 1

Thanks for the ideas! It’s entirely possible I’ve overlooked something straightforward, but I don’t think I’ve omitted any assumptions.

A couple of thoughts. First, we can certainly get the thing we want —

|End(A)||End I×J(A)||End I(A)||End J(A)|. \frac{|End(A)| |End_{I \times J}(A)|}{|End_I(A)| |End_J(A)|}.

— as the magnitude of a monoid action, because we can just take the trivial action of the monoid End I(A)×End J(A)End_I(A) \times End_J(A) on the set End(A)×End I×J(A)End(A) \times End_{I \times J}(A). But of course, a trivial answer like that isn’t going to help anything. Maybe that counts as an omitted assumption. What you’re mentioning isn’t a trivial action, but half of it is.

The difficulty is to come up with any nontrivial action on End I×J(A)End_{I \times J}(A). Anyway, I hope to write part 2 soon, which will have another approach to all this, as hinted here.

Posted by: Tom Leinster on January 29, 2025 8:25 AM | Permalink | Reply to this
Read the post Comagnitude 2
Weblog: The n-Category Café
Excerpt: Thinking about the cardinality of limits leads to a new numerical invariant of set-valued functors.
Tracked: January 29, 2025 11:17 PM

Re: Comagnitude 1

Regarding your formula for the discrepancy of the pushout, I’m reminded of the formula P(ab)+P(ab)=P(a)+P(b)P(a \vee b) + P(a \wedge b) = P(a) + P(b). This has the shape of the discrepancy formula with the discrepancy set to 0, for the “square” between A×BA \times B and A+BA + B (with other vertices at A and B). This is entirely heuristic and might not make any sense, but I have seen quite a few variations on this shape of formula and never could quite figure out a way to connect them beyond being analogous to some form of inclusion/exclusion.

The idea of these “squares with 0 discrepancy” is something that I have thought about before this post, but maybe the notion of discrepancy will be enough to categorically formulate something concrete besides me just seeing things in a few different fields that look similar :)

Posted by: Sofia B on February 2, 2025 4:48 PM | Permalink | Reply to this

Re: Comagnitude 1

Is a Categorified Two Cultures Challenge on the cards?

For X,Y:AFinSetX, Y: A \to FinSet, and FF a natural transformation between them, is there some inequality

|X× FX||X| 2/|Y|?|X \times_F X| \geq |X|^2/|Y|?

Posted by: David Corfield on February 15, 2025 12:54 PM | Permalink | Reply to this

Post a New Comment