### Large Sets 12

#### Posted by Tom Leinster

*Previously: Part 11. Next: Part 12.5*

Today’s topic is replacement. Replacement is not
directly *about* large sets, but it does *imply* that certain large sets
exist.

Even among those who are familiar with and sympathetic to categorical set theory, I think there’s a lingering impression that replacement is somehow borrowed from ZFC. If categorical set theory is supposed to stand on its own two feet, without having to lean on membership-based set theory for conceptual motivation, then perhaps there are those who believe that to supplement ETCS with replacement would be an embarrassing admission of defeat.

I’ll explain why this is a misconception, stating replacement in a way that’s entirely natural from a structural/categorical perspective. The form of replacement I’ll use is due to Colin McLarty, who wrote of it “Our axiom is not a translation from ZF. It is a plain categorical version of Cantor’s idea.”

Let’s begin by thinking about families. In categorical set theory, a family of sets indexed by a set $I$ is usually defined as a set $X$ together with a function $p: X \to I$. Think of the fibre $p^{-1}(i)$ as the $i$th member of the family. In brief:

A family over $I$ is a function into $I$.

In what follows, it won’t be of the slightest importance whether a family
over $I$ literally *is* a function into $I$ — whether that’s the
actual *definition* of family — as long as there’s an *equivalence*
between families over $I$ and functions into $I$.

Now, there’s another, genuinely different, conception of what a family is. Consider sequences, the case $I = \mathbb{N}$. Since we can form the power set $\mathcal{P}(X) = 2^X$ of any set $X$, we might intuitively feel that there should be a sequence of sets

$\mathbb{N}, \mathcal{P}(\mathbb{N}), \mathcal{P}\mathcal{P}(\mathbb{N}), \ldots.$

After all, if I give you any natural number (7, say) then you can give me the corresponding member of this sequence (namely, $\mathcal{P}\mathcal{P}\mathcal{P}\mathcal{P}\mathcal{P}\mathcal{P}\mathcal{P}(\mathbb{N})$). So we’ve got a sequence of sets, right? Right?

Well, no. At least, not on the basis of ETCS alone. In order for these sets to form a sequence — meaning an $\mathbb{N}$-indexed family — we have to have a set $X$ and a function $p: X \to \mathbb{N}$ whose $n$th fibre is the $n$-fold iterated power set of $\mathbb{N}$. In the notation explained in Part 6, the $n$th fibre is the beth $\beth_n$, and $X$ itself is $\beth_\omega$. And as shown in Part 6, it is consistent with ETCS that $\beth_\omega$ does not exist. In other words, it cannot be proved from ETCS that there is any function into $\mathbb{N}$ whose fibres are the iterated power sets of $\mathbb{N}$ (unless, of course, ETCS is inconsistent).

If we don’t like this — if we feel that it’s a shortcoming of ETCS — then we might consider adding to ETCS some kind of principle saying that whenever we have a set $I$ and a procedure or algorithm or formula specifying a set $X_i$ for each $i \in I$, then the sets $X_i$ form an $I$-indexed family in the official sense. And that principle is exactly replacement.

The version of replacement I’m going to state here is due to Colin McLarty, from Section 8 of his wonderful paper Exploring categorical structuralism (2004).

In a brief attempt to be a conscientious scholar, let me mention some historical context. McLarty was drawing on a replacement-like reflection principle formulated by Gerhard Osius in 1973. Further back in the history are a reflection principle stated in the long version of Lawvere’s ETCS paper, as well as work of one J. C. Cole that I’ve never managed to lay my hands on. And there’s also at least one more recent categorical version of replacement, given in Section 5 of Mike Shulman’s 2010 paper on stack semantics.

Anyway, it’s McLarty’s version I know best, so it’s McLarty’s version I’m going to explain here. I think it has strong intuitive appeal.

Loosely, replacement says the following.

Axiom scheme of replacement (informal)Suppose we have a set $I$ and a first-order formula specifying a set $F(i)$ up to isomorphism for each $i \in I$. Then there exists a function into $I$ with fibres $F(i)$.

To state it more carefully, I need to recall the way I set up ETCS back in Part 1:

we have some things called “sets”, some things called “functions from $X$ to $Y$” for each set $X$ and set $Y$, and an operation of “composition” on functions, satisfying the following axioms: […]

We’re going to consider first-order formulas in the language of ETCS. Such formulas are built up in the usual way from variable symbols (with specified types), logical connectives such as “or” and “not”, brackets, and so on. Two kinds of quantification are allowed:

quantification over sets;

for given sets $X$ and $Y$, quantification over functions from $X$ to $Y$.

One kind of equality is allowed:

- for given sets $X$ and $Y$, equality between functions from $X$ to $Y$.

In particular, we can’t express equality between sets. But we can express isomorphism. Let’s recite the definition: “$X$ and $Y$ are isomorphic if there exist $f: X \to Y$ and $g: Y \to X$ such that $g \circ f = 1_X$ and $f \circ g = 1_Y$.” That definition uses only the kinds of quantification and equality that we’ve allowed. So, it’s expressible.

Notice that given a set $X$, we can quantify over the elements of $X$, because of the correspondence between elements of $X$ and functions $1 \to X$.

(A technical aside: the typing I’ve used to state ETCS is a bit different from McLarty’s, which means that replacement has to look a bit different too. But the difference is purely cosmetic.)

Now we want to formalize the idea of a “function” that assigns a set $F(i)$ (determined up to isomorphism) to each element $i$ of a set $I$. We’ll do this by considering the relation $\Phi(i, S)$ given by $\Phi(i, S) \iff F(i) \cong S$.

So, take a formula $\Phi = \Phi(i, S)$ whose only free
variables are $i$ (of type “element of $I$”) and $S$ (of type “set”). Let’s say that
$\Phi$ is **functional** on $I$ if for every $i \in I$, there exists a set
$S$ with the following property:

whenever $S'$ is a set, $\Phi(i, S') \iff S' \cong S$.

Replacement says this:

Axiom scheme of replacementLet $I$ be a set and $\Phi$ a functional formula on $I$. Then there exist a set $X$ and a function $p: X \to I$ such that $\Phi(i, p^{-1}(i))$ for all $i \in I$.

And that’s it. If we have a “family” of sets specified by some formula in the language of ETCS, then it arises as the family of fibres of some function.

AsideI’ve said “axiom scheme” rather than “axiom” only so that logicians won’t think I’m a total heathen. In ordinary mathematical language, replacement is an axiom. But if we’re being careful about the logical structure, it’s an infinitecollectionof axioms, one for each $\Phi$. We can’t combine them into a single statement, because quantifying over formulas — “$\forall\Phi$” — is not allowed in our formal language.(Joachim Kock once half-jokingly suggested that we should have a new quantifier, “for each”, for this kind of purpose. “For each $\Phi$” is different from “for all $\Phi$”.)

In the introduction to this post, I quoted from the paper by Colin McLarty that introduced replacement in this form. Here he is at more length, from p.48-49 of “Exploring categorical structuralism”, talking about his ETCS version of the axiom scheme of replacement.

Probably few readers think this looks like the ZF axiom scheme. That scheme has no homophonic translation into ETCS as it uses membership so heavily. It has a tree-theoretic translation, since every ZF formula does, but you can see that our axiom does not deal in trees. Our axiom is not a translation from ZF. It is a plain categorical version of Cantor’s idea.

Hellman finds the motivation for the axiom scheme “remains characteristically set theoretic” […] Indeed. Categorical set theory is a theory of sets. Its motives are set theoretic.

We can ask: are there motives for the axiom scheme of replacement apart from the ZF version of the scheme, and more generally apart from Zermelo’s form of membership-based set theory? But that is obvious. Cantor did not have Zermelo’s membership-based conception. […] And replacement is Cantor’s idea. […] Hellman rightly suggests we have no “new motivation” for the axiom. […] Neither did Fraenkel or Skolem. We all have Cantor’s original motive which predates the membership-based ZF version by decades.

The rest of this post is mostly about two things: what replacement *can*
do, and what it *can’t* do.

#### What can replacement do?

A first, easy, consequence is the principle of separation or specification (or comprehension, as some people call it for reasons I don’t comprehend). Loosely, this says that given a set $I$ and a reasonable property $\phi$ that an element of $I$ may or may not have, we can form a subset

$\{ i \in I : \phi(i) \ \text{is true} \}$

of $I$. Interpret “reasonable property $\phi$” to mean “formula $\phi(i)$ with one free variable $i$ of type $I$”. Then there is indeed such a subset. For given such a formula $\phi = \phi(i)$, define a formula $\Phi = \Phi(i, S)$ by

$\Phi(i, S) \ \iff\ (\phi(i) \ \text{is true and}\ S \cong 1) \ \text{or}\ (\phi(i) \ \text{is false and}\ S \cong \emptyset).$

By replacement, there exist a set $X$ and a function $X \to I$ whose $i$th fibre has one element if $\phi(i)$ is true and none if $\phi(i)$ is false. In other words, $X$ is the subset of $I$ consisting of the elements satisfying $\phi$. And that’s what we wanted.

(That proof of separation is from Section 9 of McLarty’s paper.)

Now let’s go back to the motivating example from the start of the post: the wannabe sequence

$\beth_0 = \mathbb{N}, \ \beth_1 = \mathcal{P}(\mathbb{N}), \ \beth_2 = \mathcal{P}\mathcal{P}(\mathbb{N}), \ \ldots.$

Assuming replacement, we’ll show that this really *is* a sequence in the
ETCS sense: there is a set $X$ with a map $X \to \mathbb{N}$ whose
fibres are the beths $\beth_n$.

There’s a first-order formula $\Phi = \Phi(n, S)$ in natural numbers $n$ and sets $S$, such that

$\Phi(n, S) \ \iff\ \beth_n \cong S.$

One way to see this is to write

$[n] = \{ i \in \mathbb{N}: i \leq n \} = \{0, \ldots, n\},$

then define $\Phi(n, S)$ to mean “there exist a set $Y$ and a function $q: Y \to [n]$ such that $q^{-1}(0) \cong \mathbb{N}$, and $q^{-1}(i + 1) \cong 2^{q^{-1}(i)}$ for all $0 \leq i \lt n$, and $q^{-1}(n) \cong S$.”

The plan now is to use replacement to conclude that there exist a set $X$ and a function $p: X \to \mathbb{N}$ such that $p^{-1}(n) \cong \beth_n$ for all $n \in \mathbb{N}$. In order to do so, we have to show that $\Phi$ is functional — that is, for all $n \in \mathbb{N}$, there is a unique-up-to-iso set $S$ satisfying $\Phi(n, S)$. And at this point we meet a subtlety.

Intuitively, we’d like to say “it’s true by induction”. After all, suppose we have some $n$ and we can construct the unique $S$ satisfying $\Phi(n, S)$, namely, $\beth_n$. Then we can construct the unique $S'$ satisfying $\Phi(n + 1, S')$: it’s $2^{\beth_n}$. But in order to use induction, we have to know that

$\{ n \in \mathbb{N}: \text{there is a unique-up-to-iso set } \ S\ \text{such that}\ \Phi(n, S)\}$

is actually a *subset* of $\mathbb{N}$. In other words, there need to exist
a set $M$ and an injection $M \to \mathbb{N}$ such that for $n \in
\mathbb{N}$,

$n \in M \ \iff\ \text{there is a unique-up-to-iso set } \ S\ \text{such that}\ \Phi(n, S).$

That’s not obvious! But it’s true, assuming replacement: it’s an instance of separation. And once we have this set $M$, we can use induction to prove that $M = \mathbb{N}$, then use replacement to construct our set over $\mathbb{N}$ with fibres $\beth_0, \beth_1, \ldots$. So everything goes through fine.

In summary, the subtlety was that to construct the sequence $(\beth_n)_{n
\in \mathbb{N}}$, we had to use replacement *twice*.

The “total space” of the family $(\beth_n)$ — I mean, the domain $X$ of the map $X \to \mathbb{N}$ whose fibres are $\beth_0, \beth_1, \ldots$ — is $\beth_\omega$. So, we’ve used replacement to construct $\beth_\omega$. A similar argument shows that assuming replacement, we can construct $\beth_W$ for every well-ordered set $W$. So all beths exist.

Not only does replacement imply that all beths exist, it implies that beth *fixed points* exist
— unboundedly many of them, in fact. Recall from Part
7 that
$I(X)$ denotes the initial well-order on a set $X$, and that $X$ is said to
be a **beth fixed point** if $\beth_{I(X)} \cong X$.

Also in Part 7, I described a procedure for constructing a beth fixed point bigger than any given set $A$. Roughly speaking, what we do is to define a sequence of sets $(B_n)_{n \in \mathbb{N}}$ by

$B_0 = A, \ B_{n + 1} = \beth_{I(B_n)},$

and then take its sup, which can be shown to be a beth fixed point.

I say “roughly speaking” because this depends on this sequence actually
existing, meaning that there exist a set $X$ and a map $p: X \to
\mathbb{N}$ with $p^{-1}(n) \cong B_n$ for all $n$. (If it does exist then
there’s no problem with taking the sup: the sets $B_n$ have $X$ as an upper
bound, so they must have a *least* upper bound.) That’s where replacement
is needed.

To apply replacement to this problem, we write down a first-order formula $\Psi(n, S)$ in natural numbers $n$ and sets $S$ such that

$\Psi(n, S) \ \iff\ B_n \cong S,$

using the same kind of manoeuvre with sets over $[n]$ as we used for the sequence $(\beth_n)$. And then a similar argument, using replacement twice, gives us the sequence $(B_n)$, hence a beth fixed point larger than $A$.

Once you’ve done a few proofs like this, it becomes clear that they should all be instances of a single theorem. That theorem is the principle of transfinite recursion.

There are several versions of transfinite recursion. To explain the one I’m talking about here, I’ll warm up with recursion over $\mathbb{N}$.

Suppose we have some kind of machine $G$ that takes as its input a finite sequence $(X_0, \ldots, X_{n - 1})$ of sets, and produces as its output a new set $G(X_0, \ldots, X_{n - 1})$. Then there should be a unique sequence $X_0, X_1, \ldots$ of sets such that

$X_n \cong G(X_0, \ldots, X_{n - 1})$

for all $n \in \mathbb{N}$. No base step is needed, as the case $n = 0$ gives $X_0 = G()$.

“Should” means that it seems intuitively appealing. It *doesn’t* follow
from ETCS. But it does follow from ETCS with replacement.

The two examples above, $(\beth_n)$ and $(B_n)$, have the special feature that for $n \geq 1$, the $n$th member of the sequence depends only on the $(n - 1)$th member — not on all the previous members.

A different kind of special case is where $X_n$ doesn’t depend on the sets $X_0, \ldots, X_{n - 1}$ themselves, but only on $n$. In that case, what $G$ does it to assign a set $X_n$ to each $n \in \mathbb{N}$. So transfinite recursion reduces in this case to replacement over the indexing set $\mathbb{N}$.

Now that we’ve warmed up with recursion over $\mathbb{N}$, let’s step up to an arbitrary well-ordered set $W$. Suppose we have a first-order formula that takes as its input a well-ordered set $V \prec W$ and a family $(X_v)_{v \in V}$ of sets, and produces as its output a set $G((X_v)_{v \in V})$ (determined up to isomorphism). Then there should be a family $(X_w)_{w \in W}$ of sets, unique up to isomorphism, such that

$X_w \cong G\bigl( (X_v)_{v \lt w} \bigr)$

for each $w \in W$.

That phrasing is a bit informal, but I hope it’s clear by now how you’d formalize it. Really $G$ is a first-order formula

$\Theta = \Theta(V, X, q, S)$

where the variables denote a well-ordered set $V \prec W$, a set $X$, a function $q$ from $X$ to the underlying set of $V$, and a set $S$. To understand the relationship with the previous paragraph, think of $q: X \to V$ as the family $(X_v)_{v \in V}$ and $S$ as $G((X_v)_{v \in V})$. We ask that for each $V$, $X$ and $q$, there is a unique-up-to-iso set $S$ such that $\Theta(V, X, q, S)$ holds.

Stated formally, the principle of transfinite recursion is about such formulas $\Theta$. But I’m not going to write out the formal statement of the theorem, because it would just look like a load of notation and you’d only want to convert it back into the informal version above, the one involving $G$.

Replacement implies the principle of transfinite recursion, by an inductive argument on $W$.

Conversely, the principle of transfinite recursion implies replacement. We already saw this in the case $I = \mathbb{N}$, and it’s true for a general indexing set $I$ by a similar argument (after choosing a well-ordering on $I$). Replacement is essentially the special case of transfinite recursion where $G((X_v)_{v \in V})$ depends only on $V$.

(I learned about the converse from a blog post by Joel David Hamkins, based on work of Benjamin Rin.)

So that’s what replacement *can* do. In summary: it gives us separation and
transfinite recursion, and in turn, transfinite recursion gives us the
existence of all the beths and of unboundedly many beth fixed points.

#### What *can’t* replacement do?

The most important thing it can’t do is prove the existence of inaccessibles:

It is consistent with ETCS + replacement that there are no inaccessible sets.

The proof follows a similar pattern to the proofs of all the similar statements in previous posts, so I’ll skip it.

In the converse direction, I’m pretty sure replacement isn’t implied (over ETCS) by the existence of an inaccessible set, or even by the much stronger condition that there are unboundedly many measurable sets. But I don’t know a proof. Can anyone clarify this point, or at least give a reference? Are there, for instance, theorems on determinacy that can be proved using replacement but aren’t guaranteed by the existence of unboundedly many measurable sets?

#### Is assuming replacement a good idea?

Sometimes we want to perform operations that are beyond what ETCS provides. At that point, it’s tempting to supplement ETCS with replacement. I don’t have a strong argument against doing so, but it’s a matter of fact that replacement is often way more than necessary.

For example, the first thing that people tend to want beyond ETCS is to be able to form, for a given set $X$, another set containing all of

$X, \ \mathcal{P}(X), \ \mathcal{P}\mathcal{P}(X), \ \ldots.$

(For instance, this lets us do an infinitely iterated dual vector space
construction.) As I mentioned in Part
6, such a
set exists if we assume the existence of all beths. This is a *far* weaker
assumption than replacement.

But occasionally, the existence of all beths might still not be enough. In such cases, it may be enough to assume the existence of a beth fixed point, or, stronger still, unboundedly many beth fixed points. All of these assumptions are still weaker than replacement.

In the hierarchy of assumptions that we’ve been going through in the posts,
the next one after “unboundedly many beth fixed points” is the existence of
an inaccessible set. Assuming the existence of an inaccessible is (if I’m
not mistaken) neither stronger nor weaker than replacement, and it’s
something that people often do — for example, in the context of
Grothendieck universes. Sometimes, one even wants to assume that
*unboundedly many* inaccessibles exist.

Now although such assumptions about inaccessibles are (I think)
incomparable with replacement — neither implies the other —
they seem to me to be somehow more specific, more limited, than
replacement. I can’t give a precise meaning to that, except to point out
again that replacement is an axiom *scheme*: it gives one axiom for each
suitable first-order formula $\Phi$, whereas the existence of inaccessibles
is a single axiom.

The lesson I’d draw is that if we’re looking at supplementing
ETCS with something like replacement, it’s at least worth *asking* whether
it might be better to use an axiom such as “all beths exist” or “there are unboundedly many inaccessibles” instead.

#### ETCS, ZFC, and the spectre of inconsistency

I’ve somehow managed to get this far without mentioning the equivalence between ETCS + replacement and ZFC. It’s not just that they have the same consistency strength (one is consistent if and only if the other is). They’re actually biiinterpretable, which is equivalence in the strongest sense. As I wrote in Part 1, abbreviating replacement to “R”:

[Biinterpretability] means that there’s a way of translating statements in the language of ETCS+R into statements in the language of ZFC, and vice versa, such that the two translation processes are mutually inverse and make theorems of ETCS+R match up with theorems of ZFC. In particular, ETCS+R is a categorical set theory exactly as strong as ZFC.

And this brings me to one reason why it might be wise not to assume replacement unless it’s truly needed: the spectre of inconsistency.

I find it barely conceivable that ETCS could be inconsistent. Its axioms are all principles that mathematicians — not just category theorists, but all mathematicians — use day in, day out. If it’s inconsistent, that means there’s some problem with constructing a sequence $(f^n(x))_{n \geq 0}$ from a function $f: X \to X$ and an element $x \in X$, or with taking the characteristic function $X \to \{0, 1\}$ of a subset of $X$, etc. The only axiom in ETCS that raises the slightest suspicion is choice, but that alone can’t cause inconsistency: it’s a theorem that if ETCS is inconsistent then so is ETCS without choice.

If ETCS is shown to be inconsistent then it will have truly cataclysmic implications. Vast amounts of mathematics — perhaps the majority of all published work, ever — will be invalidated.

What if ZFC is inconsistent? There was a lot of discussion of this question around the time that Vladimir Voevodsky was speaking on the topic (e.g. see this MathOverflow question), and obviously he’s not the only one to have considered it.

For myself, I’ll be pretty unflustered if, one day, ZFC is shown to be inconsistent, as long as ETCS remains intact. An equivalent statement: a proof of the inconsistency of ETCS+R wouldn’t bother me much if it used replacement in some essential way.

You might be thinking to yourself “ah, well, Tom’s just saying that because ETCS is the system he’s most familiar with”. But I don’t think this is just about personal feelings. ETCS alone provides enough foundation for a vast amount of mathematics. McLarty has argued that the entirety of EGA and SGA need nothing more. Assuming McLarty is correct, that is just an objective truth.

A proof of the inconsistency of ZFC (or equivalently, ETCS+R) that needed the full strength of replacement would certainly make a huge impact in the world of foundations, but I think its impact beyond would be limited. If it also proved the inconsistency of ETCS + “all beths exist”, or of ETCS plus some similarly mild additional axiom, that would cause upheaval across much more of mathematics. But still, probably many uses of principles beyond ETCS are out of convenience rather than true necessity. So the impact would be nothing compared to the global cataclysm that would be the inconsistency of ETCS.

#### Thanks, Todd!

Before I finish this series, I want to make sure to thank Todd Trimble. I’m
doing it now because Todd helped me understand some points about
replacement in conversations at Category Theory 2019 in Edinburgh. But my
gratitude extends much further back. Todd’s 2008 series of posts on
categorical set theory at the *Topological Musings* blog
(1,
2,
3)
had a big influence on me, both whetting my appetite to learn more
categorical set theory and showing me how it’s done.

(Todd’s posts are also archived at the nLab, starting here.)

Let me quote from the third of Todd’s posts, where he nicely describes the relationship between the elementary approach to categorical set theory and the topos-theoretic approach, as well as the influence of the late Myles Tierney:

Most textbook treatments of the formal development of topos theory (as for example Mac Lane–Moerdijk) are efficient but highly technical, involving for instance the slice theorem for toposes and, in the construction of colimits, recourse to Beck’s theorem in monad theory applied to the double power-set monad [following the elegant construction of Paré]. The very abstract nature of this style of argumentation (which in the application of Beck’s theorem expresses ideas of fourth-order set theory and higher) is no doubt partly responsible for the somewhat fearsome reputation of topos theory.

In these notes I take a much less efficient but much more elementary approach, based on an arrangement of ideas which I hope can be seen as “natural” from the point of view of naive set theory. I learned of this approach from Myles Tierney, who was my PhD supervisor, and who with Bill Lawvere co-founded elementary topos theory, but I am not aware of any place where the details of this approach have been written up before now.

This series of posts is intended to continue in that same Tierney–Trimble spirit.

#### Next time

The final post in this series will be a summary of everything, including a table of contents, diagrams, and (because I won’t be able to help myself) some reflections on the whole enterprise.

*Added later:* but first, Mike Shulman will talk about different versions of replacement.

## Re: Large Sets 12

I recently learned, far later than I should have, that Borel Determinacy (BD), the poster child for the use of Replacement in ZFC (there is a model of ZC where BD fails!), only requires Replacement where the domain is a countable ordinal. Sometimes people get a bit lazy and say it needs Replacement for $\omega_1$, but people (mostly Don Martin) have done really serious analysis of the proof, and (apart from some minor coding issues around forming ordinals irrelevant in the structural setting) pinned it down precisely. There are even more fine-grained results, saying how Replacement for specific countable ordinals gets this-and-that fragment of BD. Even more, it’s not countable Replacement as such, but only the existence of $\beth_\alpha$ for all countable ordinals $\alpha$! So asking that all beths exist is overkill even for the most famous theorem that is advertised as requiring Replacement.

It’s also curious that Fraenkel, after the work where he (apparently tentatively!) suggested Replacement as a strengthening of Zermelo’s axioms, also proposed an alternative axiom equivalent to countable Replacement. It was really von Neumann and his version of ordinals that tipped the scales towards arbitrary Replacement.