Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

August 30, 2021

You Could Have Invented De Bruijn Indices

Posted by Mike Shulman

One of the most curious things about formal treatments of type theory is the attention paid to dummy variables. The point is simple enough to explain to a calculus student: just as the integrals 0 2x 3dx\int_0^2 x^3 \, dx and 0 2t 3dt\int_0^2 t^3 \, dt are equal, so are the functions λx.x 3\lambda x. x^3 and λt.t 3\lambda t. t^3 — the name of the variable doesn’t matter. (In fact, the former statement factors through the latter.) But managing this equality (which goes by the highbrow name of “α\alpha-equivalence”) turns out to be quite fiddly when actually implementing a typechecker, and people have developed all sorts of different-sounding ways to deal with it, with names like De Bruijn indices, De Bruijn levels, locally named terms, locally nameless terms, and so on.

In this post I want to explain what’s going on from a categorical and/or univalent perspective, and why some of these fancy-sounding things are really quite simple. To cut to the punchline, at bottom what’s going on is just different ways to choose a concrete representative of an object that’s determined only up to unique isomorphism.

Let’s consider possibly the simplest case: terms of the lambda-calculus. These are generated inductively by three operations:

  • Variables are terms
  • If MM and NN are terms, so is the application MNM N.
  • If MM is a term and xx a variable, then the abstraction λx.M\lambda x. M is a term.

A usage of a variable xx inside a λx\lambda x is called bound, while other usages of xx are called free. Thus in λx.yx\lambda x. y x the usage of xx is bound but the usage of yy is free. Semantically, the free variables must be given meaning in a context such as y:A,z:By:A, z:B, while the bound variables are “dummies” that don’t appear in the context.

I just described the traditional approach, where one first defines all the terms at once, then defines what the free variables of a term are. But at the end of the day, what we really want to have is, for each context of (perhaps typed) variables, the set of terms containing those variables free. Or more precisely, containing at most those variables free, since for instance λx.yx\lambda x. y x makes sense in the context y:A,z:By:A, z:B even though it doesn’t actually use the variable zz.

Thus, from a typed perspective, it would be better to define the sets of terms indexed by the set of variables in their context. These are called well-scoped terms. Since we are ordinary mathematicians who think structurally, of course it doesn’t matter what these variables “actually are”, so we can just suppose them to be drawn from an abstract set VV. Now we intend to give an inductive definition of the set Tm VTm_V of “terms with free variables from VV”, with VV an arbitrary set, and the first two clauses are obvious:

  • For any set VV, if xVx\in V, then v xTm Vv_x \in Tm_V.
  • For any set VV, if MTm VM\in Tm_V and NTm VN\in Tm_V, then MNTm VM N \in Tm_V.

The third clause is less clear, since in an abstraction λx.M\lambda x.M the input term MM must have xx as a free variable, while xx is no longer free in λx.M\lambda x.M. In other words, MM must have one more variable in its context than λx.M\lambda x.M, so it’s natural to say:

  • For any set VV, if MTm V+1M\in Tm_{V+1}, then λMTm V\lambda M \in Tm_V.

Here V+1V+1 denotes the coproduct, or disjoint union, of VV with a one-element set, denoted 11. Note that when phrased like this, we don’t need to annotate the λ\lambda with the variable name xx; we know what the bound variable is, it’s just the extra element of V+1V+1 that comes from 11.

The point is that, categorically speaking, V+1V+1 doesn’t denote a specific set, but only a set determined up to unique isomorphism; and thus we could potentially make different choices about exactly what set to use. But of course, from a structural point of view, the choice doesn’t matter. So to start with, let’s proceed purely structurally. This means that it doesn’t make sense to ask whether the elements of V+1V+1 are “the same as” any of the elements of VV, or the element of 11. Instead, we have two functions inl:VV+1inl : V\to V+1 and inr:1V+1inr : 1\to V+1, and any element of V+1V+1 can be written uniquely as either inl(x)inl(x) for some xVx\in V or as inr()inr(\star) for \star the element of 11.

To see what sort of terms we end up with, consider for example the ordinary term “λx.λy.x\lambda x. \lambda y. x”. This is a closed term with no free variables, so it must live in Tm Tm_\emptyset. Its outer operation is an abstraction, so it must be obtained from the constructor λ\lambda applied to some term in Tm +1Tm_{\emptyset+1} that corresponds to “λy.x\lambda y.x”. This latter term is also an abstraction, so it must also be obtained from λ\lambda applied to some term in Tm (+1)+1Tm_{(\emptyset+1)+1}. This last term must be a variable, i.e. an element of (+1)+1(\emptyset+1)+1, and there are two such elements: inr()inr(\star) and inl(inr())inl(inr(\star)). Since inr()inr(\star) comes from the rightmost copy of 11, it corresponds to the most recently bound variable (which we were originally writing “yy”); while inl(inr())inl(inr(\star)) comes from the other copy of 11, hence corresponds to the next most recently bound variable (our “xx”).

Thus, our term λx.λy.x\lambda x. \lambda y. x would be written λλv inl(inr())\lambda \lambda \, v_{inl(inr(\star))}, while the similar but distinct term λx.λy.y\lambda x. \lambda y. y would be written λλv inr()\lambda \lambda \, v_{inr(\star)}. The examples at the next level of binding should make the pattern clear:

  • λx.λy.λz.x\lambda x. \lambda y. \lambda z. x is λλλv inl(inl(inr()))\lambda\lambda\lambda\, v_{inl(inl(inr(\star)))}.
  • λx.λy.λz.y\lambda x. \lambda y. \lambda z. y is λλλv inl(inr())\lambda\lambda\lambda\, v_{inl(inr(\star))}.
  • λx.λy.λz.z\lambda x. \lambda y. \lambda z. z is λλλv inr()\lambda\lambda\lambda\, v_{inr(\star)}.

In general, whenever we need to use a bound variable, we “work out” along the structure of the term counting the number of λ\lambdas that we pass until we get to the one where our desired variable was bound. If we passed nn such λ\lambdas (not counting the one where our variable was bound), then the representation of our variable is v inl n(inr())v_{inl^{n}(inr(\star))}.

The number nn is called the De Bruijn index of our variable. Thus, we see that De Bruijn indices are really the most natural representation of bound variables from a structural point of view. Usually people just write “nn” instead of v inl n(inr())v_{inl^{n}(inr(\star))}, so our first two examples would be λλ1\lambda\lambda 1 and λλ0\lambda\lambda 0. As for the free variables, when referring to an element xVx\in V inside a term in Tm VTm_V, we have to represent it by inl n(x)inl^n(x) where nn is the number of λ\lambdas inside which the reference occurs; thus λy.λz.x\lambda y.\lambda z. x would be λλv inl(inl(x))\lambda\lambda\, v_{inl(inl(x))}.

Other representations of variable binding arise by working non structurally, allowing ourselves to make more specific choices of “V+1V+1”. Specifically, suppose we fix a particular infinite ambient set WW, a subset CPWC \subseteq P W of its powerset containing at least one element of each finite cardinality, and an operation suc:CCsuc : C\to C together with a family of functions inl:Vsuc(V)inl: V\to suc(V) exhibiting each suc(V)suc(V) as a coproduct of VV with a one-element set. Then we can similarly define the family of sets of terms Tm VTm_V, for all VCV\in C, by analogous clauses:

  • For any VCV\in C, if xVx\in V, then v xTm Vv_x \in Tm_V.
  • For any VCV\in C, if MTm VM\in Tm_V and NTm VN\in Tm_V, then MNTm VM N \in Tm_V.
  • For any VCV\in C, if MTm suc(V)M\in Tm_{suc(V)}, then λMTm V\lambda M \in Tm_V.

Some popular choies of W,C,suc,inlW, C, suc, inl include the following.

  1. Let W=W=\mathbb{N}, with CC consisting of the initial segments [n]={0,1,2,,n1}[n] = \{0,1,2,\dots, n-1\}, and suc([n])=[n+1]suc([n]) = [n+1] and inlinl the obvious inclusion, inl(k)=kinl(k)=k. This representation is called De Bruijn levels: in a term without free variables, each variable is identified by the number of λ\lambdas we have to cross to get to its binder, counting in from the outside of the term rather than out from the variable usage. Thus, for instance, λx.λy.x\lambda x. \lambda y. x is λλv 0\lambda\lambda v_0 while λx.λy.y\lambda x. \lambda y. y is λλv 1\lambda\lambda v_1. When there are free variables, they are given an ordering and identified by their position in that ordering, with bound variables numbered after the free ones. Note that unlike De Bruijn indices, with De Bruijn levels each variable has the same representation regardless of where it is used in the term.

  2. Let W,C,sucW,C,suc be as before, but now define inl:[n][n+1]inl : [n] \to [n+1] by inl(k)=k+1inl(k) = k+1. This yields De Bruijn indices again, with the free variables ordered and numbered counting out from the term, i.e. coming after all the bound ones.

  3. Fix an infinite set FF that is disjoint from \mathbb{N}, let W=FW=F \cup \mathbb{N}, and let CC consist of all finite subsets of WW such that VV\cap \mathbb{N} is an initial segment. If V=V F[n]V = V_F \cup [n], with [n]=V[n] = V\cap \mathbb{N}, then define suc(V)=V F[n+1]suc(V) = V_F \cup [n+1], with inlinl acting as the identity on V FV_F and as inl(k)=k+1inl(k)=k+1 on [n][n]. If we consider only the terms whose free variables are subset of FF, this yields a representation known as locally nameless, where bound variables are given De Bruijn indices but free ones are named by an element of some arbitrary set FF. One popular choice is to take F=F=\mathbb{N} as well, with V FV_F required to be an initial segment too; then the free variables are represented as De Bruijn levels.

All of these representations are isomorphic to each other, simply because their only difference is that they make different choices about how to “materially” represent the structural operation V+1V+1. (More precisely, given a bijection between a set VCV\in C in one representation and a set VCV'\in C' in another representation, there is an induced bijection Tm VTm VTm_V \cong Tm'_{V'}.) In particular, none of these representations need a separate notion of “α\alpha-equivalence” to identify terms by “renaming bound variables”, because they all have a unique way to name bound variables.

But what about ordinary “named variables”, where we do have a choice about what bound variable names to use? One way to obtain these terms in a well-scoped way is with the following inductive definition:

  • For any set VV, if xVx\in V, then v xTm Vv_x \in Tm_V.
  • For any set VV, if MTm VM\in Tm_V and NTm VN\in Tm_V, then MNTm VM N \in Tm_V.
  • For any set VV and element xVx\notin V, if MTm V{x}M\in Tm_{V\cup \{x\}}, then λx.MTm V\lambda x. M \in Tm_V.

Note that since there is a choice of xx in the third clause, we now have to notate the variable xx in the term λx.M\lambda x.M, as expected. This choice also means that now we are no longer defining the same family of sets, but will have to quotient by α\alpha-equivalence to obtain something isomorphic. (We could also achieve that from the start by making this a higher inductive definition, with a fourth constructor saying that λx.M=λy.M\lambda x.M = \lambda y.M whenever x,yVx,y\notin V.)

The requirement that xVx\notin V forbids “variable shadowing”. That is, if xx is already in scope at the point of a λ\lambda-abstraction, either as a free variable or as a variable bound by a containing λ\lambda, we are not allowed to re-bind it again: we can’t write λx.λx.x\lambda x. \lambda x. x. I believe this is roughly what’s known as the Barendregt convention for named variables.

Another approach to named variables instead allows re-use of names, where inner binders “shadow” outer ones; thus λx.λx.x\lambda x. \lambda x. x would be α\alpha-equivalent to λx.λy.y\lambda x. \lambda y. y, whereas inside the inner MM of a term λx.λx.M\lambda x. \lambda x. M there is no way to refer to the outer xx any more. We could achieve something like this by simply dropping the requirement that xVx\notin V:

  • For any set VV and element xx, if MTm V{x}M\in Tm_{V\cup \{x\}}, then λMTm V\lambda M \in Tm_V.

Then if it happens that xVx\in V already, we have V{x}=VV \cup \{x\} = V, so the inner term MM has no more variable names to refer to than it used to. But the fact that the cardinality of the context sometimes doesn’t go up by one when we pass across a binder is kind of icky. I haven’t seen anyone really try to work with well-scoped terms like this, but it seems likely to cause all sorts of problems, in particular in defining and quotienting by α\alpha-equivalence to obtain something isomorphic to the simpler representations.

Another approach to variable shadowing would be to mix names with De Bruijn indices: given a set FF of variable names, let W=F×W = F \times \mathbb{N} and CC the set of finite subsets of WW whose fiber over each point of FF is an initial segment of \mathbb{N}. Then given VCV\in C and xFx\in F, if the fiber of VV over xx is {x}×[n]\{x\} \times [n], let V+xV + x be V(x,n)V \cup (x,n), and define inl:VV+xinl : V \to V+x by inl(x,k)=(x,k+1)inl(x,k) = (x,k+1) and the identity on all other fibers. Now the third clause becomes

  • For any VCV\in C and xFx\in F, if MTm V+xM\in Tm_{V+x}, then λx.MTm V\lambda x. M \in Tm_V.

Now we still only need to decorate the binder λ\lambda with the element xx of FF (rather than an element of W=F×W = F\times \mathbb{N}), and if we identify xFx\in F with (x,0)W(x,0)\in W we can use named variables as usual, with shadowing: the term λx.λx.x\lambda x. \lambda x. x means λx.λx.v (x,0)\lambda x. \lambda x. v_{(x,0)}, where v (x,0)v_{(x,0)} refers to the innermost λx\lambda x. But shadowed variables are still available by incrementing the index: in λx.λx.v (x,1)\lambda x. \lambda x. v_{(x,1)} the variable refers to the outer binder.

I doubt anyone would use this representation in practice: from a formal point of view, it seems likely to have all the drawbacks of both named variables and De Bruijn indices. However, I think it’s a nice way to understand what “named variables” are actually doing, and how they relate to things like De Bruijn indices, De Bruijn levels, locally nameless variables, and so on. (This discussion also suggests that in principle, it should be possible to abstract over a “method of handling variables”; a few years ago I wrote down some experiments in this direction here, but never pursued it further.)

Anyway, the point I wanted to make is that all these weird-looking things are “really just” different representations of an invariant structural operation. This doesn’t make them any less important as far as implementation goes: when actually coding things in a computer, we often do need to choose particular representations. But maybe it makes them a bit less scary to a category theorist.

Posted at August 30, 2021 9:23 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3346

23 Comments & 0 Trackbacks

Re: You Could Have Invented De Bruijn Indices

What does the notation v_x mean? Is it just notation for “the variable called x”? So in the “traditional” formulas it would just be written x?

Posted by: Noah Snyder on August 30, 2021 11:10 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Yes, that’s right. Just like the elements of a coproduct X+YX+Y can’t, structurally, be identified literally with elements of XX or YY but have tags inlinl and inrinr on them, the elements of an inductively defined type like Tm VTm_V all have tags on them, like λ\lambda and vv. Application MNM N should really have a tag too, like app(M,N)app(M,N); I’m not sure why I felt the need to tag the variables explicitly but not application! Sorry about that.

Posted by: Mike Shulman on August 30, 2021 11:30 PM | Permalink | Reply to this

shadowing

You mentioned not seeing anyone allow shadowed variables, but Coq allows fun x : nat => fun x : nat => x. You were of course talking about the internal representation, so I’ll mention that Python also allows it, as lambda x: lambda x: x, and that’s probably how it’s stored as well.

PS: You wrote λλv inl(inr(*))\lambda\lambda v_{inl(inr(\ast))} at one spot where you meant λλv inr(*)\lambda \lambda v_{inr(\ast)}.

Posted by: Dan Christensen on August 31, 2021 1:31 AM | Permalink | Reply to this

Re: shadowing

Ah, sorry if I wasn’t clear. Of course lots of languages allow shadowed variables; that’s why I brought up the possibility in the first place. In fact, I don’t know of any languages that don’t allow you to shadow variables, at least in their “surface” syntax that uses named variables. (I don’t know of any serious languages that use anything other than named variables in surface syntax, but many of them use something like De Bruijn indices in their internal representations.) And the traditional approach to formal syntax with named variables also allows variable shadowing: λx.λx.x\lambda x. \lambda x. x is a perfectly good term, and we compute its free variables as

fv(λx.λx.x)=fv(λx.x){x}=(fv(x){x}){x}=({x}{x}){x}={x}=. fv(\lambda x. \lambda x. x) = fv(\lambda x.x) \setminus \{ x \} = (fv(x) \setminus \{x\})\setminus \{x\} = (\{x\} \setminus \{x\})\setminus \{x\} = \emptyset \setminus \{x\} = \emptyset.

All I meant was that I haven’t seen anyone doing rigorous mathematics with intrinsically well-scoped terms using named variables that allow variable shadowing. I suppose maybe that’s so many qualifiers that the statement is essentially vacuous. I’m not sure offhand whether I’ve seen intrinsically well-scoped terms done with named variables at all.

Posted by: Mike Shulman on August 31, 2021 1:44 AM | Permalink | Reply to this

Re: shadowing

And I fixed the typo, thanks!

Posted by: Mike Shulman on August 31, 2021 1:45 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

I humbly suggest an alternative, first previewed in the 80s by Curien, Hagino, and others: Let’s remove lambda-abstractions and names altogether, and use combinators instead. Which combinators? We’ll choose the categorical combinators, which implement universal properties in arbitrary (free) categories.

Concretely, I recently implemented an esoteric programming language, Cammy. Expressions in Cammy do not bind names. For example, the following expression, which might well come directly from Hagino’s thesis on categorical programming, implements addition:

(uncurry (pr (curry snd) (curry (comp app succ))))

While this is nigh-unreadable by our typical standards, it can be fed directly into an optimizer which uses equational reasoning, with no extra nominal logic or scope-mangling. For a simpler example, the following expression implements the test for whether a natural number is even:

(pr t not)

The arrow t:1Ω\mathtt{t} \colon 1 \to \Omega selects truth, the arrow not:ΩΩ\mathtt{not} \colon \Omega \to \Omega flips between truth and falsity, and the functor pr:(1X)(XX)X\mathtt{pr} \colon (1 \to X) \to (X \to X) \to \mathbb{N} \to X builds an arrow (prtnot):Ω\mathtt{(pr\;t\;not)} \colon \mathbb{N} \to \Omega with primitive recursion.

For more details, as well as citations to related work, please see this appendix.

Posted by: Corbin on August 31, 2021 3:21 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Sorry, is the optimizer supposed to deal somehow with the nigh-unreadability?

Posted by: Mike Shulman on August 31, 2021 3:34 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Humans can learn to read any language; languages are memetic. There are even less-readable tacit languages which are used in practice, like K. Additionally, entire subcultures of mainstream ecosystems practice pointfree techniques, like pointfree Haskell.

To make the point crisper, at the risk of only being more confusing, consider the following Scheme expression:

(define (add x y) (if (eq? x 0) y (add (- x 1) (+ y 1))))

This is longer than the respective Cammy snippet, but equivalent in intent and behavior (when restricted to the natural numbers). A Scheme-reader would consider this to be very clear (very slow and buggy) Scheme, but I don’t know whether an English-reader would consider either to be readable; readability might be relative to each reader’s personal experiences and practiced abilities.

Also, for Cammy specifically, I figured that someday I would introduce some sort of M-expression or Shutt-style F-expression. I don’t know what they would look like, though. It happens to be the case, historically, that when a low-level machine language is at all readable and writeable, there will be some folks who prefer the machine language to any sort of higher-level intent-oriented language; hopefully both Cammy and any languages which desugar to it will gracefully accept this fate.

Posted by: Corbin on August 31, 2021 4:34 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

I’ll grant that humans can learn to read almost any language – even De Bruijn indices! – but that doesn’t change the fact that some languages are significantly easier than others, both to learn and to use.

Posted by: Mike Shulman on August 31, 2021 11:44 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

The history page for the CAML language elaborates the origins of categorical combinators and their use as a low-level language/intermediate form for a compiler: https://caml.inria.fr/about/history.en.html

Posted by: Ryan Wisnesky on September 1, 2021 2:12 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Rather vague question, but by thinking structurally about variables like this, is there a way to rethink the “mismatch” between type theory and category theory, as here?

In category theory, however, pullback is not generally strictly associative, so there is a mismatch. Similarly, every type-forming operation must also be strictly preserved by substitition/pullback.

The way this is generally dealt with is to introduce a category-theoretic structure which does have a “substitution” operation which is strictly associative, hence does correspond directly to type theory, and then show that any category can be “strictified” into such a stricter structure. Unfortunately, there are many different such structures which have been used, differing only slightly.

Can’t the type theory be de-strictified? Or is it that concrete choices must be made, as you write

when actually coding things in a computer, we often do need to choose particular representations.

Posted by: David Corfield on August 31, 2021 8:07 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Various people have considered de-strictifying the type theory. It’s almost certainly possible, but most of us think it would have a severe impact on practical usability. It’s possible it would provide an alternative approach to the coherence problem, doing the translation from strict to non-strict in the world of syntax rather than making non-strict semantic things into strict ones. But I don’t know of it being seriously pursued.

I don’t think that is related to the issues of variable binding, however.

Posted by: Mike Shulman on August 31, 2021 2:18 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Ok, thanks. That makes sense.

Posted by: David Corfield on August 31, 2021 5:27 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

I doubt anyone would use this representation in practice…

This is the approach taken in Dhall. See: https://www.haskellforall.com/2021/08/namespaced-de-bruijn-indices.html

Posted by: Isaac Elliott on August 31, 2021 12:45 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Wow! What a good thing I wrote this blog post, or maybe I never would have heard of Dhall. It’s amusing that I suggested that “namespaced De Bruijn indices” (a good name) would have the drawbacks of both De Bruijn indices and named variables, while that blog post claims they have the best of both worlds! On further consideration, I think the truth is in between: they have some of the advantages of each and some of the disadvantages of each.

First the advantages:

  • Like named variables, namespaced De Bruijn indices lead to fairly readable terms.

  • Like De Bruijn indices, namespaced De Bruijn indices avoid any possibility of “variable capture” without the need to do α\alpha-renaming during substitution.

Now the disadvantages:

  • Like De Bruijn indices, namespaced De Bruijn indices require explicit weakening of terms when substituting them under binders. This probably won’t get used much in practice, since people tend to use different variable names for most things; but the code for it still has to be written, and if it rarely gets used then the code is more likely to be buggy.

  • Like named variables, namespaced De Bruijn indices require explicit code to test α\alpha-equivalence, rather than simply using syntactic identity the way De Bruijn indices do. (The blog post suggests testing α\alpha-equivalence by simply converting to ordinary De Bruijn indices — it seems to me if you’re doing that, you might as well just use the ordinary De Bruijn indices internally all the way through and avoid having to do the conversion at each equality check.)

Posted by: Mike Shulman on September 1, 2021 1:27 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

One reason that makes working with “named variables” in formalized contexts (i.e. where the meta-theory is forcibly constructive) is the 3rd rule:

  • For any set VV and element xVx\notin V, if MTm V{x}M\in Tm_{V\cup\{x\}}, then λx.MTm V\lambda x.M\in Tm_{V}.

The problem is x¬Vx\not\in V. What will be the evidence/witness to this?

This doesn’t seem like too big an issue, until one tries to implement substitution properly. The x¬Vx\not\in V can’t be forgotten because otherwise we could easily end up with name capture. There’s a reason that proving that substitution is associative is a lot of work in pretty much all representations.

When dealing with bound variables, one indeed has a whole slew of choices. However, when dealing with presentations of theories instead, many of the same choices become unacceptable. For one, the main consumer of a presentation is a human and, for humans, names matter. A lot. We use names all the time to communicate intent. Using ++ in a context where your operation is neither associative nor commutative will largely confuse your readers, so that your writing will fail at its core job: communication.

Nevertheless, the category of presentations does validate alpah\alpah-equivalence. It doesn’t matter what you call the binary operation of a Magma, they will all be equivalent. They are just not equal.

In other words, theory presentations are both ‘intensional’ and ‘intentional’, but their semantics can indeed be extensional. Thus one needs to use explicit transport (i.e. renamings) when leveraging theorems proved in one presentation to be theorems in another. Usability then pushes implementors to try to automate this as much as possible.

Still think this is trivial? Notice how

  • Lean’s mathlib has both has_add and has_mul, which leads to having both ‘additive semigroup’ and ‘semigroup’ !

  • Isabelle’s standard typeclass hierarchy has the same

  • The version of Monoid defined in mathcomp does the same thing.

and so on.

I consider this to be quite a horrifying hack.

Posted by: Jacques Carette on September 1, 2021 2:56 PM | Permalink | Reply to this

Nominal sets

Nice post! Are you familiar with Gabbay and Pitts’ nominal approach to binding? I think it allows shadowing the way you described: if you have a λ\lambda-term tt with free variables in V{x}V\cup\{x\}, then λx.t\lambda x.t has free variables in VV (actually, V{x}V \setminus \{x\}, to be tighter).

Nominal sets are sets endowed with an action of the group of permutations of \mathbb{N}. Each number represents a variable name, and the group action renames variables. We require that each element xx of a nominal set have a finite support VV: if a permutation fixes all elements of VV, then its action fixes xx as well. (They form a category that is equivalent to the Schanuel topos, if that sounds more familiar.)

We can define the syntax of the λ\lambda-calculus modulo α\alpha-equivalence directly as a nominal set, by building the initial algebra of a certain functor. The definition is well-scoped, but only implicitly: you never describe the scope (= the finite support) of elements of a nominal set directly. However, you can compute what it means for a λ\lambda-abstraction to have free variables in some set VV, and you get exactly the description above.

Posted by: Arthur Azevedo de Amorim on September 11, 2021 12:09 AM | Permalink | Reply to this

Re: Nominal sets

I’ve been hearing about nominal sets for some time, but I’m still not really sold on them. I’m not sure exactly why. I’m happy with other topos-based definitions of syntax, e.g. in presheaves on renamings. Maybe I would be happier with it if I sat down and unwound exactly what it all means in terms of the Schanuel topos (which I’m not very familiar with either yet).

Posted by: Mike Shulman on September 11, 2021 11:43 PM | Permalink | Reply to this

Re: Nominal sets

I think one reason I have trouble with nominal sets is that it doesn’t seem that a definition of syntax using nominal sets is one that I can go code up in my favorite programming language. Right? My understanding is that it only makes sense inside the Schanuel topos, and requires a programming language augmented with “nominal” features to make use of it.

Posted by: Mike Shulman on September 12, 2021 5:58 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

You are right; to use nominal binding, it is much better to have a tool that internalizes the theory. Unfortunately, such tools are under developed, but there are some options, implemented or on paper. The most well known one is the nominal package for Isabelle (https://isabelle.in.tum.de/nominal/main.html), but its development seems to have stalled around 2012. Ulrich Schöpp’s thesis “Names and Binding in Type Theory” (https://era.ed.ac.uk/handle/1842/1203) has a nominal dependent type theory, but only exists on paper, as far as I can tell.

In such a tool, it would be possible to define the syntax of the λ\lambda-calculus as follows, in pseudo-Coq syntax:

Inductive term := | Var (n : name) | App (t1 t2 : term) | Abs (t : abs term).

Here, name is the type of variables, and abs T is the type of elements of the form <n> t, where n : name, t : T, and elements are treated up to α\alpha-equivalence.

Posted by: Arthur Azevedo de Amorim on September 22, 2021 4:08 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

You are right; to use nominal binding, it is much better to have a tool that internalizes the theory. Unfortunately, such tools are under developed, but there are some options, implemented or on paper. The most well known one is the nominal package for Isabelle (https://isabelle.in.tum.de/nominal/main.html), but its development seems to have stalled around 2012. Ulrich Schöpp’s thesis “Names and Binding in Type Theory” (https://era.ed.ac.uk/handle/1842/1203) has a nominal dependent type theory, but only exists on paper, as far as I can tell.

In such a tool, it would be possible to define the syntax of the λ\lambda-calculus as follows, in pseudo-Coq syntax:

Inductive term := | Var (n : name) | App (t1 t2 : term) | Abs (t : abs term).

Here, name is the type of variables, and abs T is the type of elements of the form <n> t, where n : name, t : T, and elements are treated up to α\alpha-equivalence.

Posted by: Arthur Azevedo de Amorim on September 22, 2021 4:09 PM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

Could there be a mistake on the De Bruijn index of free variables?

In the example (+1)+1(\emptyset+1)+1 you would need to replace \emptyset for a bigger context VV to allow for free variables.

Now you’d have (V+1)+1(V+1)+1. Therefore the term λyλz.x\lambda y \lambda z.\ x as a De Bruijn index would look like λλv inl(inl(x))\lambda \lambda v_{inl(inl(x))}.

Right? (You wrote λλv inr(inr(x))\lambda \lambda v_{inr(inr(x))} instead which doesn’t make sense.)

Loved the post btw, really helped me understand this. Cheers!

Posted by: Julian on September 27, 2023 2:08 AM | Permalink | Reply to this

Re: You Could Have Invented De Bruijn Indices

You’re right! I’ll fix it in the post.

Posted by: Mike Shulman on September 27, 2023 2:48 AM | Permalink | Reply to this

Post a New Comment