### Native Type Theory

#### Posted by John Baez

*guest post by Christian Williams*

Native Type Theory is a new paper by myself and Mike Stay. We propose a unifying method of reasoning for programming languages: model a language as a theory, form the category of presheaves, and use the internal language of the topos.

$\mathtt{language} \xrightarrow{\;\Lambda\;} \mathtt{category} \xrightarrow{\;\mathscr{P}\;} \mathtt{topos} \xrightarrow{\;\Phi\;} \mathtt{type\; system}$

Though these steps are known, the original aspect is simply the composite and its application to software. If implemented properly, we believe that native types can be very useful to the virtual world. Here, I want to share some of what we’ve learned so far.

## The Free Lunch

The biggest lesson of this paper was to have faith in what John Baez calls the *dao* of mathematics. For two years, Mike and I and Greg Meredith looked for ways to generate logics for programming languages; we tried many approaches, but ultimately the solution was the simplest.

Two facts of category theory provide an abundance of structure, for free:

$\text{1. Every category embeds into its presheaf topos.}$

$\text{2. Every topos has a rich internal language.}$

The embedding preserves limits and exponents, hence we can apply this to “higher-order theories”.

For now we explore the language of a topos. There are multiple views, and often its full power is not used. Toposes support geometric logic, predicate logic, and moreover *dependent type theory*. We emphasize the latter: dependent types are expressive and pervasive, yet underutilized in mathematics.

## The Language of a Topos

My thinking has been shaped by the idea that even foundations are categorical. Virtually any language can be modelled as a structured category: the most comprehensive reference I’ve found is Categorical Logic and Type Theory by Bart Jacobs.

Probably the most studied categorical structure of logic is the *topos*. Toposes of sheaves, functors which coherently assign data to a space, were first applied to algebraic geometry. A continuous map $f:X\to Y$ induces an inverse image $f:Sh(Y)\to Sh(X)$ which is a left adjoint that preserves finite limits. This gives *geometric morphisms* of toposes, and *geometric logic* ($\wedge$ and $\exists$) as the language of classifying toposes.

Though geometric logic is an important level of generality, the language of toposes is more powerful. In La Jolla, California 1965, the budding category theory community recognized Grothendieck’s categories of sheaves as being fundamentally logical structures, which generalize set theory. An *elementary topos* is a cartesian closed category with finite limits and a subobject classifier, an object which represents the correspondence of predicates and subobjects.

The language of an elementary topos T is encapsulated in a pair of structures.

$\text{The predicate fibration }\; \Omega\text{T}\to \text{T}\; \text{ reasons about predicates, like subsets;}$

$\text{The codomain fibration }\; \Delta\text{T}\to \text{T}\; \text{ reasons about dependent types, like indexed sets.}$

### Predicate Logic

Throughout mathematics, we use the internal predicate logic of Set. It is the canonical example of a topos: a predicate such as $\varphi(x)= (x+3\geq 5)$ is a function $\varphi:X\to 2=\{t,f\}$, which corresponds to its *comprehension*, the subobject of true terms $\{x\in X \;|\; \varphi(x)=t\}\subseteq X$.

Predicates on any set $X$ form a Boolean algebra $P(X)=[X,2]$, ordered by implication. Every function $f:X\to Y$ gives an *inverse image* $P(f):P(Y)\to P(X)$. This defines a functor $P:Set^{op}\to Bool$ which is a first-order hyperdoctrine: each $P(f)$ has adjoints $\exists_f\dashv P(f)\dashv \forall_f$ representing quantification, which satisfy the Beck-Chevalley condition.

Altogether, this structure formalizes classical higher-order predicate logic. Most formulae, such as

$\forall x,y:\mathbb{Q}.\; \exists z:\mathbb{Q}.\; x\lt z \wedge z\lt y$

$\forall f:Y^X.\; \forall y:Y.\; \exists x:X.\; f(x)=y \Rightarrow \exists g:X^Y.\; \forall y:Y.\; f(g(y))=y$

can be modelled in this logical structure of Set.

This idea is fairly well-known; people often talk about the “Mitchell-Benabou language” of a topos. However, this language is predicate logic over a *simple type theory*, meaning that the only type formation rules are products and functions. Many constructions in mathematics do not fit into this language, because types often depend on terms:

$\text{Nat}(n) := \{m:\mathbb{N} \;|\; m\leq n\}$

$\mathbb{Z}_p := \mathbb{Z}/\langle x\sim y \equiv \exists z:\mathbb{Z}.\; (x-y)=z\cdot p\rangle$

This is provided by extending predicate logic with dependent types, described in the next section.

So, we have briefly discussed how the structure of Set allows for the the explicit construction of predicates used in everyday mathematics. The significance is that these can be constructed *in any topos*: we thereby generalize the historical conception of logic.

For example, in a presheaf topos $[C^{op},\text{Set}]$, the law of excluded middle does not hold, and for good reason. Negation of a sieve is necessarily more subtle than negation of subsets, because we must exclude the possibility that a morphism is not in but “precomposes in” to a sieve. This will be explored more in the applications post.

### Dependent Type Theory

Dependency is pervasive throughout mathematics. What is a monoid? It’s a set $M$, and *then* operations $m,e$ on $M$, and *then* conditions on $m,e$. Most objects are constructed in *layers*, each of which depends on the ones before. Type theory is often regarded as “fancy” and only necessary in complex situations, similar to misperceptions of category theory; yet dependent types are everywhere.

The basic idea is to represent dependency using *preimage*. A type which depends on terms of another type, $x:A\vdash B(x):Type$, can be understood as an indexed set $\{B(x)\}_{x:A}$, and this in turn is represented as a function $\coprod x:A.\; B(x)\to A$. Hence the “world of types which depend on $A$” is the *slice category* Set$/A$.

The poset of “$A$-predicates” sits inside the category of “$A$-types”: a comprehension is an injection $\{x\in A \;|\; \varphi(x)\}\to A$. This is a degenerate kind of dependent type, where preimages are *truth values* rather than sets. So we are expanding into a larger environment, which shares all of the same structure. The slice category Set$/A$ is a categorification of $P(A)$: its morphisms are commuting triangles, understood as $A$-indexed functions.

Every function $f:A\to B$ gives a functor $f^\ast:$Set$/B\to$Set$/A$ by pullback; this generalizes preimage, and can be expressed as substitution: given $p:S\to B$, we can form the $A$-type

$x:A\vdash f^\ast(S)(x) = S(f(x)):\text{Type}.$

This functor has adjoints $\Sigma_f\dashv f^\ast\dashv \Pi_f$, called **dependent sum** and **dependent product**: these are the powerful tools for constructing dependent types. They generalize not only quantification, but also product and hom: the triple adjunction induces an adjoint co/monad on Set$/B$

$\Sigma_f\circ f^\ast = -\times f \dashv [f,-] = \Pi_f\circ f^\ast.$

These *dependent* generalizations of product and function types are extremely useful.

Indexed sum generalizes product type by allowing the type of the second coordinate to depend on the term in the first coordinate. For example: $\Sigma n:\mathbb{N}.\text{List}(n)$ consists of dependent pairs $\langle n, l:\text{List}(n)\rangle$.

Indexed product generalizes function type by allowing the type of the codomain to depend on the term in the domain. For example: $\Pi S:\text{Set}.List(S)$ consists of dependent functions $\lambda S:\text{Set}.\varphi(S):List(S)$.

See how natural they are? We use them all the time, often without realizing. Simply by using preimage for indexing, we generalize product and function types to “branching” versions, allowing us to build up complex objects such as

$\text{Monoid}:= \Sigma M:\text{Set}.\Sigma m:M^2\to M.\Sigma e:1\to M...$

$...\Pi (a,b,c):M^3. m(m(a,b),c)=m(a,m(b,c)). \Pi a:M. m(e,a)=a=m(a,e).$

This rich language is available in *any* topos. I think we’ve hardly begun to see its utility in mathematics, computation, and everyday life.

### All Together

A topos has two systems, predicate logic and dependent type theory. Each is modelled by a *fibration*, a functor into the topos for which the preimage of $A$ are the $A$-predicates/$A$-types. A morphism in the domain is a judgement of the form “in the context of variables of these (dependent) types, this term is of this (dependent) type”.

$a:A,b:B(a),\dots, z:Z(y)\vdash t:T$

The two fibrations are connected by comprehension which converts a predicate to a type, and image factorization which converts a type to a predicate. These give that the predicate fibration is a reflective subcategory of the type fibration.

Altogether, this forms the *full higher-order dependent type theory* of a topos. As far as I know, this is what deserves to be called “the” language of a topos, in its full glory. This type theory is used in proof assistants such as Coq and Agda; eventually, this expressive logic + dependent types will be utilized in many programming languages.

## Conclusion

Native Type Theory gives provides a shared framework of reasoning for a broad class of languages. We believe that this could have a significant impact on software and system design, if integrated into existing systems.

In the next post, we’ll explore why this language is so useful for the topos of *presheaves on theories*. Please let me know any thoughts or questions about this post and especially the paper. Thank you.

## Re: Native Type Theory

Hey pretty neat.

I initially looked into topos stuff too when trying to figure out a pointful syntax for a categorical programming language but the math was way too beyond me.

The main thing I found that was helpful was

http://www.kurims.kyoto-u.ac.jp/~hassei/papers/ctcs95.html

Decomposing typed lambda calculus into a couple of categorical programming languages

And using a tagless final style. The connection to reflexive objects is also neat!

In pseudo Coq

Class KappaZeta k `(Category k) := lam : (k 1 a -> k b c) -> k b (a -> c) app : k a (b -> c) -> k 1 b -> k a c unit : k a 1

kappa : (k 1 a -> k b c) -> k (a * b) c lift : k (a * b) c -> k 1 a -> k b c

I always knew there was some abstract nonsense about parametric HOAS, indexing and topos stuff that kind of related but I lacked the background to understand.