### What is a Formal Proof?

#### Posted by Mike Shulman

There’s been some discussion recently in the homotopy type theory community about questions like “must type-checking always be decidable?” While the specific phrasing of this question is specific to type theory (and somewhat technical as well), it is really a manifestation of a deeper and more general question: what is a formal proof?

At one level, the answer to this question is a matter of definition: any particular foundational system for mathematics *defines* what it considers to be a “formal proof”. However, the current discussions are motivated by questions in the *design* of foundational systems, so this is not the relevant answer. Instead the question is what properties should a notion of “formal proof” satisfy for it to be worthy of the name?

To start with let me emphasize that whenever I say a “proof” I will mean a *correct* proof. In addition to defining a notion of (correct) formal proof, a foundational system often defines some class of “arguments” that may or may not be correct; but for now, let’s just consider the correct proofs.

I’m inclined to claim that the real *sine qua non* of a notion of “formal proof” is a **soundness theorem**. That means that if we can prove $C$ (in some formal system) under hypotheses $A$ and $B$, say, and we know (externally to the formal system) that $A$ and $B$ are true, we ought to be able to conclude (again externally) that $C$ is also true. For if a proof doesn’t guarantee the truth of its conclusion, what good is it to prove anything?
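Schematically (in notation not tied to any one formal system), the soundness property being asked for is:

```latex
\text{If } A, B \vdash C \text{ and } \vDash A \text{ and } \vDash B, \text{ then } \vDash C,
```

where $\vdash$ denotes provability in the formal system and $\vDash$ denotes truth judged externally to it.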

To be sure, categorical logicians want more than a simple soundness theorem that refers to “truth”: we want to be able to interpret proofs in arbitrary sufficiently structured categories. More precisely, proofs should provide a means of constructing an initial (or free) category of some sort, which we can map uniquely into any other such category $\mathcal{C}$ to interpret proofs in terms of objects and morphisms in $\mathcal{C}$. An ordinary soundness theorem is just the special case of this when $\mathcal{C}$ is a category that we consider “the real world”, such as a category of sets. We might also want to have a dual “completeness theorem” that everything true is provable, in some sense. However, while those are all nice, without *at least* a simple soundness theorem I think it’s hard to justify calling something a “proof”.

Now, how do we prove a soundness theorem? In principle, I’m willing to be open-minded about this, but the only way I’ve ever seen to do it is *by induction*. That is, a formal proof is (or gives rise to something that is) inductively constructed by some collection of rules, and we prove soundness by proving that each of these rules “preserves truth”, so that when we put a bunch of them together into a proof, truth is still preserved all the way through.

For example, one common rule (called “or-elimination”) says that if we can prove “$A$ or $B$”, and assuming $A$ we can prove $C$, and also assuming $B$ we can prove $C$, then we can deduce $C$ without assuming anything extra. This rule is sound under the ordinary meaning of “or”, in the following sense: assuming inductively that the three premises are sound, we conclude that (1) “$A$ or $B$” is true, that (2) if $A$ is true then so is $C$, and that (3) if $B$ is true then so is $C$. By (1), it must be that either $A$ is true or $B$ is true; in the first case, we deduce that $C$ is true from (2); while in the second case we deduce that $C$ is true from (3). Thus, in all cases $C$ must be true. This is one of the “inductive steps” in a proof by “structural induction” that all proofs preserve truth, so that the formal system is sound.

(I hesitated before including this example at all, because it looks so tautological that one feels as if nothing is happening. The point is that it’s shifting the proof from the object-theory to the meta-theory: the rule of proof in the object-theory is sound because it represents a pattern of reasoning that’s correct in the meta-theory. Rest assured that it becomes much less trivial for more complicated systems like dependent type theory, and also when we want to interpret proofs into arbitrary categories.)
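To make the inductive structure concrete, here is a minimal sketch in Python of derivation trees for a toy fragment of propositional logic with disjunction. All the class and function names are invented for illustration; the point is that checking a derivation proceeds by the same structural recursion that the soundness proof inducts over.

```python
from dataclasses import dataclass

# Formulas: atoms and disjunctions (just enough for or-elimination).
@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Or:
    left: object
    right: object

# Derivation trees, defined inductively by the rules of the system.
@dataclass(frozen=True)
class Hyp:            # use a hypothesis currently assumed
    formula: object

@dataclass(frozen=True)
class OrIntroL:       # from a proof of A, conclude "A or B"
    sub: object       # proof of the left disjunct
    other: object     # the formula B

@dataclass(frozen=True)
class OrIntroR:       # from a proof of B, conclude "A or B"
    other: object     # the formula A
    sub: object       # proof of the right disjunct

@dataclass(frozen=True)
class OrElim:         # from "A or B", A => C, and B => C, conclude C
    disj: object      # proof of "A or B"
    from_left: object   # proof of C under the extra hypothesis A
    from_right: object  # proof of C under the extra hypothesis B

def conclusion(p, hyps):
    """Check a derivation tree and return its conclusion, by structural
    recursion -- the same induction used to prove soundness."""
    if isinstance(p, Hyp):
        assert p.formula in hyps, "hypothesis not available"
        return p.formula
    if isinstance(p, OrIntroL):
        return Or(conclusion(p.sub, hyps), p.other)
    if isinstance(p, OrIntroR):
        return Or(p.other, conclusion(p.sub, hyps))
    if isinstance(p, OrElim):
        ab = conclusion(p.disj, hyps)
        assert isinstance(ab, Or), "or-elimination needs a disjunction"
        c1 = conclusion(p.from_left, hyps | {ab.left})
        c2 = conclusion(p.from_right, hyps | {ab.right})
        assert c1 == c2, "both branches must prove the same conclusion"
        return c1
    raise TypeError("not a derivation tree")

def holds(f, val):
    """Truth of a formula under a valuation of the atoms."""
    if isinstance(f, Atom):
        return val.get(f.name, False)
    return holds(f.left, val) or holds(f.right, val)
```

For example, the derivation proving “$B$ or $A$” from the hypothesis “$A$ or $B$” is the tree `OrElim(Hyp(Or(A, B)), OrIntroR(B, Hyp(A)), OrIntroL(Hyp(B), A))`, and one can check on all four valuations of the atoms that its conclusion is true whenever its hypothesis is, a single instance of the soundness theorem.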

So in conclusion, it seems to me (at the moment) that any notion of “formal proof” worthy of the name must be (or include, or give rise to) some sort of inductively defined structure that we can use to prove a soundness theorem. In type theory, these inductively defined structures are called *derivation trees*.

Now, as I mentioned above, many formal systems also include some notion of “argument” that might or might not be a proof. Indeed, I’m tempted to claim that it’s *impossible* to avoid dealing with this at some point. Of course, ordinary mathematics written on paper is not a formal proof in any formal system; but even if we tried (totally infeasibly) to always write complete formal proofs on paper, there would always be the possibility (because “paper is untyped”) that we mis-applied a rule somewhere. One might say that formal proof is a mathematical abstraction that can exist essentially nowhere in reality.

The question of “type-checking” is about how we get from an argument to a proof. Of course, this depends on what kind of argument we are talking about! For instance, most type theories include a notion of “term” that plays the role of a kind of argument. Terms are, roughly, one-dimensional syntactic representations of derivation trees (or parts of them); but rather than directly being defined inductively as derivation trees are, the “well-typed terms” (those that represent derivation trees) are singled out from a larger class of “untyped terms”. Thus the latter are a sort of “argument” that might or might not give rise to a proof.

For instance, a proof by “or-elimination” as above would be represented by a term like $case(M,u.P,v.Q)$, where $M$ is a proof of “$A$ or $B$”, $P$ is a proof of $C$ that gets to use a hypothesis $u$ of $A$, and $Q$ is a proof of $C$ that gets to use a hypothesis of $B$. This is an expression of the sort that could be typed into a computer proof assistant; it’s formally analogous to arithmetic expressions like $x+(y*z)$ that you can write in a Java or Python program. But in addition to the problems of potential typos and misspellings, we can put together strings of symbols that “look like terms” but don’t actually represent proofs; for instance, maybe we write $case(M,u.P,v.Q)$ where $M$ is not actually a proof of “$A$ or $B$” for any $A$ and $B$ (maybe it is a proof of “if $A$ then $B$”), or where $P$ and $Q$ are not proofs of the same conclusion $C$. In a statically typed programming language, this sort of thing produces a compiler error; that’s basically the same sort of thing as the type-checking of a proof assistant.

The specific question “should type-checking be decidable” is about whether there should be an algorithm to which you can hand an untyped term and be guaranteed to get an answer of either “yes, this represents a derivation tree” or “no, it doesn’t” in a finite amount of time. In other words, your compiler can never hang; it must either succeed or return with an error message. But from the perspective I’m advocating here, “decidable checking” is not a fundamental property of a formal system, or more precisely not a property of the *proofs* in that system. Rather, it is a property of a certain class of “arguments” that *might or might not* represent proofs.
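A decidable type-checker of this sort can be sketched as follows (again with invented names, modeled on no proof assistant in particular): the checker recurses structurally on the untyped term, so it always terminates, answering either with the conclusion proved or with an error like the ones described above.

```python
from dataclasses import dataclass

# Untyped terms: syntax that may or may not represent a derivation tree.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Case:            # the term case(M, u.P, v.Q)
    scrutinee: object  # M
    u: str
    left: object       # P, which may use hypothesis u
    v: str
    right: object      # Q, which may use hypothesis v

@dataclass(frozen=True)
class OrType:          # the proposition "A or B"
    left: object
    right: object

class IllTyped(Exception):
    pass

def check(term, ctx):
    """Decide whether an untyped term represents a derivation tree,
    returning the conclusion it proves. Structural recursion on the
    term, so the algorithm always terminates: type-checking for these
    terms is decidable."""
    if isinstance(term, Var):
        if term.name not in ctx:
            raise IllTyped(f"unbound hypothesis {term.name}")
        return ctx[term.name]
    if isinstance(term, Case):
        t = check(term.scrutinee, ctx)
        if not isinstance(t, OrType):
            raise IllTyped("case scrutinee must prove a disjunction")
        c1 = check(term.left, {**ctx, term.u: t.left})
        c2 = check(term.right, {**ctx, term.v: t.right})
        if c1 != c2:
            raise IllTyped("branches prove different conclusions")
        return c1
    raise IllTyped("not a term")
```

On the well-typed term `case(m, u.u, v.v)` with `m` a hypothesis of “$C$ or $C$” the checker returns $C$; handed the same term where `m` proves an implication instead, it terminates with an error, just as a compiler for a statically typed language would.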

In particular, although many type theories have decidable type-checking for terms, essentially all *practical* type theories also include *other* kinds of “argument” that do not have “decidable checking”. For instance, practical implementations of dependent type theory (such as Coq, Agda, and Lean) never force the user to write out complete terms (let alone derivation trees); instead they have powerful “elaborators” that can fill in implicit arguments using techniques such as unification and typeclass inference. These elaborators are not, in general, guaranteed to terminate: it’s quite possible to set up a loop in typeclass inference that causes Coq to hang, and “higher-order unification” is known formally to be an undecidable problem.
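The nontermination risk can be seen even in a toy model of instance search (this is only an illustration of naive backward chaining, not how Coq’s typeclass inference actually works): a rule whose premise mentions its own head sends the search into a loop, which the sketch below cuts off with an explicit fuel budget.

```python
# Toy "instance search": rules are pairs (head, premises). A naive
# backward-chaining searcher loops on a cyclic rule, which is the kind
# of behavior that lets a real elaborator hang; here a fuel parameter
# makes the loop observable instead of hanging.

def search(goal, rules, fuel):
    """Try to derive `goal` from the rules; return True on success,
    False if no rule applies, and raise RecursionError if the search
    runs out of fuel before finishing."""
    if fuel == 0:
        raise RecursionError("search did not terminate within fuel budget")
    for head, premises in rules:
        if head == goal and all(search(p, rules, fuel - 1) for p in premises):
            return True
    return False

# A terminating instance hierarchy, and a self-referential one.
base_rules = [("Show(int)", []), ("Show(list)", ["Show(int)"])]
loop_rule = [("Eq(t)", ["Eq(t)"])]   # an instance that depends on itself
```

With `base_rules` the search for `"Show(list)"` succeeds quickly; with `loop_rule` the search for `"Eq(t)"` exhausts any finite fuel budget, the toy analogue of a hanging typeclass loop.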

And this is even before we get to the informal arguments found in pencil-and-paper mathematics. Converting those into a formal proof of any sort is certainly undecidable by any algorithm!

So we certainly can’t do without “notions of argument” that don’t have “decidable type-checking”. But what I would ask of designers of new formal systems is whether there is, somewhere in the specification of the system, a (probably inductive) notion of proof for which one can prove a soundness theorem (and, hopefully, also a categorical initiality theorem, and maybe a completeness theorem). If not, how do you justify calling the things you are talking about “proofs”?

Note that “type-checking” is trivially “decidable” for proofs of this sort; by their very nature they are correct proofs. The further our “arguments” get from the inductive formal proofs, the more difficult a problem “type-checking” becomes, until in the limit we get to “I have found a truly marvelous proof of this result which this margin is too narrow to contain”. Somewhere in there is the boundary between decidable and undecidable type-checking. Somewhere else in there is the boundary between feasible and infeasible type-checking. And in practice, we certainly make use of “notions of argument” that lie on both sides of each of those lines.

However, it does seem to me that if a formal system is going to have at its core some inductive notion of proof, then for a proof assistant to honestly call itself an *implementation* of that formal system, it ought to include, somewhere in its internals, some data structure that represents those proofs reasonably faithfully. And given how trivial “type-checking” is for the actual formal proofs, it seems to me that anything calling itself a “reasonably faithful representation” of those proofs ought to at least have decidable type-checking. Those representations may not be what the user of the system calls “terms”; but they ought to be there somewhere. Informally, the purpose of a proof assistant is to *assist the user in producing a proof*; it may not actually go all the way to produce a complete inductive formal proof, but it ought to at least produce something close enough that the rest of the distance can be crossed algorithmically.

However, I remain open to being convinced otherwise.

**EDIT:** After a long and probably hard-to-follow discussion in the comments, I have been convinced otherwise (though I remain open to being convinced back). I still say that a proof assistant ought to somehow “faithfully represent formal proofs” internally. But now I have an actual mathematical/practical reason for that (not just a philosophical one), which implies a concrete criterion for what “faithfully represent” means: I want to be able to compute with the formal proofs that get created, such as by applying a constructive proof of the soundness/initiality theorem to them. With this criterion, it turns out that decidability is a red herring. What I want is *more* than decidability in one sense — we need the actual proofs, not just a decidably checkable representation of them — but also *less* in another sense, since we don’t need to actually represent the entire proof as a data structure, only “compute with that data structure” as if we had it. See this comment below and its responses for further discussion.

## Re: What is a Formal Proof?

I’m not sure what you mean by this sentence. I thought you were going to conclude the opposite position, or perhaps I can’t parse the triple negative correctly.