Category Theory and Lambda Calculus

\clearpage\null\newpage

Hidden: Macros and libraries

Setup

Fonts

Colors

Hyperreferences

Libraries

Theorem styles

Math symbols and Unicode support

Categories

Topos

Type systems

Lambda calculus

Type theory

Overview

Chapter 1

Chapter 1: The λ-calculus is a collection of systems formalizing the notion of functions as formulae, function abstraction and function composition. They can be seen as programming languages and as formal logics at the same time. We start by analyzing the characteristics that make untyped lambda calculus (\S section-untypedcalculus) a proper programming language; these are the confluence property (known as the Church-Rosser theorem, \S section-confluence and section-churchrossertheorem), the existence of normal forms for terminating computations, and evaluation strategies that allow us to reliably find these normal forms in the cases where they exist. We also relate lambda calculus to combinatory logic with SKI combinators (\S section-ski) and we discuss its historical importance in the definition of Turing-completeness (\S section-turing).

There also exist typed variants of lambda calculus (\S sec-simplytypedlambda). Simply typed lambda calculus, for instance, can include function types, a unit type, a void type, pairs and unions (\S sec-simpletypes). We focus on the logical interpretation of this system, called the Curry-Howard isomorphism (\S sec-curryhoward). Brouwer, Heyting and Kolmogorov proposed in the 30s a computational interpretation of intuitionistic logic, a logic that does not assume the law of excluded middle, $A ∨\neg A$; and simply typed lambda calculus precisely matches this interpretation, making it possible to have a language that is, at the same time, a functional programming language and a formal system.


Chapter 2

Chapter 2: We have developed Mikrokosmos, an untyped and simply typed λ-calculus interpreter written in the purely functional programming language Haskell cite:hudak07_haskell (\S sec-haskelllang). It aims to provide students with a tool to learn and understand λ-calculus and the relation between logic and types. We show how to program and prove statements of intuitionistic propositional logic with it.

We discuss why Haskell and the use of monadic parser libraries are an adequate choice for this project (\S sec-monadicparsers). Our implementation relies on De Bruijn indices to represent variable scope and bindings internally (\S sec-debruijnindexes). We need to implement substitution (\S sec-substitution), evaluation (\S section-evaluation) and type inference (\S sec-typeinference); but all these algorithms can be directly derived from the theoretical discussion in Chapter 1. Our programming environment makes use of version control using Git, test suites using the Tasty testing framework (\S section-testing) and continuous integration using TravisCI (\S sec-versioncontrolci). We also rely on packaging tools that allow us to distribute the software as a library and as an executable on the appropriate channels (\S section-installation). A Jupyter Notebook integration is provided (\S section-jupyterkernel); it has been used to provide students with interactive tutorials on lambda calculus (\S section-jupyterhub). Along with the command-line interpreter, we also include a mechanism allowing us to call the interpreter from Javascript and embed it into a web page (\S section-javascript).

At the end of this chapter, we describe how to program in the untyped lambda calculus (\S sec-programming-untyped), encoding complex data structures with a technique due to Church and learning how to use fixpoint operators to achieve unbounded recursion. On the other hand, programming in the simply typed lambda calculus (\S sec-programming-typed) more closely resembles proving statements of intuitionistic logic with a proof assistant. We show how each program is, itself, a proof of a proposition in intuitionistic logic; its type can be seen as the proposition it proves; and evaluation and $β\mathrm{-reduction}$ are related to proof simplification.


Chapter 3

Chapter 3: Categories (\S section-categories) will be the framework we will use to study the fundamental notion of function composition inside mathematics. They provide a fruitful language to talk about theories, mathematics and logic. Notions such as naturality, universality or duality can be given an explicit formal meaning with categories. We have examples of them all around mathematics. Functors (\S sec-functorsnatural) can be seen as homomorphisms that preserve the structure of a category, and we can base many different useful mathematical constructions on them, with the added benefit that they can be formulated in a very abstract setting (\S section-constructions).

The Yoneda Lemma (\S section-universality) is one of the foundational results in category theory; intuitively, it says that every object in a category is determined by how it relates to the other objects of the category. This is a tool that will be necessary when constructing categorical models of algebraic theories in Chapter 4. We also define limits and colimits, which allow for more expressive constructions inside categories; in particular, fibered products and coproducts (pullbacks and pushouts) can be expressed this way, and they will also be useful when interpreting syntactic substitution inside categories.

Finally, we present monads, algebras and adjoint functors (\S section-adjoints-monads). Algebras are useful to encode induction in a purely categorical way. Adjoint functors, on the other hand, will be one of the key insights we will use to understand and expand the connection between computation and logic we have described so far. This section is primarily based on cite:maclane78.


Chapter 4

Chapter 4: The internal logic of categories is studied. We start by defining presheaf categories (\S section-presheaves). Presheaves can be thought of as a kind of generalized set. Pairs of sets, graphs, or sets depending on a real variable can all be seen as presheaves. As we will see in this same chapter, presheaf categories are examples of categories having all the structure we are interested in. Lawvere’s algebraic theories show that it is possible to give a description of algebraic structures in terms of categories (\S sec-lawveretheories). Moreover, this description is independent of the particular axiomatization of the structure we are using. Models for these theories in any mathematical structure can be seen as functors from the category representing the theory.

In particular, cartesian closed categories (\S sec-cartesianclosedlambdacalculus) can be used to model simply typed lambda calculus. Each lambda calculus theory is the internal logical language of a cartesian closed category. Bicartesian categories correspond to intuitionistic logic and we can model inductive types as algebras over certain functors. Using this correspondence, we can encode theorems and proofs about cartesian closed categories using lambda calculus. As an example, we prove a theorem by Lawvere on fixed points and diagonal arguments (\S section-diagonalarguments). We can then interpret this theorem in many categories to get many known results by Cantor, Russell, Tarski and Gödel as corollaries.

We extend this structure by asking the same structure to exist in each slice of the category, thus defining locally cartesian closed categories (\S sec-locallydependent). Existential and universal quantifiers can be interpreted with this new structure, which also provides a framework for higher-order logics. Dependent type theory arises as the natural language for these richer categories, and it carries a natural computational interpretation. In particular, equality (\S sec-equalitytypes) can be encoded using the J-eliminator, a rule directly derived from a categorical construction, which says that if $P(x,x)$ holds for any element $x$ and $a = b$, then $P(a,b)$ holds. We also discuss a categorical encoding of the notion of propositions with proof irrelevance and propositional truncation (\S sec-subobject-propositions and sec-proptrunc); this is a tool that allows us to write proofs that do not depend on a specific proof being constructed, even though we remain in a constructive setting. The theorem of choice is shown as an example (\S sec-examples-dependent-types); it is a variant of the Axiom of Choice that is actually provable within the theory we obtain this way.

Under this interpretation, categories, logic and computation can be seen as multiple aspects of a unique central notion. All the constructions and definitions we have been discussing until this moment nudge us towards considering topoi, a particular kind of category that is itself a model of constructive mathematics (\S section-topoi). In other words, each topos is a different universe of mathematics. In particular, we study Lawvere’s Elementary Theory of the Category of Sets and how it recovers classical set-based mathematics. This section is primarily based on the work by cite:martinlof75 and cite:seely84.


Chapter 5

Chapter 5: As we have been discussing in previous chapters, the internal logic of categories with enough structure is expressive enough to allow us to develop mathematical theories inside them. Moreover, these theories can be expressed in a programming language with a strong dependent type system (\S sec-mltt). Thus, type theory provides both a foundation of programming language theory and a foundation of mathematics. It is now interesting to consider how to embed Lawvere’s axioms for the category of sets and some classical principles inside this constructive logic, thereby showing that a set theoretical foundation can be seen as a particular case. The proof we present of Diaconescu’s theorem is an example of the use of this embedding. In particular, we discover that the law of excluded middle and classical mathematics follow from accepting the Axiom of Choice. We also study a first result in Homotopy Type Theory (\S sec-hott), the theory we obtain when we consider the higher-dimensional structure of the equality type, and we relate it to Grothendieck’s homotopy hypothesis using Voevodsky’s Univalence Axiom. This section is mainly based on cite:hottbook.

Two libraries of computer verified mathematics have been developed (\S sec-verified). The first one implements positive real numbers as Dedekind cuts and provides a proof of Diaconescu’s theorem. More than a hundred computer verified lemmas and propositions about natural numbers and dyadic rationals have been needed for this construction. The second one accepts Voevodsky’s Univalence Axiom and proves that the fundamental group of the circle is given by the integers.


Keywords

Keywords: category theory, lambda calculus, categorical semantics, type theory, constructivism.

Synopsis

Chapter 1

Chapter 1: We begin by studying Church’s lambda calculus as a programming language. We give a formal definition of its syntactic aspects and present proofs of the properties that make it implementable as a programming language; examples are the confluence property (the Church-Rosser Theorem) and the existence of reduction strategies that are optimal from the perspective of finding a normal form. We intend to put these results to practical use later, when designing the interpreter of a language based on lambda calculus. We briefly mention the equivalence with SKI combinatory logic and how, when we discuss the implementation of the minimization operator of Gödel’s $μ\mathrm{-recursive}$ functions, we will be verifying its Turing-completeness.

There exist typed variants of the lambda calculus, more oriented towards logic, that solve the normalization problems of the untyped versions. We discuss the difference between Curry-style and Church-style typing, and how the former will let us provide a simply typed language with a form of implicit polymorphism; that is, we will be able to have expressions whose type is as general as possible until the very moment they are instantiated. We then pay particular attention to the relation of these typed variants of the lambda calculus with intuitionistic propositional logic as represented in Gentzen’s natural deduction. Intuitionistic logic is the basis of the mathematics of Brouwer or Bishop; it is characterized by not assuming the law of excluded middle, $A ∨\neg A$, and by having a natural computational interpretation. This relation gives us one more use of the lambda calculus, in this case as a proof assistant. This constitutes the Brouwer-Heyting-Kolmogorov interpretation and the Curry-Howard correspondence.

We end by reviewing the literature on possible extensions to the type systems of the lambda calculus with Barendregt’s $λ\mathrm{-cube}$.


Chapter 2

Chapter 2: We design an interpreter for two programming languages based, respectively, on the untyped lambda calculus and on the simply typed lambda calculus. The implementation internally uses De Bruijn indices, a formal notation for representing variables, and applies evaluation and type inference algorithms directly derived from our study in the previous chapter. We choose the pure functional language Haskell as the development language and analyze this choice. We use monadic parsing libraries and an environment of integration and control tools aimed at guaranteeing correctness and, finally, at distributing the software package on the language’s own channels. This interpreter will allow us to explore programming techniques in the lambda calculus and to apply them to the formal verification of proofs in intuitionistic logic. Utilities are also included to integrate it into the Jupyter Notebook interactive computing environment, as well as a transpilation and adaptation to Javascript that makes it suitable for teaching and for embedding in web pages. Following this idea, links to interactive tutorials on the lambda calculus are provided.

In the section on programming in the lambda calculus, we highlight the possibility of encoding complex data structures with Church encodings, and the power of the fixed-point operator, which is what allows us to write recursive definitions that are not bounded a priori. In the section on typed lambda calculus, we write expressions that are at the same time proofs of theorems of intuitionistic logic and typed programs.


Chapter 3

Chapter 3: Category theory provides an abstraction of the concept of composition in mathematics and a formalization of notions such as naturality, duality or universality. Essentially, a category is given by objects and composable arrows or morphisms between them, being a generalization of the idea of a groupoid. We have examples of categories within a great variety of areas of mathematics, and we can study functors as the homomorphisms that respect their structure; in turn, these functors serve as translations between those theories. We carry out several constructions that justify speaking of the naturality and universality of mathematical structures. The Yoneda lemma is a foundational result of category theory stating, intuitively, that objects are determined by their morphisms; several developments in the following chapters will be based on it. We likewise define limits and colimits from the notion of universality, which will allow us to construct generalizations of the product and the coproduct; we are especially interested in the case of fibered products, which will later correspond, in type theory, to syntactic substitution.

We end by working with adjunctions, using Lawvere’s ideas regarding their relation to logical inference. All these developments serve as the theoretical foundation of the following chapters. For example, algebras over functors will model recursion in programming languages, monads have already served us to model parsing in Haskell, and adjunctions are the basis of categorical logic.


Chapter 4

Chapter 4: We begin by defining presheaf categories, which will be natural models for all the subsequent discussion; this is a class of categories that generalizes the category of sets and provides non-standard models of the same constructions we would study on sets. The perspective of category theory, and especially the notion of adjunctions as a foundational element, first allows us to formalize the structure of algebraic theories, which shows the ideas we will use to give semantics to other syntactic structures. In this theory, we will be able to give several axiomatizations of a theory, such as group theory, and obtain from all of them a single category whose functors will each represent a model of the theory.

We use this same idea to formalize the logical structure of the lambda calculus as the internal language of cartesian closed categories. Using this correspondence, we can recreate proofs about cartesian categories using lambda calculus expressions. In particular, we apply this technique to a theorem of William Lawvere (1969) on fixed points in cartesian categories. Its particularization gives us, as corollaries, proofs of results by Cantor, Russell, Tarski and Gödel, and synthesizes the idea common to all diagonalization proofs. Bicartesian categories are a first extension providing a structure of intuitionistic logic internal to the category, and we verify how the existence of initial algebras in a category allows us to use it as a model of the variants of the lambda calculus containing inductive types (such as the natural numbers).

Once this correspondence has been verified, we seek to extend it towards a higher-order logic with quantifiers that allows us to express more complex mathematical notions. We enrich the structure by taking locally cartesian closed categories as models, drawing from them the notion of dependent types. The structure of a locally cartesian closed category allows substitutions inside types and the expression of types that act as propositions about elements and other types. These dependent types provide us with a logical framework that moreover preserves a natural computational interpretation. We thus have a programming language whose types can be propositions and whose elements can be proofs in a formal system. The existential and universal quantifiers appear as particular cases of adjunctions. Equality can be encoded using the J-eliminator, which allows us to prove a proposition on two elements, $P(a,b)$, by proving $P(x,x)$ for every $x$ and proving $a = b$; this encoding is extracted directly from a categorical construction. We also discuss the notion of propositional truncation, which lets us gain mathematical expressiveness, since it allows us to prove propositions without explicitly establishing which element satisfies them, in a theory that would be, a priori, constructivist. The theorem of choice, a variant of the axiom of choice, is an example of this phenomenon.

Under this interpretation, categories, logic and computation appear as three manifestations of a single notion. We end by sketching how all the constructions so far naturally lead us to the algebraic structure of a topos, a category that can contain complete formulations of mathematical theories in its internal logic. Lawvere’s set theory is the example we propose for this correspondence.


Chapter 5

Chapter 5: As we have detailed so far, the internal logic of certain categories is rich enough to allow us to develop mathematical theories that can moreover be expressed in a programming language with a sufficiently strong type system, with compilation serving as a correctness check for the proof. To demonstrate the potential of this perspective, we use the Agda programming language and detail the explicit correspondences between types and categories. Lawvere’s set theory, together with Diaconescu’s theorem, allows us to recover classical mathematics as a particular case; nevertheless, we remain in the most general possible setting in order to develop a definition of the positive reals by Dedekind cuts in Martin-Löf type theory. As a side result, some proofs of certain instances of excluded middle in this structure become programs that compute reals with arbitrary precision. We also propose an example of synthetic topology using a recently discovered correspondence between the structure of types, $∞\mathrm{-groupoids}$ and homotopy, which becomes apparent once we include Voevodsky’s Univalence Axiom.


Main references and keywords

Our main references are Selinger’s exposition of the lambda calculus cite:selinger13 in the first part, the category theory texts by Mac Lane cite:maclane78 and Awodey cite:awodey10 in the second, and the foundational book on homotopy type theory cite:hottbook in the final part. The usefulness of the nLab project cite:nlab for locating the relevant bibliography across a priori disparate topics is worth noting.

Keywords: categories, lambda calculus, categorical logic, type theory, constructivism.

Quote

Quote: Lawvere

\centering \raggedleft What is the primary tool for such summing up of the essence of ongoing mathematics? Algebra! \\ Nodal points in the progress of this kind of \\ research occur when, as in the case with the \\ finite number of axioms for the metacategory \\ of categories, all that we know so far can be \\ expressed in a single sort of algebra.

F. William Lawvere

Quote: McBride

\centering \raggedleft Mathematics has serious cultural issues.
Too much fame for climbing mountains; \\ not enough for flattening them.

Conor McBride

Introduction


Assuming $A ∨ ¬ A$ for any arbitrary proposition $A$ may lead to an unnecessarily narrow view of mathematics and its relationship with computation. This principle, called the law of excluded middle, was the subject of much debate during the beginning of the 20th century cite:martinlof08. Constructive mathematics is mathematics done without assuming excluded middle: Brouwer’s intuitionism cite:brouwer07 and Bishop’s constructive analysis cite:bishop67foundations are examples of this school of thought (see cite:troelstra2014). However, constructivism never attracted the interest of the majority of mathematicians during the foundational crisis.

Some years later, in 1945, the notion of category originated with Eilenberg and Mac Lane’s work on cohomology theory cite:eilenberg45. Nowadays, functors, natural transformations and universal properties are used in all the different branches of mathematics. While we generally still prefer to rely on set-theoretical foundations, many axiomatizations of mathematics based on category theory have been proposed (see, for instance, cite:lawvere64). It was observed that any category with enough structure is able to interpret most of mathematics in an internal logic; and remarkably, this internal logic does not in general satisfy the law of excluded middle.

How does all of this relate to programming? Suppose we want to design a programming language. A possible path is to follow the functional paradigm that revolves around composition and inductive data structures. We would borrow ideas from λ-calculus, a collection of formal systems Alonzo Church proposed in the 1930s cite:church36. They can be seen as models of computation, and many modern programming languages are heavily influenced by them cite:hudak07_haskell, especially in the typed variants of the calculus. But then, we have the realizability interpretation of logic without excluded middle by Heyting and Kolmogorov (see cite:troelstra2014), which regards proofs as programs and propositions as types. Under this interpretation, the simply typed lambda calculus is precisely a language for constructive logic that loses its computational nature when excluded middle is added.

Categories are the framework in which we can make sense of this. A cartesian closed category is a category that has exponential objects (whose elements can be regarded as functions) and all finite products. These two constructions are particular cases of adjunctions, a purely categorical description of how certain transformations in mathematics relate (see cite:lawvere1969adjointness). It is easy to see how these adjunctions translate into computations in lambda calculus, thus encoding a language for logic and programming inside categories.

The logic we first obtain this way is a propositional logic. This identification suffices to prove some theorems that have many known results as corollaries, such as Lawvere’s diagonal theorem (cite:lawvere06): a surjective morphism $g \colon A → B^A$ implies the existence of a fixed point for each $f \colon B → B$. However, we want to extend this logic with universal and existential quantifiers, and locally cartesian closed categories are the tool for this task. They are categories in which every slice generated by the morphisms to a particular object has the cartesian closed structure. Quantifiers are adjoints in this setting, and they provide, inside the type theoretical interpretation, dependent pairs, written with $Σ$, and dependent functions, written with $Π$. These correspond to the existential and universal quantifiers respectively, and they define types that can depend on elements of other types. Now we can prove more complex results, and the example we study is Diaconescu’s theorem: the Axiom of Choice implies the law of excluded middle.

Finally, Martin-Löf type theory (see cite:martinlof75 and cite:nordstrom90) helps to synthesize all these ideas into a formal system that is an active subject of research (cite:kraus2016notions, cite:rabe2011kripke or cite:booij2017parametricity are examples). This is the environment in which Voevodsky’s Univalence Axiom gives rise to Homotopy Type Theory (see cite:hottbook), a recent development relating type theory and topological constructions. As an example, we will prove that the fundamental group of the circle is $\mathbb{Z}$ with a computer verified mathematics library.

Our first objective is to build an interpreter for untyped and simply typed lambda calculus. Results in the first part of this text provide solid theoretical grounds for this purpose. We want the interpreter to be a didactic tool, useful for experimentation and with a clear theoretical basis in which to interpret the constructions and programs we can write on it. Our second main objective is to understand how dependent type theory works internally and to develop libraries of computer verified mathematics, proving simple theorems, undertaking the project of constructing the real numbers from scratch and formulating homotopy types within this structure.

Lambda calculus

Untyped λ-calculus

<<sec-untypedlambda>>

Introduction

When are two functions equal? Classically in mathematics, functions are graphs. A function from a domain to a codomain, $f \colon X → Y$, is seen as a subset of the product space: $f ⊂ X × Y$. Any two functions are identical if they map equal inputs to equal outputs; and a function is completely determined by what its outputs are under different inputs. This vision is called /extensional/.

From a computational point of view, this perspective could seem incomplete in some cases; we usually care not only about the result but, crucially, about how it can be computed. Classically in computer science, functions are formulae; and two functions mapping equal inputs to equal outputs need not be equal. For instance, two sorting algorithms can have different efficiency or different memory requirements, even if they output the same sorted list. This vision, where two functions are equal if and only if they are given by essentially the same formula, is called /intensional/ (see cite:selinger13, where this difference is detailed).

The λ-calculus is a collection of formal systems, all of them based on the lambda notation introduced by Alonzo Church in the 1930s while trying to develop a foundational notion of functions as formulae in mathematics. It is a logical theory of functions, where application and abstraction are primitive notions, and at the same time it is also one of the simplest programming languages, on which many full-fledged languages are based.

The untyped or pure λ-calculus is syntactically the simplest of these formal systems. In it, a function does not need a domain nor a codomain; every function is a formula that can be directly applied to any expression. It even allows functions to be applied to themselves, a notion that would be troublesome in our usual set-theoretical foundations. In particular, if $f$ were a member of its own domain, the infinite descending sequence \[ f ∋ (f, f(f)) ∋ \{f,f(f)\} ∋ f ∋ (f, f(f)) ∋ …, \] would exist (writing the Kuratowski pair $(f,f(f)) = \{\{f\},\{f,f(f)\}\}$), thus contradicting the regularity axiom of Zermelo-Fraenkel set theory (see, for example, cite:kunen11). In contrast, untyped λ-calculus presents some problems that would never appear in the usual foundations, such as non-terminating functions. An approach to solving these is presented in Section sec-simplytypedlambda.

This presentation of the untyped lambda calculus will follow cite:Hindley08 and cite:selinger13.

Untyped λ-calculus

<<section-untypedcalculus>>

As a formal language

As a formal language, the untyped λ-calculus is given by a set of equations between expressions called λ-terms; equivalences between them can be computed using some manipulation rules. These λ-terms can stand for functions or arguments interchangeably: they all use the same λ-notation to define function abstractions and applications.

This λ-notation allows a function to be written and inlined as any other element of the language, identifying it with the formula it determines and admitting a more succinct representation. For example, the polynomial function $p(x) = x^2 + x$ is written in λ-calculus as $λ x.\ x^2 + x$; and the particular evaluation $p(2)$ is written as $(λ x.\ x^2+x)(2)$. In general, $λ x.M$ is a function taking $x$ as an argument and returning $M$, a term in which $x$ may appear as a symbolic variable.

The use of λ-notation also eases the writing of higher-order functions, functions whose arguments or outputs are functions themselves. For instance, $λ f.(λ y.f(f(y)))$ would be a function taking $f$ as an argument and returning $λ y.f(f(y))$, which is itself a function most commonly written as $f ∘ f$. In particular, the following expression \[ \Big( \big( λ f.(λ y.f(f(y))) \big) \big( λ x.x^2 + x \big) \Big) (1) \] evaluates to $6$. It can be read as applying the polynomial $x^2+x$ twice to the initial argument that $1$ represents.

Lambda terms

By convention, we omit outermost parentheses and assume left-associativity. For example, $MNP$ will always mean $(MN)P$. Note that application of λ-terms is not the same as composition of functions, which is associative. We also consider λ-abstraction as having the lowest precedence. For example, $λ x. M N$ should be read as $λ x.(MN)$ instead of $(λ x.M) N$.
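This syntax translates directly into an inductive datatype. The following is a minimal Haskell sketch, with names (Term, Var, App, Lam) chosen only for illustration; the Mikrokosmos interpreter of Chapter 2 uses De Bruijn indices instead of variable names.

#+begin_src haskell
-- Untyped λ-terms over named variables: a variable, an application
-- M N (left-associative), or an abstraction λx. M.
data Term
  = Var String
  | App Term Term
  | Lam String Term
  deriving (Show, Eq)

-- The "apply twice" example from above: λf. λy. f (f y).
twice :: Term
twice = Lam "f" (Lam "y" (App (Var "f") (App (Var "f") (Var "y"))))
#+end_src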

Free and bound variables, substitution

<<sec-freeboundvars>> In λ-calculus, the scope of a variable is restricted to the λ-abstraction where it appears, if any. Thus, the same variable can be used multiple times in the same term independently. For example, in $(λ x.x)(λ x.x)$, the variable $x$ appears twice with two different meanings.

Evaluation in λ-calculus relies on the notion of /substitution/. Any free occurrence of a variable can be substituted by a term, as we do when evaluating function applications. For instance, in the previous example, we can evaluate $(λ x.\ x^2+x)(2)$ into $6$ by substituting $2$ in the place of $x$ inside $x^2 + x$; as in \[\begin{tikzcd} (λ x.\ x^2+x)(2) \rar{x \mapsto 2} & 2^2 + 2. \end{tikzcd}\] This, however, should be done avoiding the unintended binding that happens when a variable is substituted inside the scope of a binder with the same name, as in the following example: if we were to evaluate the expression $(λ y.λ x.yx)(λ z.xz)$, where $x$ appears twice (once bound and once free), we would substitute $y$ by $(λ z.xz)$ in $(λ x.yx)$, and $x$ (the free variable) would get tied to $x$ (the bound variable) \[\begin{tikzcd} (λ y.λ x.yx)(λ z.xz) \ar{rr}{y \mapsto (λ z.xz)} && (λ x.(λ z.xz)x). \end{tikzcd}\]

To avoid this, the bound $x$ must be given a new name before the substitution, which must be carried out as follows, keeping $x$ free, \[\begin{tikzcd} (λ y. λ u.y u)(λ z.\ xz) \ar{rr}{y \mapsto (λ z.xz)} & & (λ u.(λ z.xz)u). \end{tikzcd}\]

We could define a criterion for choosing exactly what this new variable should be, or simply accept that this procedure is only well-defined up to a renaming of variables. This equivalence relation between terms with the same structure but different variable names is defined formally in the next section. In practice, it is common to follow Barendregt’s variable convention, which simply assumes that bound variables have been renamed to be distinct.
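As an illustrative sketch, capture-avoiding substitution can be implemented over the Term type of the previous sketch as follows; fresh renames a bound variable by appending primes, which is one possible criterion for choosing the new name.

#+begin_src haskell
-- subst x n m computes m[n/x], renaming bound variables when they
-- would capture a free variable of n.
freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (App m n) = freeVars m ++ freeVars n
freeVars (Lam x m) = filter (/= x) (freeVars m)

-- A fresh name: the first of x, x', x'', ... not in the given list.
fresh :: [String] -> String -> String
fresh used x = head [x' | x' <- iterate (++ "'") x, x' `notElem` used]

subst :: String -> Term -> Term -> Term
subst x n (Var y)
  | x == y    = n
  | otherwise = Var y
subst x n (App m p) = App (subst x n m) (subst x n p)
subst x n (Lam y m)
  | y == x              = Lam y m      -- x is shadowed: nothing to do
  | y `elem` freeVars n = subst x n (Lam y' (subst y (Var y') m))
  | otherwise           = Lam y (subst x n m)
  where y' = fresh (freeVars n ++ freeVars m ++ [x]) y
#+end_src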

Alpha equivalence

Variables are only placeholders; their names, as we have seen before, are not relevant. Two λ-terms whose only difference is the naming of the variables are called α-equivalent. For example, $(λ x.λ y. x\ y)$ is α-equivalent to $(λ f.λ x. f\ x)$.

The relation of α-equivalence formally captures the fact that the name of a bound variable can be changed without changing the meaning of the term. This idea appears repeatedly in mathematics; for example, the renaming of a variable of integration or of the variable in a limit are examples of α-equivalence. \[ ∫_0^1 x^2\ dx = ∫_0^1 y^2\ dy; \qquad \lim_{x → ∞} \frac{1}{x} = \lim_{y → ∞} \frac{1}{y}. \]

Beta reduction

The core notion of evaluation in λ-calculus is captured by the idea of β-reduction. Until now, evaluation has been only informally described; it is time to define it as a relation, $\tto_β$, going from the initial term to any of its partial evaluations. We consider first a one-step reduction relation, called $→_β$, and we extend it later by transitivity to $\tto_β$.

Ideally, we would like to define evaluation as a series of reductions into a canonical form which could not be further reduced. Unfortunately, as we will see later, it is not possible to find that canonical form in general.
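As a sketch, a deterministic version of the one-step reduction can be written over the Term type from the previous sketches; this particular version always contracts the leftmost-outermost redex, anticipating the evaluation strategies studied later in this chapter.

#+begin_src haskell
-- One-step β-reduction contracting the leftmost-outermost redex.
-- Nothing signals that the term contains no redex at all.
betaStep :: Term -> Maybe Term
betaStep (App (Lam x m) n) = Just (subst x n m)   -- contract the redex
betaStep (App m n) = case betaStep m of
  Just m' -> Just (App m' n)
  Nothing -> App m <$> betaStep n
betaStep (Lam x m) = Lam x <$> betaStep m
betaStep (Var _)   = Nothing
#+end_src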

Eta reduction

Although we lost the extensional view of functions when we decided to adopt the functions-as-formulae perspective, some notion of function extensionality in λ-calculus can be partially recovered by the notion of η-reduction: any term that simply applies a function to the argument it takes can be reduced to that function. That is, given any term $M$, the abstraction $λ x.M x$ can be reduced to $M$, provided $x$ does not occur free in $M$.

Confluence

<<section-confluence>> It is not possible in general to evaluate a λ-term into a canonical, non-reducible term. We discuss many examples of this phenomenon in the following sections. However, we will be able to prove that, in the cases where it exists, it is unique. This property is a consequence of a slightly more general one called /confluence/, which can be defined in any abstract rewriting system.

Given any binary relation $→$ whose reflexive transitive closure is $\tto$, we can consider three related properties:

  • the confluence property (also called Church-Rosser property) we have just defined;
  • the quasidiamond property, similar to the confluence property but assuming $M → N$ and $M → P$ instead of the weaker hypothesis of $M \tto N$ and $M \tto P$;
  • and the diamond property, which is defined by substituting $\tto$ by $→$ in the definition of confluence.

The three properties can be represented respectively as follows (left: confluence, center: quasidiamond property, right: diamond property). \[\begin{tikzcd}[column sep=small] & M \drar[two heads]\dlar[two heads] &&& M \drar\dlar &&& M \drar\dlar & \\ N \drar[dashed,two heads] && P \dlar[dashed,two heads] & N \drar[dashed,two heads] && P \dlar[dashed,two heads] & N \drar[dashed] && P \dlar[dashed] \\ & Z &&& Z &&& Z & \end{tikzcd}\]

We can show that the diamond property implies confluence; while the quasidiamond property does not. In fact, the following figure provides a relation satisfying the quasidiamond property but not the confluence property (example from cite:selinger13). If we want to prove confluence for a given relation, we must use the diamond property instead of the quasidiamond property. \[\begin{tikzcd}[column sep=tiny, row sep=tiny] & & • \drar\dlar & & & \\ & • \dlar & & • \dlar\drar & & \\ •\drar & & • \drar\dlar & & • \drar & \\ & •\dlar & & • \drar\dlar & & •\dlar \\ •\drar & & • \drar\dlar & & • \drar & \\ & •\dlar & & • \drar\dlar & & •\dlar \\ \makebox[0pt][l]{…}\phantom{•} & & \makebox[0pt][l]{…}\phantom{•} & & \makebox[0pt][l]{…}\phantom{•} & \end{tikzcd}\]

The statement that $\tto_β$ and $\tto_{βη}$ are confluent is what we call the /Church-Rosser Theorem/. The definition of a relation satisfying the diamond property and whose reflexive transitive closure is $\tto_{βη}$ will be the core of our proof.

The Church-Rosser theorem

<<section-churchrossertheorem>> The proof presented here is due to Tait and Per Martin-Löf; an earlier but more convoluted proof was discovered by Alonzo Church and Barkley Rosser in 1935 (see cite:barendregt84 and cite:pollack95). It is based on the idea of parallel one-step reduction.

Parallel one-step reduction

Transitive reflexive closure of parallel one-step reduction

Substitution lemma

In order to prove that this newly defined relation has the diamond property, we will define a reduction of a term with the property that it can be reached from any of its parallel one-step reductions. We first prove a lemma on substitution that will handle later the more challenging cases of the proof.

Maximal parallel one-step reduct

Diamond property of parallel reduction

Church-Rosser theorem

Normalization

Once the Church-Rosser theorem is proved, we can formally define the notion of a normal form as a completely reduced lambda term.

Fully evaluating λ-terms means iteratively applying reductions to them until a normal form is reached. We know, by virtue of Theorem theorem-churchrosser, that if a normal form for a particular term exists, then it is unique; but we do not know whether a normal form actually exists. We say that a term has a normal form if it can be reduced to a normal form.

A consequence of Theorem theorem-churchrosser is that a weakly normalizing term has a unique normal form. Strong normalization implies weak normalization, but the converse is not true: the term $Ω = (λ x.(x x))(λ x.(x x))$ is neither weakly nor strongly normalizing, while the term $(λ x.λ y.y)\ Ω\ (λ x.x)$ is weakly but not strongly normalizing. It can be reduced to a normal form as \[ (λ x.λ y.y)\ Ω\ (λ x.x) \tto_β (λ x.x). \]

Standardization and evaluation strategies

Motivation

We would like to find a β-reduction strategy such that, if a term has a normal form, it can be found by following that strategy. Our basic result will be the standardization theorem, which shows that, if a β-reduction to a normal form exists, then a sequence of β-reductions from left to right on the λ-expression will be able to find it. From this result, we will be able to prove that the reduction strategy that always reduces the leftmost β-abstraction will always find a normal form if it exists. This section follows cite:kashima00, cite:barendsen94 and cite:barendregt84.

Leftmost one-step reduction

Standard sequence

We will prove that every term that can be reduced to a normal form can be reduced to it using a standard sequence. This will imply the existence of an optimal beta reduction strategy that will always reach a normal form if one exists.
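In implementation terms, this optimal strategy is simply the iteration of the leftmost one-step reduction; a sketch, reusing betaStep from the previous sketches:

#+begin_src haskell
-- Iterated leftmost reduction: by the leftmost reduction theorem,
-- this reaches the normal form whenever one exists, and diverges
-- otherwise.
normalize :: Term -> Term
normalize t = maybe t normalize (betaStep t)

-- The weakly normalizing example from before, (λx.λy.y) Ω (λx.x):
-- the leftmost strategy discards Ω and reaches λx.x, even though
-- Ω itself loops forever.
omega, example :: Term
omega   = App w w  where w = Lam "x" (App (Var "x") (Var "x"))
example = App (App (Lam "x" (Lam "y" (Var "y"))) omega) (Lam "x" (Var "x"))
-- normalize example == Lam "x" (Var "x")
#+end_src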

Standardization theorem

Leftmost reduction theorem

SKI combinators

<<section-ski>>

SKI definition

As we have seen in previous sections, untyped λ-calculus is already a very syntactically simple system; but it can be further reduced to a few λ-terms without losing its expressiveness. In particular, untyped λ-calculus can be essentially recovered from only two of its terms; these are

  • $S = λ x.λ y.λ z. xz(yz)$, and
  • $K = λ x.λ y.x$.

A language can be defined with these combinators and function application. Every λ-term can be translated to this language and recovered up to $=_{βη}$ equivalence. For example, the identity λ-term $I$ can be written as $I = λ x.x = SKK$.

It is common to also add $I = λ x.x$ as a basic term to this language, even if it can be written in terms of $S$ and $K$, as a way to ease the writing of long, complex terms. Terms written with these combinators are called SKI-terms.

The language of SKI-terms can be defined by the Backus-Naur form \[ \mathsf{SKI} ::= x \mid (\mathsf{SKI}\ \mathsf{SKI}) \mid S \mid K \mid I, \] where $x$ can represent any free variable.

Lambda transform

Bracket abstraction

SKI abstraction

In general this translation is not an isomorphism. For instance, $\lambdatrans(\skiabs(λ u. v u)) = \lambdatrans(v) = v$. However, the λ-terms can be essentially recovered if we relax equality between λ-terms to mean $=_{βη}$.
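The bracket abstraction can be sketched in Haskell over the Term type of the earlier sketches; all names below are illustrative, and this simple version omits the η-shortcut just mentioned, so it returns $S(Kv)I$ rather than $v$ for $λ u. v u$.

#+begin_src haskell
-- SKI-terms: the three combinators, free variables and application.
data SKI = S | K | I | SVar String | SApp SKI SKI
  deriving (Show, Eq)

vars :: SKI -> [String]
vars (SVar y)   = [y]
vars (SApp m n) = vars m ++ vars n
vars _          = []

-- skiAbs x t builds a combinator term behaving like λx. t.
skiAbs :: String -> SKI -> SKI
skiAbs x (SVar y) | x == y = I
skiAbs x t | x `notElem` vars t = SApp K t    -- x unused: K-rule
skiAbs x (SApp m n) = SApp (SApp S (skiAbs x m)) (skiAbs x n)
skiAbs _ t = SApp K t                          -- unreachable, for totality

-- Translation of λ-terms into SKI-terms, innermost bindings first.
toSKI :: Term -> SKI
toSKI (Var x)   = SVar x
toSKI (App m n) = SApp (toSKI m) (toSKI n)
toSKI (Lam x m) = skiAbs x (toSKI m)
#+end_src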

Turing completeness

<<section-turing>>

Three different notions of computability were proposed in the 1930s,

  • the general recursive functions were defined by Herbrand and Gödel; they form a class of functions over the natural numbers closed under composition, recursion and unbounded search;
  • the λ-definable functions were proposed by Church; they are functions on the natural numbers that can be represented by λ-terms;
  • the Turing computable functions, proposed by Alan Turing as the functions that can be computed using Turing machines, a theoretical model of a computing machine.

In cite:church36 and cite:turing37, Church and Turing proved the equivalence of the three definitions. This led to the metatheoretical /Church-Turing thesis/, which postulated the equivalence between these models of computation and the intuitive notion of effective calculability mathematicians were using. In practice, this means that the λ-calculus, as a programming language, is as expressive as Turing machines; it can define every computable function. It is Turing-complete.

A complete implementation of untyped λ-calculus is discussed in Chapter Implementation of λ-expressions; and a detailed description of how to use the untyped λ-calculus as a programming language is given in Chapter Programming in untyped λ-calculus. General recursive functions, for example, can be encoded using these techniques, thus proving that it is in fact Turing complete (see sec-fixed-points). Note that the lambda calculus even has a cost model allowing us to develop complexity theory within it (see cite:dallago08); this is, however, beyond the scope of this text.
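For instance, the unbounded search of the minimization operator can be expressed because the untyped calculus admits fixed-point combinators, such as Curry's \[ Y = λ f.\ (λ x.f(x x))(λ x.f(x x)), \] which satisfies $Y\ f =_β f\ (Y\ f)$ for every term $f$, and hence makes arbitrary recursive definitions available.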

Simply typed λ-calculus

<<sec-simplytypedlambda>>

Motivation

/Types/ were introduced in mathematics as a response to Russell’s paradox, found in the first naive axiomatizations of set theory (see Corollary cor-russellparadox). An attempt by Church to use untyped λ-calculus as a foundational logical system suffered from a variant known as the Kleene-Rosser paradox, and types were a method to avoid it, as detailed in cite:kleene35 and cite:curry46. Once types are added to the calculus, a deep connection between λ-calculus and logic arises. This connection will be discussed in Section sec-curryhoward.

In programming languages, types indicate how the programmer intends to use the data, prevent errors and enforce certain invariants and levels of abstraction in programs. The role of types in λ-calculus when interpreted as a programming language closely matches the usual notion, and typed λ-calculus has been the basis of many modern type systems for programming languages.

Simply typed λ-calculus is a refinement of the untyped λ-calculus. In it, each term has a type that limits how it can be combined with other terms. Only a set of basic types and function types between any two types are considered in this system. Whereas functions in untyped λ-calculus can be applied over any term, once we introduce types, a function of type $A → B$ can only be applied over a term of type $A$ to produce a new term of type $B$. Note that $A$ and $B$ can be, themselves, function types.

We now present an account of simply typed λ-calculus based on cite:Hindley08. Our description will rely only on the arrow type constructor $→$. While other presentations of simply typed λ-calculus extend this definition with type constructors providing pairs or union types, as is done in cite:selinger13, it is clearer to present first a minimal version of the λ-calculus. Such extensions will be explained later, and their exposition will profit from the logical interpretation that we develop in Section Propositions as types.

Simple types

<<sec-simpletypes>> We start by assuming a set of basic types. These basic types would correspond, in a programming language interpretation, to the fundamental types of the language, such as strings or integers. Minimal presentations of λ-calculus tend to use only one basic type.

Typing rules for simply typed λ-calculus

Terms on simply typed lambda calculus

We define the terms of simply typed λ-calculus using the same constructors we used on the untyped version. The set of typed lambda terms is given by the following Backus-Naur form. \[\mathsf{Term} ::= x \mid \mathsf{Term}\ \mathsf{Term} \mid λ x^{\mathsf{Type}}. \mathsf{Term}. \] The main difference with Definition def-lambdaterms is that every bound variable has a type, and therefore every λ-abstraction of the form $(λ x^A. m)$ can be applied only over terms of type $A$; if $m$ is of type $B$, this term will be of type $A → B$.

However, the set of raw typed λ-terms contains some meaningless terms under this type interpretation, such as $(λ x^A. m)(λ x^A. m)$. In particular, we cannot apply a function of type $A → B$ to a term of type $A → B$, as it can only be applied to a term of type $A$. Typing rules will exclude these meaningless terms: only a subset of the raw lambda terms can be derived using typing rules, and we will choose to work only with this subset. When a particular term $m$ has type $A$, we write this relation as $m : A$. The $:$ symbol should be read as "is of type".

Typing rules

Every typing rule assumes a typing context, usually denoted by $Γ$. Concatenation of two typing contexts $Γ$ and $Δ$ is written with a comma, as in $Γ,Δ$; and the fact that $ψ$ follows from $Γ$ is written as $Γ \vdash ψ$. Typing rules are written as rules of inference; the premises are listed above and the conclusion is written below the line.

  1. The $(var)$ rule simply makes explicit the type of a variable from the context. That is, a context that assumes that $x : A$ can be written as $Γ,x:A$; and we can trivially deduce from it that $x:A$. \begin{prooftree} \RightLabel{($var$)} \AXC{} \UIC{$Γ, x:A \vdash x:A$} \end{prooftree}
  2. The $(abs)$ rule declares that the type of a λ-abstraction is the type of functions from the variable type to the result type. If a term $m:B$ can be built from the assumption that $x:A$, then $λ x^A. m : A → B$. It acts as an introduction of function terms. \begin{prooftree} \RightLabel{$(abs)$} \AXC{$Γ, x:A \vdash m : B$} \UIC{$Γ \vdash λ x.m : A → B$} \end{prooftree}
  3. The $(app)$ rule declares the type of a well-typed application. A term $f : A → B$ applied to a term $a : A$ is a term $f\ a : B$. It acts as an elimination of function terms. \begin{prooftree} \RightLabel{$(app)$} \AXC{$Γ \vdash f : A → B$} \AXC{$Γ \vdash a : A$} \BIC{$Γ \vdash f\ a : B$} \end{prooftree}

A term $m$ is typeable in a given context $Γ$ if a typing judgment of the form $Γ \vdash m : T$ can be derived using only the previous typing rules. From now on, we only consider typeable terms as the terms of simply typed λ-calculus: the set of λ-terms of simply typed λ-calculus is only a subset of the terms of untyped λ-calculus.
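Read algorithmically, the three rules form a type checker for explicitly annotated terms. A minimal Haskell sketch with illustrative names follows; type inference for terms without annotations is the subject of Section Unification and type inference.

#+begin_src haskell
-- Simple types and explicitly annotated (Church-style) terms.
data Type  = Basic String | Arrow Type Type  deriving (Show, Eq)
data TTerm = TVar String | TApp TTerm TTerm | TLam String Type TTerm

type Context = [(String, Type)]

-- typeOf implements the (var), (abs) and (app) rules directly.
typeOf :: Context -> TTerm -> Maybe Type
typeOf ctx (TVar x)     = lookup x ctx                         -- (var)
typeOf ctx (TLam x a m) = Arrow a <$> typeOf ((x, a) : ctx) m  -- (abs)
typeOf ctx (TApp f n)   =                                      -- (app)
  case (typeOf ctx f, typeOf ctx n) of
    (Just (Arrow a b), Just a') | a == a' -> Just b
    _                                     -> Nothing
#+end_src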

Examples of typeable and non-typeable terms

It can be seen that the typing derivation of a term somehow encodes the complete λ-term. If we were to derive the term bottom-up, there would be only one possible choice at each step on which rule to use. In Section Unification and type inference we will discuss a type inference algorithm that determines whether a term is typeable and what its type should be, and we will make this intuition precise.

Curry-style types

<<sec-currystyle>>

Church-style and Curry-style

Two different approaches to typing in λ-calculus are commonly used.

  • Church-style typing, also known as explicit typing, originated from the work of Alonzo Church in cite:church40, where he described a simply typed lambda calculus with two basic types. The term’s type is defined as an intrinsic property of the term, and the same term always has to be interpreted with the same type.
  • Curry-style typing, also known as implicit typing, creates a formalism where every single term can be given an infinite number of possible types. This technique is called polymorphism when it is a formal part of the language; but here, it is only used to allow us to build intermediate terms without having to directly specify their type.

As an example, we can consider the identity term $I = λ x.x$. Under Church-style typing, it would have to be defined separately for each possible type. That is, we should consider a family of different identity terms $I_A = λ x.x : A → A$ for each type $A$. Curry-style typing allows us to consider type templates with type variables, and to type the identity as $I = λ x.x : σ → σ$ where $σ$ is a free type variable. The difference between the two typing styles is then not a mere notational convention, but a difference in the expressive power that we assign to each term.
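Haskell itself illustrates Curry-style typing in practice: the compiler infers the most general type of a definition, and its free type variables are instantiated independently at each use site.

#+begin_src haskell
-- The inferred most general type of the identity carries a free
-- type variable a, standing for any type.
identity :: a -> a
identity x = x

-- One definition, instantiated at two different types.
uses :: (Int, Bool)
uses = (identity 3, identity True)
#+end_src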


Type templates

Assuming a countably infinite set of type variables, we define type templates as inductively generated by $\TypeTemp ::= ι \mid \mathsf{TVar} \mid \TypeTemp → \TypeTemp$, where $ι$ is a basic type and $\mathsf{TVar}$ is a type variable. That is, all basic types and type variables are atomic type templates; and we also consider the arrow type between two type templates. The interesting property of type variables is that they can act as placeholders and be substituted by other type templates.

Type substitution

We consider a type to be more general than another if the latter can be obtained by applying a substitution to the former. In this case, the latter is called an instance of the former. For example, $A → B$ is more general than its instance $(C → D) → B$, where $A$ has been substituted by $C → D$. A crucial property of simply typed λ-calculus is that every typeable term has a most general type, called its principal type; this is proved in Theorem thm-typeinfer.

Principal type

Unification and type inference

Unification

The unification of two type templates is the construction of a substitution making them equal as type templates; that is, the construction of a type that is a particular instance of both at the same time. We will aim not merely for a unifier, but for the most general one.

A substitution $ψ$ is called a unifier of two sequences of type templates $A_1,…,A_n$ and $B_1,…,B_n$ if $\overline{ψ} A_i = \overline{ψ} B_i$ for all $i = 1,…,n$. We say that it is the most general unifier if, given any other unifier $φ$, there exists a substitution $\varphi$ such that $φ = \overline{\varphi} ∘ ψ$.

Type Inference

Using unification, we can write an algorithm inferring types.
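A minimal sketch of such a unification algorithm in Haskell, representing substitutions as association lists from type variables to templates (all names are illustrative):

#+begin_src haskell
-- Type templates: basic types, type variables and arrows.
data TyTemp = TBasic String | TyVar String | TArr TyTemp TyTemp
  deriving (Show, Eq)

type Subst = [(String, TyTemp)]

apply :: Subst -> TyTemp -> TyTemp
apply s (TyVar v)  = maybe (TyVar v) id (lookup v s)
apply s (TArr a b) = TArr (apply s a) (apply s b)
apply _ t          = t

-- Composition: apply (compose s2 s1) t == apply s2 (apply s1 t).
compose :: Subst -> Subst -> Subst
compose s2 s1 = [(v, apply s2 t) | (v, t) <- s1] ++ s2

-- unify returns a most general unifier when one exists.
unify :: TyTemp -> TyTemp -> Maybe Subst
unify (TBasic a) (TBasic b) | a == b = Just []
unify (TyVar v) t = bindVar v t
unify t (TyVar v) = bindVar v t
unify (TArr a b) (TArr c d) = do
  s1 <- unify a c
  s2 <- unify (apply s1 b) (apply s1 d)
  Just (compose s2 s1)
unify _ _ = Nothing

-- The occurs check: a variable never unifies with a template that
-- properly contains it.
bindVar :: String -> TyTemp -> Maybe Subst
bindVar v t
  | t == TyVar v = Just []
  | occurs t     = Nothing
  | otherwise    = Just [(v, t)]
  where occurs (TyVar w)  = v == w
        occurs (TArr a b) = occurs a || occurs b
        occurs _          = False
#+end_src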

Subject reduction and normalization

Subject reduction

A crucial property is that type inference and β-reductions do not interfere with each other. A term can be β-reduced without changing its type.

Reducibility

We have seen previously that the term $Ω = (λ x.xx)(λ x.xx)$ is not weakly normalizing; but it is also non-typeable. In this section we will prove that, in fact, every typeable term is strongly normalizing. We start by proving some lemmas about the notion of reducibility, which will lead us to the Strong Normalization Theorem. This proof will follow cite:girard89.

The notion of /reducibility/ is an abstract concept originally defined by Tait in cite:tait67 which we will use to ease this proof. It should not be confused with the notion of β-reduction. We inductively define the set $\redu_T$ of reducible terms of type $T$ for basic and arrow types.

  • If $t : T$ where $T$ is a basic type, $t ∈ \redu_T$ if $t$ is strongly normalizable.
  • If $t : U → V$, an arrow type, $t ∈ \redu_{U → V}$ if $t\ u ∈ \redu_V$ for all $u ∈ \redu_U$.

Properties of reducibility

We prove three properties of reducibility at the same time in order to use mutual induction.

Abstraction lemma

Strong normalization lemma

A final lemma is needed before the proof of the Strong Normalization Theorem. It is a generalization of the main theorem, useful because of the stronger induction hypothesis it provides.

Turing completeness of STLC

Every term of the simply typed λ-calculus normalizes and every computation ends; therefore, simply typed λ-calculus cannot be Turing complete.

The Curry-Howard correspondence

<<sec-curryhoward>>

Extending the simply typed λ-calculus

<<sec-extendingstlc>> We will now add special syntax for some terms and types, such as pairs, unions and unit types. This new syntax will make our λ-calculus more expressive, but the unification and type inference algorithms will continue to work in a similar way. The previous proofs and algorithms can be extended to cover all the new cases.


Simple types II

The new set of simple types is given by the following Backus-Naur form, \[\mathsf{Type} ::= ι \mid \mathsf{Type} → \mathsf{Type} \mid \mathsf{Type} × \mathsf{Type} \mid \mathsf{Type} + \mathsf{Type} \mid 1 \mid 0,\] where $ι$ is any basic type. That is to say that, for any given types $A,B$, there exists a product type $A × B$, consisting of the pairs of elements where the first one is of type $A$ and the second one of type $B$; there exists the union type $A + B$, consisting of a disjoint union of tagged terms from $A$ or $B$; a unit type $1$ with only one element; and an empty or void type $0$ without inhabitants.
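Read as a datatype, this grammar becomes (a sketch, with illustrative infix constructors):

#+begin_src haskell
-- The extended simple types.
data SType
  = SBasic String      -- basic types ι
  | SType :-> SType    -- function type A → B
  | SType :*: SType    -- product type A × B
  | SType :+: SType    -- union type A + B
  | One                -- unit type 1
  | Zero               -- void type 0
  deriving (Show, Eq)
#+end_src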


Raw typed lambda terms II

The new set of typed lambda terms is given by the following Backus-Naur form. \[\begin{aligned} \mathsf{Term} ::=\ & x \mid \mathsf{Term}\ \mathsf{Term} \mid λ x. \mathsf{Term} \mid \\ & \left\langle \mathsf{Term},\mathsf{Term} \right\rangle \mid π_1 \mathsf{Term} \mid π_2 \mathsf{Term} \mid \\ & \textrm{inl}\ \mathsf{Term} \mid \textrm{inr}\ \mathsf{Term} \mid \textrm{case}\ \mathsf{Term}\ \textrm{of}\ \mathsf{Term}; \mathsf{Term} \mid \\ & \textrm{abort}\ \mathsf{Term} \mid ∗. \end{aligned}\]

And the use of these new terms is formalized by the following extended set of typing rules.

  1. The $(var)$ rule simply makes explicit the type of a variable from the context. \begin{prooftree} \LeftLabel{($var$)} \AXC{} \UIC{$Γ, \red{x}:\blue{A} \vdash \red{x}:\blue{A}$} \end{prooftree}
  2. The $(abs)$ and $(app)$ rules construct and apply function terms. \begin{prooftree} \LeftLabel{$(abs)$} \AXC{$Γ, \red{x}:\blue{A} \vdash \red{m} : \blue{B}$} \UIC{$Γ \vdash \red{λ x.m} : \blue{A → B}$} \LeftLabel{$(app)$} \AXC{$Γ \vdash \red{f} : \blue{A → B}$} \AXC{$Γ \vdash \red{a} : \blue{A}$} \BIC{$Γ \vdash \red{f\ a} : \blue{B}$} \noLine \BIC{} \end{prooftree}
  3. The $(pair)$ rule constructs pairs of elements. The $(π_1)$ and $(π_2)$ rules destruct a pair into its projections. \begin{prooftree} \LeftLabel{$(pair)$} \AXC{$Γ \vdash \red{a} : \blue{A}$} \AXC{$Γ \vdash \red{b} : \blue{B}$} \BIC{$Γ \vdash \red{\pair{a,b}} : \blue{A × B}$} \LeftLabel{$(π_1)$} \AXC{$Γ \vdash \red{m} : \blue{A × B}$} \UIC{$Γ \vdash \red{π_1\ m} : \blue{A}$} \LeftLabel{$(π_2)$} \AXC{$Γ \vdash \red{m} : \blue{A × B}$} \UIC{$Γ \vdash \red{π_2\ m} : \blue{B}$} \noLine \TIC{} \end{prooftree}
  4. The $(inl)$ and $(inr)$ rules provide the two ways of creating a tagged union type, while the $(case)$ rule extracts a term from a union type applying case analysis. Note that we write $[a].n$ and $[b].p$ to explicitly indicate that $n$ and $p$ can depend on $a$ and $b$, respectively. \begin{prooftree} \LeftLabel{$(inl)$} \AXC{$Γ \vdash \red{a} : \blue{A}$} \UIC{$Γ \vdash \red{\mathrm{inl}\ a} : \blue{A + B}$} \LeftLabel{$(inr)$} \AXC{$Γ \vdash \red{b} : \blue{B}$} \UIC{$Γ \vdash \red{\mathrm{inr}\ b} : \blue{A + B}$} \noLine \BIC{} \end{prooftree} \begin{prooftree} \LeftLabel{$(case)$} \AXC{$Γ \vdash \red{m} : \blue{A + B}$} \AXC{$Γ, \red{a}:\blue{A} \vdash \red{n} : \blue{C}$} \AXC{$Γ, \red{b}:\blue{B} \vdash \red{p} : \blue{C}$} \TIC{$Γ \vdash (\red{\mathrm{case}\ m\ \mathrm{of}\ [a].n;\ [b].p}) : \blue{C}$} \end{prooftree}
  5. The $(∗)$ rule simply creates the only element of type $1$. \begin{prooftree} \LeftLabel{$(∗)$} \AXC{$$} \UIC{$Γ \vdash \red{∗} : \blue{1}$} \end{prooftree}
  6. The $(abort)$ rule extracts a term of any type from the void type. If we reach a void type, we have reached an error, and thus we can throw any typed exception. \begin{prooftree} \LeftLabel{$(abort)$} \AXC{$Γ \vdash \red{m} : \blue{0}$} \UIC{$Γ \vdash \red{\mathrm{abort}_A\ m} : \blue{A}$} \end{prooftree}
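As an example, the following derivation, built only from the rules above, types the term that swaps the two components of a pair. \begin{prooftree} \LeftLabel{($var$)} \AXC{} \UIC{$\red{p}:\blue{A × B} \vdash \red{p} : \blue{A × B}$} \LeftLabel{($π_2$)} \UIC{$\red{p}:\blue{A × B} \vdash \red{π_2\ p} : \blue{B}$} \LeftLabel{($var$)} \AXC{} \UIC{$\red{p}:\blue{A × B} \vdash \red{p} : \blue{A × B}$} \LeftLabel{($π_1$)} \UIC{$\red{p}:\blue{A × B} \vdash \red{π_1\ p} : \blue{A}$} \LeftLabel{($pair$)} \BIC{$\red{p}:\blue{A × B} \vdash \red{\pair{π_2\ p, π_1\ p}} : \blue{B × A}$} \LeftLabel{($abs$)} \UIC{$\vdash \red{λ p.\pair{π_2\ p, π_1\ p}} : \blue{A × B → B × A}$} \end{prooftree}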

Beta-eta reductions in extended typed lambda calculus

The β-reduction of terms is defined in the same way as for the untyped λ-calculus, except for the inclusion of new β-rules governing the new terms, one for each new destructor rule.

  1. Function application, $(λ x.m)\ n →_β m[n/x]$.
  2. First projection, $π_1 \left\langle m,n \right\rangle →_β m$.
  3. Second projection, $π_2 \left\langle m,n \right\rangle →_β n$.
  4. Case rule, $(\mathrm{case}\ m\ \mathrm{of}\ [x].n;\ [y].p) →_β n[a/x]$ if $m$ is of the form $m = \mathrm{inl}\ a$; and $(\mathrm{case}\ m\ \mathrm{of}\ [x].n;\ [y].p) →_β p[b/y]$ if $m$ is of the form $m = \mathrm{inr}\ b$.

On the other hand, new η-rules are defined, one for each new constructor rule.

  1. Function extensionality: $λ x.f\ x →_η f$ for any $f : A → B$.
  2. Definition of product: $\langle π_1\ m, π_2\ m \rangle →_η m$ for any $m : A × B$.
  3. Uniqueness of the unit: $t →_η ∗$ for any $t : 1$.
  4. Case rule: $(\mathrm{case}\ m\ \mathrm{of}\ [a].p[ \mathrm{inl}\ a/c ];\ [b].p[ \mathrm{inr}\ b/c ]) →_η p[m/c]$ for any $m : A + B$.
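As a worked example of the β-rules, the term $π_1 \left\langle ∗, λ x.x \right\rangle$ reduces to $∗$ by the first projection rule, and $\mathrm{case}\ (\mathrm{inl}\ ∗)\ \mathrm{of}\ [x].\mathrm{inr}\ x;\ [y].\mathrm{inl}\ y$ reduces to $\mathrm{inr}\ ∗$ by the case rule.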

Natural deduction

<<sec-naturaldeduction>> Natural deduction is a logical system due to Gentzen. We introduce it here following cite:selinger13 and cite:wadler15. Its relationship with the simply-typed lambda calculus will be made explicit in Section Propositions as types.


Rules of natural deduction

We use the logical binary connectives $\blue{→},\blue{∧},\blue{∨}$, and two nullary connectives, $\blue{\top}$ and $\blue{\bot}$, representing the trivially true and the false proposition, respectively. The rules defining natural deduction come in pairs: there are introducers and eliminators for every connective. Every introducer uses a set of assumptions to generate a formula, and every eliminator gives a way to extract precisely that set of assumptions.

  1. Every axiom on the context can be used. \begin{prooftree} \RightLabel{(Ax)} \AXC{} \UIC{$Γ,\blue{A} \vdash \blue{A}$} \end{prooftree}
  2. Introduction and elimination of the $→$ connective. Note that the elimination rule corresponds to modus ponens and the introduction rule corresponds to the deduction theorem. \begin{prooftree} \RightLabel{($I$)} \AXC{$Γ, \blue{A} \vdash \blue{B}$} \UIC{$Γ \vdash \blue{A → B}$} \RightLabel{($E$)} \AXC{$Γ \vdash \blue{A → B}$} \AXC{$Γ \vdash \blue{A}$} \BIC{$Γ \vdash \blue{B}$} \noLine \BIC{} \end{prooftree}
  3. Introduction and elimination of the $∧$ connective. Note that the introduction in this case takes two assumptions, and there are two different elimination rules. \begin{prooftree} \RightLabel{($I$)} \AXC{$Γ \vdash \blue{A}$} \AXC{$Γ \vdash \blue{B}$} \BIC{$Γ \vdash \blue{A ∧ B}$} \RightLabel{($E^1$)} \AXC{$Γ \vdash \blue{A ∧ B}$} \UIC{$Γ \vdash \blue{A}$} \RightLabel{($E^2$)} \AXC{$Γ \vdash \blue{A ∧ B}$} \UIC{$Γ \vdash \blue{B}$} \noLine \TIC{} \end{prooftree}
  4. Introduction and elimination of the $∨$ connective. Here, we need two introduction rules to match the two assumptions we use on the eliminator. \begin{prooftree} \RightLabel{($I^1$)} \AXC{$Γ \vdash \blue{A}$} \UIC{$Γ \vdash \blue{A ∨ B}$} \RightLabel{($I^2$)} \AXC{$Γ \vdash \blue{B}$} \UIC{$Γ \vdash \blue{A ∨ B}$} \RightLabel{($E$)} \AXC{$Γ \vdash \blue{A ∨ B}$} \AXC{$Γ,\blue{A} \vdash \blue{C}$} \AXC{$Γ,\blue{B} \vdash \blue{C}$} \TIC{$Γ \vdash \blue{C}$} \noLine \TIC{} \end{prooftree}
  5. Introduction for $\top$. It needs no assumptions and, consequently, there is no elimination rule for it. \begin{prooftree} \RightLabel{($I\top$)} \AXC{} \UIC{$Γ \vdash \blue{\top}$} \end{prooftree}
  6. Elimination for $\bot$. It can be eliminated in all generality, and, consequently, there are no introduction rules for it. This elimination rule represents the “ex falsum quodlibet” principle that says that falsity implies anything. \begin{prooftree} \RightLabel{($E\bot$)} \AXC{$Γ \vdash \blue{\bot}$} \UIC{$Γ \vdash \blue{C}$} \end{prooftree}
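As an example, these rules suffice to derive the commutativity of conjunction; note how this tree mirrors, rule by rule, the typing derivation of the pair-swapping term given in Section Extending the simply typed λ-calculus. \begin{prooftree} \RightLabel{(Ax)} \AXC{} \UIC{$\blue{A ∧ B} \vdash \blue{A ∧ B}$} \RightLabel{($E^2$)} \UIC{$\blue{A ∧ B} \vdash \blue{B}$} \RightLabel{(Ax)} \AXC{} \UIC{$\blue{A ∧ B} \vdash \blue{A ∧ B}$} \RightLabel{($E^1$)} \UIC{$\blue{A ∧ B} \vdash \blue{A}$} \RightLabel{($I$)} \BIC{$\blue{A ∧ B} \vdash \blue{B ∧ A}$} \RightLabel{($I$)} \UIC{$\vdash \blue{A ∧ B → B ∧ A}$} \end{prooftree}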

Deduction trees

Proofs in natural deduction are written as deduction trees, and they can be simplified according to some simplification rules, which can be applied anywhere in the deduction tree. In these rules, a chain of dots represents any given part of the deduction tree.

  1. An implication and its antecedent can be simplified using the antecedent directly on the implication.

    \begin{prooftree} \AXC{$[\blue{A}]$}\noLine \UIC{$\vdots^1$}\noLine \UIC{$\blue{B}$} \UIC{$\blue{A} → \blue{B}$} \AXC{$\vdots^2$}\noLine \UIC{$\blue{A}$} \BIC{$\blue{B}$}

    \UIC{$\vdots$}\noLine \AXC{$\Longrightarrow$} \UIC{}\noLine\UIC{}\noLine\UIC{}\noLine

    \AXC{$\vdots^2$}\noLine \UIC{$\blue{A}$}\noLine \UIC{$\vdots^1$}\noLine \UIC{$\blue{B}$} \UIC{$\vdots$}\noLine \noLine \TIC{} \end{prooftree}

  2. The introduction of an unused conjunction can be simplified as \begin{prooftree} \AXC{$\vdots^1$}\noLine \UIC{$\blue{A}$} \AXC{$\vdots^2$}\noLine \UIC{$\blue{B}$} \BIC{$\blue{A} ∧ \blue{B}$} \UIC{$\blue{A}$} \UIC{$\vdots$}\noLine \AXC{$\Longrightarrow$} \UIC{}\noLine\UIC{}\noLine\UIC{}\noLine \AXC{$\vdots^1$}\noLine \UIC{$\blue{A}$} \UIC{$\vdots$}\noLine \noLine \TIC{} \end{prooftree}

    and, similarly, on the other side as follows. \begin{prooftree} \AXC{$\vdots^1$}\noLine \UIC{$\blue{A}$} \AXC{$\vdots^2$}\noLine \UIC{$\blue{B}$} \BIC{$\blue{A} ∧ \blue{B}$} \UIC{$\blue{B}$} \UIC{$\vdots$}\noLine \AXC{$\Longrightarrow$} \UIC{}\noLine\UIC{}\noLine\UIC{}\noLine \AXC{$\vdots^2$}\noLine \UIC{$\blue{B}$} \UIC{$\vdots$}\noLine \noLine \TIC{} \end{prooftree}

  3. The introduction of a disjunction followed by its elimination can be also simplified

    \begin{prooftree} \AXC{$\vdots^1$}\noLine \UIC{$\blue{A}$} \UIC{$\blue{A} ∨ \blue{B}$} \AXC{$[\blue{A}]$}\noLine \UIC{$\vdots^2$}\noLine \UIC{$\blue{C}$} \AXC{$[\blue{B}]$}\noLine \UIC{$\vdots^3$}\noLine \UIC{$\blue{C}$} \TIC{$\blue{C}$}

    \UIC{$\vdots$}\noLine \AXC{$\Longrightarrow$} \UIC{}\noLine\UIC{}\noLine\UIC{}\noLine

    \AXC{$\vdots^1$}\noLine \UIC{$\blue{A}$}\noLine \UIC{$\vdots^2$}\noLine \UIC{$\blue{C}$} \UIC{$\vdots$}\noLine \noLine \TIC{} \end{prooftree}

    and a similar pattern is used on the other side of the disjunction.

    \begin{prooftree} \AXC{$\vdots^1$}\noLine \UIC{$\blue{B}$} \UIC{$\blue{A} ∨ \blue{B}$} \AXC{$[\blue{A}]$}\noLine \UIC{$\vdots^2$}\noLine \UIC{$\blue{C}$} \AXC{$[\blue{B}]$}\noLine \UIC{$\vdots^3$}\noLine \UIC{$\blue{C}$} \TIC{$\blue{C}$}

    \UIC{$\vdots$}\noLine \AXC{$\Longrightarrow$} \UIC{}\noLine\UIC{}\noLine\UIC{}\noLine

    \AXC{$\vdots^1$}\noLine \UIC{$\blue{B}$}\noLine \UIC{$\vdots^3$}\noLine \UIC{$\blue{C}$} \UIC{$\vdots$}\noLine \noLine \TIC{} \end{prooftree}

Propositions as types

<<sec-propositionstypes>> In 1934, Curry observed in cite:curry34 that the type of a function $(\blue{A → B})$ could be read as an implication and that the existence of a function of that type was equivalent to the provability of the proposition. Previously, the Brouwer-Heyting-Kolmogorov interpretation of intuitionistic logic had given a definition of what it means to be a proof of an intuitionistic formula, where a proof of the implication $(\blue{A → B})$ is a function converting a proof of $\blue{A}$ into a proof of $\blue{B}$. It was not until 1969 that Howard pointed out a deep correspondence between the simply-typed λ-calculus and natural deduction at three levels

  1. propositions are types;
  2. proofs are programs; and
  3. simplification of proofs is evaluation of programs.

In the case of simply typed λ-calculus and natural deduction, the correspondence starts when we describe the following one-to-one relation between types and propositions.

\begin{center}\begin{tabular}{c|c} Types & Propositions \\ \hline Unit type ($\blue{1}$) & Truth ($\blue{\top}$) \\ Product type ($\blue{×}$) & Conjunction ($\blue{∧}$) \\ Union type ($\blue{+}$) & Disjunction ($\blue{∨}$) \\ Function type ($\blue{→}$) & Implication ($\blue{→}$) \\ Empty type ($\blue{0}$) & False ($\blue{\bot}$) \end{tabular}\end{center}

In particular, the negation of a proposition $\blue{¬ A}$ is interpreted as that proposition implying falsehood, $\blue{A → \bot}$; its corresponding type is a function from the type $\blue{A}$ to the empty type, $\blue{A → 0}$.

Now it is easy to notice that every deduction rule of Section Natural deduction corresponds to a typing rule of Section Extending the simply typed λ-calculus. The only distinction between them is the appearance of λ-terms in the first set of rules. As every typing rule results in the construction of a particular kind of λ-term, these terms can be interpreted as encodings of proofs in the form of derivation trees. That is, terms are proofs of the propositions represented by their types.

Furthermore, under this interpretation, /simplification rules are precisely β-reduction rules/. This makes execution of λ-calculus programs correspond to proof simplification on natural deduction. The Curry-Howard correspondence is then not only a simple bijection between types and propositions, but a deeper isomorphism regarding the way they are constructed, used in derivations, and simplified.

Other type systems

λ-cube

The λ-cube is a taxonomy for Church-style type systems given by Barendregt in cite:barendregt92. It describes eight type systems based on the λ-calculus along three axes, representing three properties of the systems.

  1. Parametric polymorphism: terms that depend on types. This is achieved via universal quantification over types. It allows type variables and binders for them. An example is the following parametric identity function where $Λ$ acts as a $λ$ for types, and $τ$ is a type variable. \[ \mathrm{id} ≡ Λ τ . λ x . x : ∀ τ . τ → τ. \] It can be applied to any particular type $A$ to obtain the specific identity function for that type as \[ \mathrm{id}_A ≡ λ x.x : A → A. \]
  2. Type operators: types that depend on types. An example of type operator is $[-]$, which sends each type $A$ to the type $[A]$ of lists of elements of $A$. If we also assume polymorphism, a higher-order function mapping a function argument over a list would have the following type. \[ \mathrm{map} : ∀ τ. ∀ σ. (τ → σ) → [τ] → [σ] \]
  3. Dependent types: types that depend on terms. An example is the type $\mathrm{Vect}(n)$ of vectors of a fixed length $n$, where $n$ is, itself, an element of a natural numbers type $n : \mathbb{N}$. The type of vectors of any length, $\mathrm{Vect}(0) + \mathrm{Vect}(1) + \mathrm{Vect}(2) + …$, is written as \[ \sum_{n : \mathbb{N}} \mathrm{Vect}(n). \] Chapters Locally cartesian closed categories and dependent types and Martin-Löf type theory are devoted to the study of dependent types.

The λ-cube is shown in the following figure. \[\begin{tikzcd}[column sep=small]
&&& |[label={above:\lcubett{System F$ω$}}]| \systemfo \ar{rr} && |[label=above:\lcubett{CoC},label=above:\phantom{System Fo}]| \systemcoc & \\
\phantom{.} && |[label={left:\lcubett{System F}}]| \systemf \ar{ur}\ar{rr} && \systemfp \ar{ur} && \\
&&& \systemo \ar{rr}\ar{uu} && |[label=right:\lcubett{wCoC}]| \systemlpo \ar{uu} && \phantom{λ}\phantom{PQW} \\
\ar[\lcred]{uu}[\lcred]{\text{\parbox{2cm}{\centering terms depend on types}}} && |[label=below:\lcubett{STLC}]| \stlc \ar{uu}\ar{rr}\ar{ur} && |[label=below:\lcubett{DTLC}]| \systemlp \ar{uu}\ar{ur} && \phantom{.} \\
&& \ar[\lcred]{rr}[swap,\lcred]{\text{\parbox{2cm}{\centering types depend on terms}}} && \phantom{.} & \ar[\lcred]{ur}[swap,\lcred]{\text{\parbox{2cm}{\centering types depend on types}}} &
\end{tikzcd}\]

It presents the following type systems, some of which are not commonly used, but all of which are strongly normalizing.

  • Simply typed λ-calculus ($\stlc$); as described in Section Typing rules for simply typed λ-calculus.
  • Simply typed λ-calculus with operators ($\systemo$).
  • System F ($\systemf$) adds polymorphism to the simply typed λ-calculus, and System F-omega ($\systemfo$) further adds type operators. The Haskell programming language is based on System F-omega with some restrictions.
  • Dependently typed λ-calculus ($\systemlp$); used in the Edinburgh Logical Framework for logic programming (see cite:harper93).
  • Calculus of constructions ($\systemcoc$); where full mathematical theories can be developed (see cite:coquand88). It is used in the Coq Proof Assistant.

The λ-cube is generalized by the theory of pure type systems, described in cite:barendregt92 and cite:geuvers93.

Mikrokosmos

Implementation of λ-expressions

The Haskell programming language

<<sec-haskelllang>>

Haskell as a programming choice

Haskell is the purely functional programming language of our choice to implement our λ-calculus interpreter. Its design is heavily influenced by the λ-calculus, and it is a general-purpose language with a rich ecosystem and plenty of consolidated libraries[fn:hackagelibs] in areas such as parsing, testing or system interaction, matching the requirements of our project. In the following sections, we describe this ecosystem in more detail and justify our choice.

[fn:hackagelibs]: A categorized list of libraries can be found in the central package archive of the Haskell community: https://hackage.haskell.org/packages/

History of Haskell

In the 1980s, many lazy programming languages, such as Miranda, Lazy ML, Orwell, Clean or Daisy, were being written independently by different researchers. All of them were similar in expressive power, but their differences were holding back the efforts to communicate ideas on functional programming, so the Haskell Report was written as a first standardized reference for a common lazy functional language; a revised version, the Haskell 98 Report, can be read in cite:haskell98. We will use its most standard implementation: the Glasgow Haskell Compiler (GHC), an open source compiler written in Haskell and C. The complete history of Haskell and its design decisions is detailed in cite:hudak07_haskell, but we are interested in the following properties:

Haskell’s properties

Haskell is

  1. strongly and statically typed, meaning that it only compiles well-typed programs and it does not allow implicit type casting; type declarations are then useful in our interpreter to keep track of what kind of data we are dealing with at each specific function;
  2. lazy, with non-strict semantics, meaning that it will not evaluate a term or the argument of a function until it is needed; this can help solve the traditional efficiency problems of functional programming (see cite:hughes89 and the sketch after this list);
  3. purely functional; as the evaluation order is demand-driven and not explicitly known, it is not possible to perform ordered input/output actions or any other side-effects that rely on the evaluation order; this helps modularity of our code, testing, and verification;
  4. referentially transparent; as a consequence of its purity, every term on the code can be replaced by its definition without changing the global meaning of the program; this allows equational reasoning with rules that are directly derived from λ-calculus and makes it easier to reason about our functions;
  5. based on System Fω with some restrictions; crucially, it implements System F adding quantification over type operators, even though it does not allow abstraction over type operators; the GHC Haskell compiler, however, allows the user to activate extensions that implement dependent types.
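As a small illustration of laziness and referential transparency, the following definitions work with an infinite list: only the finite prefix that is actually demanded gets evaluated, and naturals can be replaced by its definition anywhere without changing the result.

-- An infinite list of natural numbers.
naturals :: [Integer]
naturals = [0..]

-- Taking the first five squares forces only five elements of the list.
firstSquares :: [Integer]
firstSquares = take 5 (map (^ 2) naturals)   -- [0,1,4,9,16]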

Haskell’s syntax

Where most imperative languages use semicolons to separate sequential commands, Haskell has no notion of sequencing, and programs are written in a purely declarative way. A Haskell program essentially consists of a series of definitions (of both types and terms) and type declarations. The following example shows the definition of a binary tree and its preorder traversal.

-- A tree is either empty or a node with two subtrees.
data Tree a = Empty | Node a (Tree a) (Tree a)
-- The preorder function takes a tree and returns a list
preorder :: Tree a -> [a]
preorder Empty            = []
preorder (Node x lft rgt) = preorder lft ++ [x] ++ preorder rgt

We can see in the previous example that function definitions allow pattern matching; that is, data constructors can be used in definitions to decompose values of the type. This increases readability when working with algebraic data types or implementing inductive definitions. Note that the majority of the definitions we discussed in Sections sec-untypedlambda and sec-simplytypedlambda are precisely structurally inductive.

While infix operators are allowed, function application is left-associative in general. Definitions using partial application are allowed, meaning that functions on multiple arguments are curried and can be applied to only some of their arguments to define a new function. For example, a function that squares every number in a list can be written in two ways, as the following example shows. The second one, because of its simplicity, is usually preferred.

-- Squares a single number.
square :: Int -> Int
square x = x * x

squareList :: [Int] -> [Int]
squareList list = map square list
squareList' :: [Int] -> [Int]
squareList' = map square

Type classes, monads

A characteristic feature of Haskell is type classes, which allow the definition of common interfaces for different types. In the following example, we define Monad as the type class of types with suitably typed return and >>= operators.

class Monad m where
  return :: a   -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

And lists, for example, are monads in this sense.

instance Monad [] where
  return x = [x]               -- returns a one-element list
  xs >>= f = concat (map f xs) -- map and concatenation

Monads are used in I/O, error propagation and stateful computations. Another characteristic syntax feature of Haskell is the do notation, which provides a nicer, cleaner way to concatenate computations with monads that resembles an imperative language. The following example uses the list monad to compute the list of Pythagorean triples.

import Control.Monad (guard)  -- guard is used to filter the list

pythagorean :: [(Integer,Integer,Integer)]
pythagorean = do
  a <- [1..]               -- let a be any natural
  b <- [1..a]              -- let b be a natural between 1 and a
  c <- [1..b]              -- let c be a natural between 1 and b
  guard (a^2 == b^2 + c^2) -- filter the list
  return (a,b,c)           -- return matching tuples

Note that this list is infinite. As the language is lazy, this does not represent a problem: the list will be evaluated only on demand.

Another common example of an instance of the Monad typeclass is the Maybe monad, used to deal with error propagation. A term of type Maybe a is either a term of type a, wrapped as Just a, or the constant Nothing, signalling an error. The monad is then defined as

instance Monad Maybe where
  return x = Just x
  xs >>= f = case xs of
    Nothing -> Nothing
    Just a  -> f a

and can be used, as in the following example, to obtain exception-like error handling in a pure declarative language.

roots :: (Float,Float,Float) -> Maybe (Float,Float)
roots (a,b,c) = do
  -- Some errors can occur during this computation
  discriminant <- sqroot (b*b - 4*c*a)         -- roots of negative numbers?
  root1 <- safeDiv ((-b) + discriminant) (2*a) -- division by zero?
  root2 <- safeDiv ((-b) - discriminant) (2*a)
  -- The monad ensures we return a pair only if no error has been raised
  return (root1,root2)
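The helper functions sqroot and safeDiv are assumed in the excerpt above; a minimal sketch of how they could be written, returning Nothing on invalid inputs, is the following.

sqroot :: Float -> Maybe Float
sqroot x
  | x < 0     = Nothing        -- the root of a negative number fails
  | otherwise = Just (sqrt x)

safeDiv :: Float -> Float -> Maybe Float
safeDiv _ 0 = Nothing          -- division by zero fails
safeDiv x y = Just (x / y)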

A more detailed treatment of monads from the perspective of category theory is presented in Section sec-monads.

De Bruijn indexes

<<sec-debruijnindexes>> Nicolaas Govert De Bruijn proposed (see cite:debruijn81) a way of defining λ-terms modulo α-conversion based on indices. The main goal of De Bruijn indices is to remove all variables from binders and replace every variable on the body of an expression with a number, called index, representing the number of λ-abstractions in scope between the occurrence and its binder. Consider the following example where we draw arrows between each term and its intermediate λ-abstractions: the λ-term \[ \red{λ\tikzmark{2} y}.\ \tikzmark{3}y\ (\blue{λ\tikzmark{1} z}.\ \tikzmark{0}y\ \tikzmark{4}z) \begin{tikzpicture}[remember picture, overlay, bend right=45, -latex, redPRL] \draw ([yshift=2ex]pic cs:0) to ([yshift=2ex]pic cs:1); \draw ([yshift=2ex]pic cs:1) to ([yshift=2ex]pic cs:2); \draw ([yshift=2ex]pic cs:3) to ([yshift=2ex]pic cs:2); \end{tikzpicture} \begin{tikzpicture}[remember picture, overlay, bend left=45, -latex, bluePRL] \draw ([yshift=-1ex]pic cs:4) to ([yshift=-1ex]pic cs:1); \end{tikzpicture} \] can be written with De Bruijn indices as $\red{λ}\ (1\ \blue{λ}\ (2\ 1))$.

De Bruijn also proposed a notation for the λ-calculus changing the order of binders and λ-applications. A review on the syntax of this notation, its advantages and De Bruijn indexes, can be found in cite:kamareddine01. In this section, we are going to describe De Bruijn indexes while preserving the usual notation of λ-terms; that is, De Bruijn indexes and De Bruijn notation are two different concepts and we are going to use only the former one for clarity of exposition.

Our internal definition closely matches the formal one. The names of the constructors are Var, Lambda and App, representing variables, abstractions and applications, respectively.

-- | A lambda expression using DeBruijn indexes.
data Exp = Var Integer -- ^ integer indexing the variable.
         | Lambda Exp  -- ^ lambda abstraction
         | App Exp Exp -- ^ function application
         deriving (Eq, Ord)

This notation avoids the need for Barendregt’s variable convention and for α-conversions. It will be useful to implement λ-calculus without having to worry about the specific names of variables.
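For instance, the example term above, $λy.\ y\ (λz.\ y\ z)$, written $λ\ (1\ λ\ (2\ 1))$ with indices, corresponds to the following internal value.

-- λ (1 λ (2 1))
example :: Exp
example = Lambda (App (Var 1) (Lambda (App (Var 2) (Var 1))))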

Substitution

<<sec-substitution>> We now implement the substitution operation described in Section sec-freeboundvars, as it will be needed for β-reducing terms on De Bruijn indices. In order to define the substitution of the n-th variable by a λ-term $P$ on a given term, we must

  • find all the occurrences of the variable. At each level of scope we are looking for the successor of the number we were looking for before;
  • decrease the higher variables to reflect the disappearance of a lambda;
  • replace the occurrences of the variables by the new term, taking into account that free variables must be increased to avoid them getting captured by the outermost lambda terms.

In our code, we can apply the function subs to any expression. When it is applied to a λ-abstraction, the index and the free variables of the replaced term are increased with an auxiliary function called incrementFreeVars; whenever it is applied to a variable, the previous cases are taken into consideration.

-- | Substitutes an index for a lambda expression
subs :: Integer -> Exp -> Exp -> Exp
subs n p (Lambda e) = Lambda (subs (n+1) (incrementFreeVars 0 p) e)
subs n p (App f g)  = App (subs n p f) (subs n p g)
subs n p (Var m)
  | n == m    = p         -- The lambda is replaced directly  
  | n <  m    = Var (m-1) -- A more exterior lambda decreases a number
  | otherwise = Var m     -- An unrelated variable remains untouched

Now β-reduction can be defined using this subs function.

betared :: Exp -> Exp
betared (App (Lambda e) x) = subs 1 x e
betared e = e
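As a usage sketch, β-reducing the identity applied to a free variable substitutes the argument for the bound index.

-- (λ 1) applied to the free variable 2 reduces to that variable:
-- betared (App (Lambda (Var 1)) (Var 2)) == Var 2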

De Bruijn-terms and λ-terms

The internal language of the interpreter uses De Bruijn expressions, while the user interacts with it using lambda expressions with alphanumeric variables. Our definition of a lambda expression with variables will be used in parsing and output formatting.

data NamedLambda = LambdaVariable String                    
                 | LambdaAbstraction String NamedLambda     
                 | LambdaApplication NamedLambda NamedLambda

Lambda to De Bruijn

The translation from a natural lambda expression to De Bruijn notation is done using a dictionary which keeps track of bound variables.

tobruijn :: Map.Map String Integer -- ^ names of the variables used
         -> Context                -- ^ names already bound in the scope
         -> NamedLambda            -- ^ initial expression
         -> Exp
-- Every lambda abstraction is inserted in the variable dictionary,
-- and every number in the dictionary increases to reflect we are entering
-- a deeper context.
tobruijn d context (LambdaAbstraction c e) = 
     Lambda $ tobruijn newdict context e
        where newdict = Map.insert c 1 (Map.map succ d)

-- Translation distributes over applications.
tobruijn d context (LambdaApplication f g) = 
     App (tobruijn d context f) (tobruijn d context g)

-- We look for every variable on the local dictionary and the current scope.
tobruijn d context (LambdaVariable c) =
  case Map.lookup c d of
    Just n  -> Var n
    Nothing -> fromMaybe (Var 0) (MultiBimap.lookupR c context)

De Bruijn to Lambda

The translation from a De Bruijn expression to a natural one is done by considering an infinite list of possible variable names and keeping a list of currently-in-scope variables to name the indices.

-- | An infinite list of all possible variable names 
-- in lexicographical order.
variableNames :: [String]
variableNames = concatMap (`replicateM` ['a'..'z']) [1..]

-- | A function translating a deBruijn expression into a 
-- natural lambda expression.
nameIndexes :: [String] -> [String] -> Exp -> NamedLambda
nameIndexes _    _   (Var 0) = LambdaVariable "undefined"
nameIndexes used _   (Var n) = 
  LambdaVariable (used !! pred (fromInteger n))
nameIndexes used new (Lambda e) = 
  LambdaAbstraction (head new) (nameIndexes (head new:used) (tail new) e)
nameIndexes used new (App f g) = 
  LambdaApplication (nameIndexes used new f) (nameIndexes used new g)
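As a quick check of the name generation, single-letter names come first and two-letter names follow.

-- take 4 variableNames == ["a","b","c","d"]
-- variableNames !! 26  == "aa"    -- two-letter names follow "z"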

Evaluation

<<section-evaluation>> As we proved in Corollary cor-leftmosttheorem, the leftmost reduction strategy will find the normal form of any given term, provided that it exists. Consequently, we implement reduction in our interpreter using a function that simply applies the leftmost possible reductions at each step. As a side benefit, this allows us to easily show step-by-step evaluations to the final user (see Section sec-verbose).

-- | Simplifies the expression recursively.
-- Applies only one parallel beta reduction at each step.
simplify :: Exp -> Exp
simplify (Lambda e)           = Lambda (simplify e)
simplify (App (Lambda f) x)   = betared (App (Lambda f) x)
simplify (App (Var e) x)      = App (Var e) (simplify x)
simplify (App a b)            = App (simplify a) (simplify b)
simplify (Var e)              = Var e

-- | Applies repeated simplification to the expression until it stabilizes and
-- returns all the intermediate results.
simplifySteps :: Exp -> [Exp]
simplifySteps e
  | e == s    = [e]
  | otherwise = e : simplifySteps s
  where s = simplify e

From the code we can see that the evaluation finishes whenever the expression stabilizes. This can happen in two different cases

  • there are no more possible β-reductions, and the algorithm stops; or
  • β-reductions do not change the expression. The computation would lead to an infinite loop, so it is immediately stopped. A common example of this is the λ-term $(λ x.x x)(λ x.x x)$, as the sketch after this list shows.
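A minimal sketch of this last case: the self-application term reduces to itself in one step, so the list of simplification steps stabilizes immediately.

-- ω = (λx.x x)(λx.x x), written with De Bruijn indices
omega :: Exp
omega = App w w where w = Lambda (App (Var 1) (Var 1))
-- simplify omega == omega, hence simplifySteps omega == [omega]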

Principal type inference

<<sec-typeinference>> The interpreter implements the unification and type inference algorithms described in Lemma lemma-unification and Theorem thm-typeinfer. Their recursive nature makes them very easy to implement directly in Haskell.

Type templates and substitutions

We implement a simply-typed lambda calculus with Curry-style typing (see Section sec-currystyle) and type templates. Our type system has a unit type, a void type, product types, union types, and function types.

-- | A type template is built from free type variables, function
-- types, product types, union types, the unit type and the void type.
data Type = Tvar Variable
          | Arrow Type Type
          | Times Type Type
          | Union Type Type
          | Unitty
          | Bottom
          deriving (Eq)

We will work with substitutions on type templates. They can be directly defined as functions from types to types. A basic substitution that inserts a given type on the place of a variable will be our building block for more complex ones.

type Substitution = Type -> Type

-- | A basic substitution. It changes a variable for a type
subs :: Variable -> Type -> Substitution
subs x typ (Tvar y)
  | x == y    = typ
  | otherwise = Tvar y
subs x typ (Arrow a b) = Arrow (subs x typ a) (subs x typ b)
subs x typ (Times a b) = Times (subs x typ a) (subs x typ b)
subs x typ (Union a b) = Union (subs x typ a) (subs x typ b)
subs _ _ Unitty = Unitty
subs _ _ Bottom = Bottom
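As a usage sketch, and assuming for illustration that Variable is a synonym for String, substituting the variable a inside a function type proceeds structurally.

-- subs "a" Unitty (Arrow (Tvar "a") (Tvar "b"))
--   == Arrow Unitty (Tvar "b")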

Unification

Unification is implemented making extensive use of the Maybe monad. If the unification fails, it returns an error value, and the error is propagated to the whole computation. The algorithm is exactly the same as the one defined in Lemma lemma-unification.

-- | Unifies two types with their most general unifier. Returns the substitution
-- that transforms any of the types into the unifier.
unify :: Type -> Type -> Maybe Substitution
unify (Tvar x) (Tvar y)
  | x == y    = Just id
  | otherwise = Just (subs x (Tvar y))
unify (Tvar x) b
  | occurs x b = Nothing
  | otherwise  = Just (subs x b)
unify a (Tvar y)
  | occurs y a = Nothing
  | otherwise  = Just (subs y a)
unify (Arrow a b) (Arrow c d) = unifypair (a,b) (c,d)
unify (Times a b) (Times c d) = unifypair (a,b) (c,d)
unify (Union a b) (Union c d) = unifypair (a,b) (c,d)
unify Unitty Unitty = Just id
unify Bottom Bottom = Just id
unify _ _ = Nothing

-- | Unifies a pair of types
unifypair :: (Type,Type) -> (Type,Type) -> Maybe Substitution
unifypair (a,b) (c,d) = do
  p <- unify b d
  q <- unify (p a) (p c)
  return (q . p)
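For instance, unifying the templates $a → b$ and $1 → (c × c)$ succeeds with a substitution sending $a$ to $1$ and $b$ to $c × c$, while the occurs check makes the unification of $a$ with $a → a$ fail. Assuming again that Variable is String, this reads as follows.

-- fmap ($ Tvar "a")
--      (unify (Arrow (Tvar "a") (Tvar "b"))
--             (Arrow Unitty (Times (Tvar "c") (Tvar "c"))))
--   == Just Unitty
-- Data.Maybe.isNothing (unify (Tvar "a") (Arrow (Tvar "a") (Tvar "a")))
--   -- the occurs check fails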

Type inference

The type inference algorithm from Theorem thm-typeinfer is more involved. It takes a list of fresh variables, a type context, a lambda expression and a constraint on the type, expressed as a type template. It outputs a substitution. As an example, the following code shows the type inference algorithm for function types.

-- | Type inference algorithm. Infers a type from a given context and expression
-- with a set of constraints represented by a unifier type. The result type must
-- be unifiable with this given type.
typeinfer :: [Variable] -- ^ List of fresh variables
          -> Context    -- ^ Type context
          -> Exp        -- ^ Lambda expression whose type has to be inferred
          -> Type       -- ^ Constraint
          -> Maybe Substitution

typeinfer (x:vars) ctx (App p q) b = do -- Writing inside the Maybe monad.
  sigma <- typeinfer (evens vars) ctx                  p (Arrow (Tvar x) b)
  tau   <- typeinfer (odds  vars) (applyctx sigma ctx) q (sigma (Tvar x))
  return (tau . sigma)

The final form of the type inference algorithm will use a normalization algorithm shortening the type names and will apply the type inference to the empty type context.

Gentzen deduction trees

A generalized version of the type inference algorithm is used to generate derivation trees from terms, as described in Section sec-propositionstypes. In order to draw these diagrams with Unicode characters, a data type for character blocks has been defined. A monoidal structure is defined over them: blocks can be joined vertically and horizontally, and every deduction step can be drawn independently.

newtype Block = Block { getBlock :: [String] }
  deriving (Eq, Ord)

instance Monoid Block where
  mappend = joinBlocks -- monoid operation, joins blocks vertically
  mempty  = Block [[]] -- neutral element

-- Type signatures
joinBlocks :: Block -> Block -> Block
stackBlocks :: String -> Block -> Block -> Block
textBlock :: String -> Block
deductionBlock :: Block -> String -> [Block] -> Block
box :: Block -> Block
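The implementations are omitted here for brevity. As a minimal sketch, and under the simplifying assumption that both blocks have already been padded to the same width, the vertical join can be plain concatenation of their lines; the actual implementation also takes care of centering.

-- A simplified sketch of the vertical join of two blocks.
joinBlocks :: Block -> Block -> Block
joinBlocks (Block a) (Block b) = Block (a ++ b)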

User interaction

Monadic parser combinators

<<sec-monadicparsers>> The common approach to building parsers in functional programming is to model parsers as functions. Higher-order functions on parsers act as combinators, which are used to implement complex parsers in a modular way from a set of primitive ones. In this setting, parsers exhibit the algebraic structure of a monad, which can be used to simplify their combination. A technical report on monadic parser combinators can be found in cite:hutton96.

The use of monads for parsing was discussed for the first time in cite:Wadler85, and later in cite:Wadler90 and cite:hutton98. The parser type is defined as a function taking an input String and returning a list of pairs, each representing a successful parse. The first component of each pair is the parsed value and the second component is the remaining input. The Haskell code for this definition is the following, where the monadic structure is given by >>= and return.

-- A parser takes a string and returns a list of possible parsings with
-- their remaining string.
newtype Parser a = Parser (String -> [(a,String)])
parse :: Parser a -> String -> [(a,String)]
parse (Parser p) = p
-- A parser can be composed monadically, the composed parser (p >>= q)
-- applies q to every possible parsing of p. A trivial one is defined.
instance Monad Parser where
  return x = Parser (\s -> [(x,s)]) -- Trivial parser, directly returns x.
  p >>= q  = Parser (\s -> concat [parse (q x) s' | (x,s') <- parse p s ])

Given a value, the return function creates a parser that consumes no input and simply returns the given value. The >>= function acts as a sequencing operator for parsers; it takes two parsers and applies the second one over the remaining inputs of the first one, using the parsed values on the first parsing as arguments.

An example of a primitive parser is the item parser, which consumes a character from a non-empty string. It is written in Haskell using pattern matching on the string as follows.

item :: Parser Char
item = Parser (\s -> case s of "" -> []; (c:s') -> [(c,s')])
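As a usage sketch, sequencing two item parsers with the monadic operations gives a parser for the first two characters of the input.

twoItems :: Parser (Char, Char)
twoItems = do
  a <- item
  b <- item
  return (a, b)
-- parse twoItems "abc" == [(('a','b'),"c")]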

An example of a parser combinator is the many function, which builds a parser applying a given parser repeatedly and collecting the results; note that a choice combinator (written below as a minimal orElse helper) is needed so that the recursion stops when the given parser yields no parse. In the following example, many item would be a parser consuming all the characters from the input string.

-- Choice: use the second parser only if the first yields no parse.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser (\s -> case parse p s of [] -> parse q s; xs -> xs)

many :: Parser a -> Parser [a]
many p = orElse (do a <- p
                    as <- many p
                    return (a:as))
                (return [])

Parsec is a monadic parser combinator library for Haskell, described in cite:leijen2001. We have chosen to use it due to its simplicity and extensive documentation. As we expect to use it to parse live user input, which will tend to be short, performance is not a critical concern. A high-performance library supporting incremental parsing, such as Attoparsec cite:attoparsec, would be suitable otherwise.

Verbose mode

<<sec-verbose>> As we mentioned previously, the evaluation of lambda terms can be analyzed step by step. The interpreter allows us to see the complete evaluation when the verbose mode is activated; to activate it, we can execute :verbose on in the interpreter. The difference can be seen in the following example, which shows the execution of the expression $1+2$, first without intermediate results, and then showing every intermediate step.

mikro> plus 1 2
λa.λb.(a (a (a b))) ⇒ 3

mikro> :verbose on
verbose: on
mikro> plus 1 2
((plus 1) 2)
((λλλλ((4 2) ((3 2) 1)) λλ(2 1)) λλ(2 (2 1)))
(λλλ((λλ(2 1) 2) ((3 2) 1)) λλ(2 (2 1)))
λλ((λλ(2 1) 2) ((λλ(2 (2 1)) 2) 1))
λλ(λ(3 1) (λ(3 (3 1)) 1))
λλ(2 (λ(3 (3 1)) 1))
λλ(2 (2 (2 1)))
λa.λb.(a (a (a b))) ⇒ 3

The interpreter output can be colored to show specifically where it is performing reductions. Coloring is activated by default, but can be deactivated by executing :color off. The following code implements the verbose output, coloring the next reduction when necessary.

-- | Shows an expression, coloring the next reduction if necessary
showReduction :: Exp -> String
showReduction (Lambda e) = "λ" ++ showReduction e
showReduction (App (Lambda f) x) = betaColor (App (Lambda f) x)
showReduction (Var e) = show e
showReduction (App rs x) = "("++showReduction rs++" "++showReduction x++")"
showReduction e = show e

SKI mode

Every λ-term can be written in terms of SKI combinators. SKI combinator expressions can be defined as binary trees having variables, S, K, and I as possible leaves.

data Ski = S | K | I | Comb Ski Ski | Cte String

The SKI-abstraction and bracket abstraction algorithms are implemented on Mikrokosmos, and they can be used by activating the ski mode with :ski on. When this mode is activated, every result is written in terms of SKI combinators.

mikro> 2
λa.λb.(a (a b)) ⇒ S(S(KS)K)I ⇒ 2
mikro> and
λa.λb.((a b) a) ⇒ SSK ⇒ and

The code implementing these algorithms follows directly from the theoretical version in cite:Hindley08.

-- | Bracket abstraction of a SKI term, as defined in Hindley-Seldin
-- (2.18).
bracketabs :: String -> Ski -> Ski
bracketabs x (Cte y) = if x == y then I else Comb K (Cte y)
bracketabs x (Comb u (Cte y))
  | freein x u && x == y = u
  | freein x u           = Comb K (Comb u (Cte y))
  | otherwise            = Comb (Comb S (bracketabs x u)) (bracketabs x (Cte y))
bracketabs x (Comb u v)
  | freein x (Comb u v)  = Comb K (Comb u v)
  | otherwise            = Comb (Comb S (bracketabs x u)) (bracketabs x v)
bracketabs _ a           = Comb K a

-- | SKI abstraction of a named lambda term. From a lambda expression
-- creates a SKI equivalent expression. The following algorithm is a
-- version of the algorithm (9.10) on the Hindley-Seldin book.
skiabs :: NamedLambda -> Ski
skiabs (LambdaVariable x)      = Cte x
skiabs (LambdaApplication m n) = Comb (skiabs m) (skiabs n)
skiabs (LambdaAbstraction x m) = bracketabs x (skiabs m)
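As a quick check, abstracting the identity term yields the I combinator, while abstracting over a variable that does not occur produces a constant K term.

-- skiabs (LambdaAbstraction "x" (LambdaVariable "x"))   yields I
-- skiabs (LambdaAbstraction "x" (LambdaVariable "y"))   yields Comb K (Cte "y")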

Usage

Installation

<<section-installation>> The complete Mikrokosmos suite is divided into multiple parts:

  1. the Mikrokosmos interpreter, written in Haskell;
  2. the Jupyter kernel, written in Python;
  3. the CodeMirror Lexer, written in Javascript;
  4. the Mikrokosmos libraries, written in the Mikrokosmos language;
  5. the Mikrokosmos-js compilation, which can be used in web browsers.

These parts will be detailed in the following sections. A system that already satisfies all the dependencies (Stack, Pip and Jupyter) can install Mikrokosmos using the following script, which is detailed in this section.

stack install mikrokosmos                             # Mikrokosmos interpreter
sudo pip install imikrokosmos                  # Jupyter kernel for Mikrokosmos
git clone https://github.com/mroman42/mikrokosmos-lib.git ~/.mikrokosmos # Libs

Mikrokosmos interpreter

The Mikrokosmos interpreter is listed in the central Haskell package archive[fn:hackage]. The packaging of Mikrokosmos has been done using the cabal tool; and the configuration of the package can be read in the file mikrokosmos.cabal of the source code. As a result, Mikrokosmos can be installed using the Haskell package managers cabal and stack.

cabal install mikrokosmos          # Installation with cabal
stack install mikrokosmos          # Installation with stack

Mikrokosmos Jupyter kernel

The Mikrokosmos Jupyter kernel is listed in the central Python package archive[fn:pipmikro]. Jupyter is a dependency of this kernel, which can only be used in conjunction with it. It can be installed with the pip package manager.

sudo pip install imikrokosmos      # Installation with pip

The installation can be checked by listing the available Jupyter kernels.

jupyter kernelspec list            # Checks installation

Mikrokosmos libraries

The Mikrokosmos libraries can be downloaded directly from their GitHub repository[fn:mikrokosmoslibgit]. They have to be placed under ~/.mikrokosmos if we want them to be locally available or under /usr/lib/mikrokosmos if we want them to be globally available.

git clone https://github.com/mroman42/mikrokosmos-lib.git ~/.mikrokosmos

Complete script

The following script installs the complete Mikrokosmos suite on a fresh system. It has been tested under Ubuntu 16.04.3 LTS (Xenial Xerus).

# 1. Installs Stack, the Haskell package manager
wget -qO- https://get.haskellstack.org | sh
STACK=$(which stack)

# 2. Installs the ncurses library, used by the console interface
sudo apt install libncurses5-dev libncursesw5-dev

# 3. Installs the Mikrokosmos interpreter using Stack
$STACK setup
$STACK install mikrokosmos

# 4. Installs the Mikrokosmos standard libraries
sudo apt install git
git clone https://github.com/mroman42/mikrokosmos-lib.git ~/.mikrokosmos

# 5. Installs the IMikrokosmos kernel for Jupyter
sudo apt install python3-pip
sudo -H pip install --upgrade pip
sudo -H pip install jupyter
sudo -H pip install imikrokosmos

[fn:pipmikro]: The Jupyter-Mikrokosmos package can be found in https://pypi.org/project/imikrokosmos/.

[fn:hackage]: Hackage can be accessed in http://hackage.haskell.org/ and the Mikrokosmos package can be found in https://hackage.haskell.org/package/mikrokosmos

[fn:mikrokosmoslibgit]: The repository can be accessed in: https://github.com/mroman42/mikrokosmos-lib.git

Mikrokosmos interpreter

Once installed, the Mikrokosmos λ interpreter can be opened from the terminal with the mikrokosmos command. It will enter a read-eval-print loop where λ-expressions and interpreter commands can be evaluated.

$> mikrokosmos
Welcome to the Mikrokosmos Lambda Interpreter!
Version 0.7.0. GNU General Public License Version 3.
mikro> _

The interpreter evaluates every line as a lambda expression. Examples of the use of the interpreter can be read in the following sections. Apart from the evaluation of expressions, the interpreter accepts the following commands

  • :quit and :restart stop and restart the interpreter, respectively;
  • :verbose activates verbose mode;
  • :ski activates SKI mode;
  • :types changes between untyped and simply typed λ-calculus;
  • :color deactivates colored output;
  • :load loads a library.

Figure mikrosession shows an example session on the Mikrokosmos interpreter.

./images/mikrosession.png

Jupyter kernel

<<section-jupyterkernel>> The Jupyter Project cite:jupyter is an open source project providing support for interactive scientific computing. Specifically, the Jupyter Notebook provides a web application for creating interactive documents with live code and visualizations.

We have developed a Mikrokosmos kernel for the Jupyter Notebook, allowing the user to write and execute arbitrary Mikrokosmos code on this web application. An example session can be seen in Figure jupytersession.

./images/jupytersession.png

The implementation is based on the pexpect library for Python. It allows direct interaction with any REPL and collects its results. Specifically, the following Python lines represent the central idea of this implementation.

# Initialization
self.mikro = pexpect.spawn('mikrokosmos')
self.mikro.expect('mikro>')

# Interpreter interaction
# Multiple-line support
output = ""
for line in code.split('\n'):
    # Send code to mikrokosmos
    self.mikro.sendline(line)
    self.mikro.expect('mikro> ')

    # Receive and filter output from mikrokosmos
    partialoutput = self.mikro.before
    partialoutput = partialoutput.decode('utf8')
    output = output + partialoutput

A pip installable package has been created following the Python Packaging Authority guidelines[fn:pypaguide]. This allows the kernel to be installed directly using the pip python package manager.

sudo -H pip install imikrokosmos

[fn:pypaguide]: The PyPA packaging user guide can be found in its official page: https://packaging.python.org/

CodeMirror lexer

CodeMirror [fn:codemirror] is a text editor for the browser implemented in Javascript. It is used internally by the Jupyter Notebook.

A CodeMirror lexer for Mikrokosmos has been written. It uses Javascript regular expressions and signals the occurrence of any kind of operator to CodeMirror. It enables syntax highlighting for Mikrokosmos code on Jupyter Notebooks. It comes bundled with the kernel specification and no additional installation is required.

CodeMirror.defineSimpleMode("mikrokosmos", {
    start: [
        // Comments
        {regex: /\#.*/, token: "comment"},
        // Interpreter commands
        {regex: /\:load|\:verbose|\:ski|\:restart|\:types|\:color/,
         token: "atom"},
        // Bindings
        {regex: /(.*?)(\s*)(=)(\s*)(.*?)$/,
         token: ["def", null, "operator", null, "variable"]},
        // Operators
        {regex: /[=!]+/, token: "operator"}
    ],
    meta: {
        dontIndentStates: ["comment"],
        lineComment: "#"
    }
});

[fn:codemirror]: Documentation for CodeMirror can be found in its official page: https://codemirror.net/

JupyterHub

<<section-jupyterhub>> JupyterHub manages multiple instances of independent single-user Jupyter notebooks. It has been used to serve Mikrokosmos notebooks and tutorials to students. In order to install Mikrokosmos on a server and use it as root user, we perform the following steps.

  • Cloning the libraries into /usr/lib/mikrokosmos. They should be available system-wide.
  • Installing the Mikrokosmos interpreter into /usr/local/bin. In this case, we can choose not to install Mikrokosmos from source, but simply copy the binaries and check the availability of the ncurses library.
  • Installing the Mikrokosmos Jupyter kernel as usual.

A JupyterHub server was made available at iemath1.ugr.es. Our server used an SSL certificate and OAuth authentication via GitHub. Mikrokosmos tutorials and exercises were installed for every student.

Calling Mikrokosmos from Javascript

<<section-javascript>> The GHCjs[fn:ghcjs] compiler allows transpiling from Haskell to Javascript. Its foreign function interface allows a Haskell function to be passed as a continuation to a Javascript function.

A particular version of the Main.hs module of Mikrokosmos was written in order to provide a mikrokosmos function, callable from Javascript. This version includes the standard libraries automatically and reads blocks of text as independent Mikrokosmos commands. The relevant use of the foreign function interface is shown in the following code, which provides mikrokosmos as a Javascript function once the code is transpiled.

foreign import javascript unsafe "mikrokosmos = $1"
    set_mikrokosmos :: Callback a -> IO ()

In particular, the following is an example of how to call Mikrokosmos from Javascript.

button.onclick = function () {
   editor.save();
   outputcode.getDoc().setValue(mikrokosmos(inputarea.value).mkroutput);
   textAreaAdjust(outputarea);
}

A small script has been written in Javascript to help with the task of embedding Mikrokosmos into a web page. It can be included directly from the repository, using GitHub as a CDN.

The script will convert any HTML script tag written as follows into a CodeMirror pad where Mikrokosmos can be executed.

<div class="mikrojs-console">
<script type="text/mikrokosmos">
(λx.x)
... your code
</script>
</div>

The Mikrokosmos tutorials are an example of this feature and can be seen in Figure mikrokosmosjstutorial. They can be accessed online.


./images/mikrokosmosjs.png

[fn:ghcjs]: The GHCjs documentation is available on its web page https://github.com/ghcjs/ghcjs

Programming environment

Cabal, Stack and Haddock

The Mikrokosmos documentation as a Haskell library is included in its own code. It uses Haddock, a tool that generates documentation from annotated Haskell code; it is the de facto standard for Haskell software.

Dependencies and packaging details for Mikrokosmos are specified in a file distributed with the source code called mikrokosmos.cabal. It is used by the package managers stack and cabal to provide the necessary libraries even if they are not available system-wide. The stack tool is also used to package the software, which is uploaded to *Hackage*.

Testing

<<section-testing>> Tasty is the Haskell testing framework of our choice for this project. It allows the user to create a comprehensive test suite combining multiple types of tests. The Mikrokosmos code is tested using the following techniques

  • unit tests, in which individual core functions are tested independently of the rest of the application;
  • property-based testing, in which multiple test cases are created automatically in order to verify that a specified property always holds; this has been useful to test our implementation of several algorithms on the lambda calculus;
  • golden tests, a special case of unit tests in which the expected results of an IO action, as described on a file, are checked to match the actual ones; they have been used to check correctness of the Mikrokosmos language.

We are using the HUnit library for unit tests. It tests particular cases of type inference, unification and parsing. The following is an example of a unit test, as found in tests.hs. It checks that the type inference of the identity term is correct.

-- Checks that the type of λx.x is exactly A → A
testCase "Identity type inference" $
  typeinference (Lambda (Var 1)) @?= Just (Arrow (Tvar 0) (Tvar 0))

We are using the QuickCheck library for property-based tests. It tests transformation properties of lambda expressions. In the following example, it tests that naming a De Bruijn expression and translating it back yields the original expression.

-- Tests if translation preserves meaning
QC.testProperty "Expression -> named -> expression" $
  \expr -> toBruijn emptyContext (nameExp expr) == expr

We are using the tasty-golden package for golden tests. Mikrokosmos can be passed a file as an argument to interpret it and show only the results. This feature is used to create a golden test in which the interpreter is asked to provide the correct interpretation of a given file. This file is called testing.mkr, and contains library definitions and multiple tests. Its expected output is testing.golden. For example, the following Mikrokosmos code can be found on the testing file.

:types on
caseof (inr 3) (plus 2) (mult 2)

While the expected output is the following.

-- types: on
-- λa.λb.(a (a (a (a (a (a b)))))) ⇒ 6 :: (A → A) → A → A

Version control and continuous integration

<<sec-versioncontrolci>> Mikrokosmos uses git as its version control system and the code, which is licensed under GPLv3, can be publicly accessed on the following GitHub repository:

Development takes place on the development git branch and permanent changes are released into the master branch. Some minor repositories have also been used in the development; they depend directly on the main one.

The code uses the Travis CI continuous integration system to run tests and check that the software builds correctly after each change and in a reproducible way on a fresh Linux installation provided by the service.

Programming in untyped λ-calculus

<<sec-programming-untyped>>

This section explains how to use untyped λ-calculus to encode data structures such as booleans, linked lists, natural numbers or binary trees. All of this is done in pure λ-calculus, avoiding the addition of new syntax or axioms.

This presentation follows the Mikrokosmos tutorial on λ-calculus, which aims to teach how it is possible to program using untyped λ-calculus without discussing more advanced technical topics, such as those we addressed in Chapter sec-untypedlambda. It also follows the exposition in cite:selinger13 of the usual Church encodings.

All the code on this section is valid Mikrokosmos code.

Basic syntax

In the interpreter, λ-abstractions are written with the symbol \, representing a λ. This is a convention used in some functional languages such as Haskell or Agda. Any alphanumeric string can be a variable and can be defined to represent a particular λ-term using the = operator.

As a first example, we define the identity function (id), function composition (compose) and a constant function on two arguments which always returns the first one untouched (const).

id = \x.x
compose = \f.\g.\x.f (g x)
const = \x.\y.x

Evaluation of terms will be presented as comments to the code, as in the following example.

compose id id                                                -- [1]: λa.a ⇒ id

It is important to notice that multiple-argument functions are defined as higher-order one-argument functions that return functions as results. These intermediate functions are also valid λ-terms. For example,

discard = const id

is a function that discards one argument and returns the identity, id. This way of defining multiple-argument functions is called currying, in honor of the American logician Haskell Curry cite:haskell58. It is a particular instance of a deeper fact we will detail in Chapter sec-cartesianclosedlambdacalculus: exponentials are defined by the adjunction $hom(A × B, C) ≅ hom(A, hom(B,C))$.
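For instance, applying discard to any argument evaluates to the identity.

discard true                                                 -- [1]: λa.a ⇒ id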

A technique on inductive data encoding

<<technique-on-inductive>> We will implicitly use, in the majority of our data encodings, a technique that allows us to write an encoding for any algebraically generated inductive data. This technique is used without explicit comment in cite:selinger13 and represents the basis of what is called the Church encoding of data in λ-calculus.

We start by considering the usual inductive representation of a data type with constructors, as we do when representing a syntax with a Backus-Naur form. For example, the naturals can be written as $\mathsf{Nat} ::= \mathsf{Zero} \mid \mathsf{Succ}\ \mathsf{Nat}$; and, in general, a datatype $D$ with constructors $C_1,C_2,C_3,…$ is written as $D ::= C_1 \mid C_2 \mid C_3 \mid …$.

It is not possible to directly encode constructors on λ-calculus. Even if we were able to declare constants, they would have no computational content; the data structure would not be reduced under any λ-term, and we would need at least the ability to pattern-match on the constructors to define functions on them. Our λ-calculus would need to be extended with additional syntax for every new data structure.

Our technique, instead, is to define a data term as a function on multiple arguments representing the missing constructors. For instance, the number $2$, which would be written as $\mathsf{Succ}(\mathsf{Succ}(\mathsf{Zero}))$ using the Backus-Naur form, is encoded as $2 = λ s.\ λ z.\ s (s (z))$. In general, any instance of the data structure $D$ is encoded as a λ-expression depending on all of its constructors, as in $λ c_1.\ λ c_2.\ λ c_3.\ …\ λ c_n. (\textit{term})$.

This acts as the definition of an initial algebra over the constructors (see Section sec-algebras) and lets us compute over instances of that algebra by instantiating it on particular cases. We will develop explicit examples of this technique in the following sections.

Booleans

Booleans can be defined as the data generated by a pair of nullary constructors, each one representing a truth value, as in $\mathsf{Bool} ::= \mathsf{True} \mid \mathsf{False}$. Consequently, the Church encoding of booleans takes these constructors as arguments and defines the following two elements.

true  = \t.\f.t
false = \t.\f.f

If-else interpretation

Note that true and the const function we defined earlier are exactly the same term up to α-conversion. The same thing happens with false and discard. The absence of types prevents any effort to discriminate between these two uses of the same λ-term. Another side-effect of this definition is that our true and false terms can be interpreted as binary functions choosing between two arguments, that is, $\mathtt{true}(a,b) = a$, and $\mathtt{false}(a,b) = b$.

We can test this interpretation on the interpreter to get the expected results.

true id const                                                   --- [1]: id
false id const                                                  --- [2]: const

And this inspires the definition of an ifelse combinator as the identity, when applied to boolean values.

ifelse = \b.b
(ifelse true) id const                                          --- [1]: id
(ifelse false) id const                                         --- [2]: const

Logic gates

The usual logic gates can be defined using this interpretation of the booleans.

and = \p.\q.p q p
or = \p.\q.p p q
not = \b.b false true
xor = \a.\b.a (not b) b
implies = \p.\q.or (not p) q

xor true true                                                   --- [1]: false
and true true                                                   --- [2]: true

Natural numbers

Peano natural numbers

Our definition of natural numbers is inspired by the Peano natural numbers. We use two constructors

  • zero is a natural number, written as Z;
  • the successor of a natural number is a natural number, written as S;

and the Backus-Naur form, following the encoding technique for inductive data we described in Section technique-on-inductive.

0    = \s.\z.z
succ = \n.\s.\z.s (n s z)

This definition of 0 is trivial: given a successor function and a zero, return zero. The successor function seems more complex, but it uses the same underlying idea: given a number, a successor and a zero, apply the successor to the interpretation of that number using the same successor and zero.
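For instance, unfolding the definitions, the successor of zero β-reduces in two steps to the representation of $1$. \[ \mathsf{succ}\ 0 = (λ n.\ λ s.\ λ z.\ s\ (n\ s\ z))\ 0 → λ s.\ λ z.\ s\ (0\ s\ z) → λ s.\ λ z.\ s\ z = 1 \]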

We can then name some natural numbers as follows, even if we cannot define an infinite number of terms at once, as we might wish.

1 = succ 0
2 = succ 1
3 = succ 2
4 = succ 3
5 = succ 4
6 = succ 5
...

Interpretation as higher-order functions

The interpretation of a natural number $n$ as a higher-order function is a function taking an argument f and applying it $n$ times to its second argument. See the following examples.

5 not true                                                      --- [1]: false
4 not true                                                      --- [2]: true 
double = \n.\s.\z.n (compose s s) z
double 3                                                        --- [3]: 6

Addition and multiplication

Addition $n+m$ applies the successor $m$ times to $n$; and multiplication $nm$ applies the $n\text{-fold}$ application of the successor $m$ times to $0$.

plus = \m.\n.\s.\z.m s (n s z)
mult = \m.\n.\s.\z.m (n s) z
plus 2 1                                                        --- [1]: 3
mult 2 4                                                        --- [2]: 8
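Unfolding the definitions shows how these evaluations proceed; for instance, \[ \mathsf{plus}\ 2\ 1 → λ s.\ λ z.\ 2\ s\ (1\ s\ z) → λ s.\ λ z.\ s\ (s\ (s\ z)) = 3. \]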

The predecessor function and predicates on numbers

Predecessor

The predecessor function is much more complex than the previous ones. As we can see, it is not trivial how we could compute the predecessor using the limited form of induction that Church numerals allow.

Stephen Kleene, one of Alonzo Church's students, discovered how to write the predecessor function only after thinking about it for a long time (and he discovered it during a long visit to the dentist, which is the reason why this definition is often called the wisdom tooth trick, see cite:crossley75). We will use a slightly different version of that definition, as we consider it simpler to understand.

We will start by defining a reversed composition operator, called inc, and we will study what happens when a Church numeral iterates it over a function; that is, the operator we define in the following code

inc = \f.\g.\h.h (g f)
\f.3 (inc f)                         --- [1]: λa.λb.λc.c (a (a (b a)))
\f.4 (inc f)                         --- [2]: λa.λb.λc.c (a (a (a (b a))))
\f.5 (inc f)                         --- [3]: λa.λb.λc.c (a (a (a (a (b a)))))

allows us to use the b argument to discard the first application of the a argument and return the same number without its last constructor. Thus, our definition of pred is the following.

pred = \n.\s.\z.(n (inc s) (\x.z) (\x.x))
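We can check the definition on the interpreter; note that, with this encoding, the predecessor of zero is zero itself, as there is no smaller natural number to return.

pred 0                                                          --- [1]: 0
pred 3                                                          --- [2]: 2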

Predicates

From the definition of pred, some predicates on numbers can be defined. The first predicate will be a function distinguishing a successor from a zero; it will be used later to build more complex ones. It is built by applying a const false function $n$ times to a true constant: only when it is applied $0$ times does it return true.

iszero = \n.(n (const false) true)
iszero 0                                                        --- [1]: true
iszero 2                                                        --- [2]: false

From this predicate we can derive predicates on equality and ordering; here, minus is the truncated subtraction, obtained by applying pred repeatedly.

minus = \m.\n.(n pred m)
leq = \m.\n.(iszero (minus m n))
eq  = \m.\n.(and (leq m n) (leq n m))
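These definitions can be tested on the interpreter.

leq 2 3                                                         --- [1]: true
eq 2 2                                                          --- [2]: true
eq 2 3                                                          --- [3]: false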

Lists and trees

We need two constructors to represent a list: a $\mathsf{nil}$, signaling the end of the list, and a $\mathsf{cons}$, appending an element to the head of the list. For instance, a list would be $\mathsf{cons}\ 1\ (\mathsf{cons}\ 2\ (\mathsf{cons}\ 3\ \mathsf{nil}))$. Our definition takes those two constructors into account.

nil  = \c.\n.n
cons = \h.\t.\c.\n.(c h (t c n))

The interpretation of a list as a higher-order function is its $\mathsf{fold}$ function, a function taking a binary operation and an initial element and applying the operation repeatedly to every element on the list. \[\mathsf{cons}\ 1\ (\mathsf{cons}\ 2\ (\mathsf{cons}\ 3\ \mathsf{nil})) \overset{\mathsf{fold}\ \mathsf{plus}\ 0}\longrightarrow \mathsf{plus}\ 1\ (\mathsf{plus}\ 2\ (\mathsf{plus}\ 3\ 0)) = 6\]

The $\mathsf{fold}$ operation can be defined as follows. Many operations on lists are particular instances of it.

fold = \c.\n.\l.(l c n)
sum  = fold plus 0
prod = fold mult 1
all  = fold and true
any  = fold or false
length = fold (\h.\t.succ t) 0

sum (cons 1 (cons 2 (cons 3 nil)))                              --- [1]: 6   
all (cons true (cons true (cons true nil)))                     --- [2]: true

Map and filter

The two most commonly used particular cases of fold and frequent examples of the functional programming paradigm are $\mathsf{map}$ and $\mathsf{filter}$.

  • The map function applies a function $f$ to every element on a list.
  • The filter function removes the elements of the list that do not satisfy a given predicate. It filters the list, leaving only elements that satisfy the predicate.

They can be defined as follows.

map = \f.(fold (\h.\t.cons (f h) t) nil)
filter = \p.(fold (\h.\t.((p h) (cons h t) t)) nil)

The $\mathsf{map}$ function, given a list of the form $\mathsf{cons}\ h\ t$, returns a new list $\mathsf{cons}\ (f\ h)\ t$; and given an empty list $\mathsf{nil}$, returns $\mathsf{nil}$. On the $\mathsf{filter}$ function, we use a boolean to decide at each step whether to return a list with a head or to return the tail, ignoring the head.

mylist = cons 1 (cons 2 (cons 3 nil))
sum (map succ mylist)                                           --- [1]: 9
length (filter (leq 2) mylist)                                  --- [2]: 2

Binary trees

Lists have been defined using two constructors, and binary trees will be defined using the same technique. The only difference from lists is that the cons constructor is replaced by a node constructor, which takes a label and two binary trees as arguments. That is, a binary tree is

  • an empty tree; or
  • a node, containing a label, a left subtree, and a right subtree.

Defining functions using a fold-like combinator is again very simple due to the chosen representation. We need a variant of the usual function acting on three arguments: the label, the left subtree and the right subtree.

-- Binary tree definition
node = \x.\l.\r.\f.\n.(f x (l f n) (r f n))
-- Example on natural numbers
mytree    = node 4 (node 2 nil nil) (node 3 nil nil)
triplesum = \a.\b.\c.plus (plus a b) c
mytree triplesum 0                                              --- [1]: 9
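Other recursive functions on trees arise by choosing a different ternary operation; for instance, the following function (treesize is a name of our own choosing) counts the nodes of a tree.

treesize = \t.(t (\x.\l.\r.succ (plus l r)) 0)
treesize mytree                                                 --- [1]: 3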

Fixed points

<<sec-fixed-points>> A fixpoint combinator is a term representing a higher-order function that, given any function $f$, solves the equation $x = f\ x$ for $x$, meaning that, if $(\mathsf{fix}\ f)$ is the fixpoint of $f$, the following sequence of equations holds \[ \mathsf{fix}\ f = f (\mathsf{fix}\ f) = f ( f (\mathsf{fix}\ f)) = f ( f ( f (\mathsf{fix}\ f))) = … \]

Such a combinator exists; and it can be defined and used as follows.

fix := (\f.(\x.f (x x)) (\x.f (x x)))
fix (const id)                                                 --- [1]: id

Here, := defines a function without trying to evaluate it to a normal form; this is useful in cases like the previous one, where the function has no normal form. Examples of its applications are a factorial function and a Fibonacci function, as in

fact := fix (\f.\n.iszero n 1 (mult n (f (pred n))))
fib  := fix (\f.\n.iszero n 1 (plus (f (pred n)) (f (pred (pred n)))))
fact 3                                                         --- [1]: 6
fib 3                                                          --- [2]: 5

Note the use of $\mathsf{iszero}$ to stop the recursion.

The $\mathsf{fix}$ function cannot be evaluated without arguments into a closed form, so we have to delay the evaluation of the expression when we bind it using :=. Our evaluation strategy, however, will always find a way to reduce a term whenever a normal form exists, as we saw in Corollary cor-leftmosttheorem; even if some of its subterms have no normal form.

fix              -- diverges
true  id fix     -- evaluates to id
false id fix     -- diverges

Other examples of the interpreter dealing with non-terminating functions involve infinite structures. In the following examples we take the first element of an infinite list without having to evaluate the list completely, and we compare an infinite number, arising as the fixpoint of the successor function, with a finite one.

-- Head of an infinite list of zeroes
head = fold const false
head (fix (cons 0))                                          --- [1]: 0
-- Compare infinity with other numbers
infinity := fix succ
leq infinity 6                                               --- [2]: false

These definitions unfold as

  • $\mathsf{fix\ (cons\ 0) = cons\ 0\ (cons\ 0\ (cons\ 0\ …))}$, an infinite list of zeroes;
  • $\mathsf{fix\ succ = succ\ (succ\ (succ\ …))}$, an infinite natural number.

As a final example, we define a minimization operator that finds the first natural number causing a given predicate to return true. This, together with all the previous considerations about how to program with natural numbers, can be used to prove that Gödel’s μ-recursive functions can be encoded inside untyped lambda calculus, and thus that it is Turing-complete.

mu := \p.fix (\f.\n.(p n) n (f (succ n))) 0
mu (\n.eq (mult 2 n) 10)                                     --- [1]: 5

Programming in the simply typed λ-calculus

<<sec-programming-typed>> This section explains how to use the simply typed λ-calculus to encode compound data structures and proofs in intuitionistic logic. We will use the interpreter as a typed language and, at the same time, as a proof assistant for the intuitionistic propositional logic.

This presentation of simply typed structures follows the Mikrokosmos tutorial and the previous sections on the simply typed λ-calculus (see Section sec-simplytypedlambda). All the code in this section is valid Mikrokosmos code.

Function types and typeable terms

Types can be activated with the command :types on. If types are activated, the interpreter will infer (see Section sec-typeinference) the principal type of every term before its evaluation. The type will then be displayed after the result of the computation.

If a term is found to be non-typeable, Mikrokosmos will output an error message signaling the fact. In this way, the evaluation of λ-terms that could potentially not terminate is prevented. Only typed λ-terms will be evaluated while the option :types is on; this ensures the termination of every computation on typed terms.
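As a brief sketch of a session with types activated (the exact output and error messages may vary between versions of the interpreter), the identity combinator is accepted and displayed with its principal type, while a self-application is rejected as non-typeable.

:types on
id                                    --- [1]: λa.a :: A → A
\x.x x                                --- error: the term is not typeable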

Note that the evaluation of compound λ-expressions in which the fixpoint operator appears applied to other terms can terminate, but such terms are still non-typeable.

Product, union, unit and void types

Until now, we have only used the function type; that is to say, we have been working in the implicational fragment of the simply-typed lambda calculus we described when first presenting the typing rules. We are now going to extend our type system in the same way we extended that simply-typed lambda calculus (see Section sec-extendingstlc). The following types are added to the system.

| Type | Name          | Description                        |
|------+---------------+------------------------------------|
| →    | Function type | Functions from one type to another |
| ×    | Product type  | Cartesian product of types         |
| +    | Union type    | Disjoint union of types            |
| ⊤    | Unit type     | A type with exactly one element    |
| ⊥    | Void type     | A type with no elements            |

And the following typed constructors are added to the language.

| Constructor | Type                            | Description                |
|-------------+---------------------------------+----------------------------|
| (-,-)       | A → B → A × B                   | Pair of elements           |
| fst         | (A × B) → A                     | First projection           |
| snd         | (A × B) → B                     | Second projection          |
| inl         | A → A + B                       | First inclusion            |
| inr         | B → A + B                       | Second inclusion           |
| caseof      | (A + B) → (A → C) → (B → C) → C | Case analysis of a union   |
| unit        | ⊤                               | Unital element             |
| abort       | ⊥ → A                           | Empty function             |
| absurd      | ⊥ → ⊥                           | Particular empty function  |

They correspond to the constructors we described in previous sections. The only new term is the absurd function, which is just a particular case of abort, useful when we want to make explicit that we are deriving an instance of the empty type. This addition will only make the logical interpretation in the following sections clearer.

Now it is possible to define a new encoding of the booleans with a uniform type. The type ⊤ + ⊤ has two inhabitants, inl unit and inr unit; and they can be used by case analysis.

btrue = inl unit
bfalse = inr unit
bnot = \a.caseof a (\a.bfalse) (\a.btrue)
bnot btrue                            --- [1]: (INR UNIT) ⇒ bfalse :: A + ⊤
bnot bfalse                           --- [2]: (INL UNIT) ⇒ btrue :: ⊤ + A

With these extended types, Mikrokosmos can be used as a proof checker for intuitionistic propositional logic by virtue of the Curry-Howard correspondence.

A proof in intuitionistic logic

<<section-proof-lem>> Under the logical interpretation of Mikrokosmos, we can transcribe proofs in intuitionistic logic to λ-terms and check them on the interpreter. The translation between logical propositions and types is straightforward, except for the negation of a proposition, $¬ A$, which must be written as $(A → \bot)$, a function to the empty type.

As an example, we prove the double negation of the law of excluded middle, $¬¬(A ∨ ¬ A)$. Assume $¬(A ∨ ¬ A)$; from any proof $a$ of $A$ we could build $\mathsf{inl}\ a : A ∨ ¬ A$, reaching a contradiction, so we obtain a proof of $¬ A$; but injecting it with $\mathsf{inr}$ yields again a proof of $A ∨ ¬ A$, contradicting the assumption. Note that this is, in fact, a constructive proof. Although it seems to use the intuitionistically forbidden technique of proving by contradiction, it is actually only proving a negation. There is a difference between assuming $A$ to prove $¬ A$ and assuming $¬ A$ to prove $A$: the first one is simply a proof of a negation, the second one implicitly uses the law of excluded middle.

This can be translated to the Mikrokosmos implementation of simply typed λ-calculus as the following term, whose type is precisely $((A + (A → \bot)) → \bot) → \bot$.

notnotlem = \f.absurd (f (inr (\a.f (inl a))))
notnotlem 
--- [1]: λa.(ABSURD (a (INR λb.(a (INL b))))) :: ((A + (A → ⊥)) → ⊥) → ⊥

The derivation tree can be seen directly on the interpreter as Figure mikrogentzen shows.

./images/mikrogentzen2.png

Category theory

Categories

<<section-categories>>

Motivation

Categories are algebraic structures that capture the notion of composition. They consist of objects linked by composable arrows, subject to associativity and identity laws. Thus, a category has to rely on some notion of collection of objects. When interpreted inside set theory, this term can denote sets or proper classes. We want to talk about categories containing subsets of the class of all sets, and thus it is necessary to allow the objects to form a proper class (which is not itself a set) in order to avoid inconsistencies such as Russell’s paradox. This is why we will consider a particular class of categories of small set-theoretical size to be especially well-behaved.

A different approach, however, would be to simply take objects and arrows as fundamental concepts of our theory. These foundational concerns will not cause any explicit problem in this presentation of category theory, so we will keep it deliberately open to both interpretations.

Definition of category

Categories, objects and morphisms

Definition of hom-sets

The collection of morphisms between two objects $A$ and $B$ is called a hom-set and it is written as $hom(A,B)$. When necessary, we can use a subscript, as in $hom_{{\cal C}}(A,B)$, to explicitly specify the category we are working in.

Morphisms

Morphisms: introduction

Objects in category theory are an atomic concept and can only be studied through their morphisms; that is, by how they relate to each other. In other words, the essence of a category is given not by its objects, but by the morphisms between them and how composition is defined. So much so that we will consider two objects essentially equivalent (and we will call them isomorphic) whenever they relate to other objects in the exact same way. This occurs if we can find an invertible morphism connecting both: composition with this morphism or its inverse will translate arrows from one object to the other.

Isomorphisms

Unicity of inverses

Equivalence relation

As expected, to be isomorphic to is an equivalence relation. In particular, reflexivity follows from the fact that the identity is its own inverse, $\id = \id^{-1}$; symmetry follows from the inverse of an isomorphism being itself an isomorphism, $(f^{-1})^{-1} = f$; and transitivity follows from the fact that the composition of isomorphisms is itself an isomorphism, $(f ∘ g)^{-1} = g^{-1} ∘ f^{-1}$. All these equalities can be checked from the axioms of a category.
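For instance, transitivity can be checked directly: if $f$ and $g$ are isomorphisms, associativity and the inverse laws give \[ (f ∘ g) ∘ (g^{-1} ∘ f^{-1}) = f ∘ (g ∘ g^{-1}) ∘ f^{-1} = f ∘ f^{-1} = \id, \] and symmetrically $(g^{-1} ∘ f^{-1}) ∘ (f ∘ g) = \id$; so $f ∘ g$ is again an isomorphism.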

Monomorphisms and epimorphisms

We can notice that morphisms are an abstraction of the notion of structure-preserving maps between mathematical structures. Two structures can be identified if they are related by an isomorphism. From this perspective, it seems natural to ask how injectivity and surjectivity can be described only in terms of composition. Monomorphisms and epimorphisms will be abstractions of the usual injective and surjective homomorphisms, respectively.

Retractions and sections

Note that we could have chosen a stronger and non-equivalent notion to generalize the notions of injective and surjective functions.

By virtue of Proposition prop-unicityinverse, a morphism that is both a retraction and a section is an isomorphism. In general, however, not every epimorphism is a retraction and not every monomorphism is a section. Consider for instance a category with two objects and a single morphism connecting them; that morphism is a monomorphism and an epimorphism, but it has no inverse. In any case, we will usually work with the more general notions of monomorphisms and epimorphisms.

Products and sums

Motivation

Products and sums are very widespread notions in mathematics. Whenever a new structure is defined, it is common to ask what the product or sum of two of these structures would be. Examples of products are the cartesian product of sets, the product topology or the product of abelian groups; examples of sums are the disjoint union of sets, topological sum or the free product of groups. Following this idea, we can consider certain structures to constitute a $0$ or a $1$ for these operations; these are called initial and final objects.

We will abstract these notions categorically in terms of universal properties. This point of view, however, is an important shift with respect to how these properties are classically defined. We will not define the product of two objects in terms of their internal structure (categorically, objects are atomic and do not have any), but in terms of all the other objects; that is, in terms of the complete structure of the category. This turns the focus of the definitions inside-out. Moreover, objects defined in terms of universal properties are usually not uniquely determined, but only determined up to isomorphism. This reinforces our previous idea of considering two isomorphic objects of a category as essentially the same object.

Terminal and initial objects

Unicity

Note that these objects may not exist in a given category; but when they do, they are essentially unique. If $A,B$ are initial objects, by definition, there are a unique morphism $f : A → B$, a unique morphism $g : B → A$, and unique morphisms $A → A$ and $B → B$, which must be the identities. By uniqueness, $f ∘ g = \id$ and $g ∘ f = \id$; hence $A ≅ B$. An analogous proof can be written for terminal objects.

Products and sums

Note that neither the product nor the sum of two objects necessarily exists in a category; but when they do exist, they are essentially unique. This justifies writing them as $A × B$ and $A+B$. The proof is similar to that of the unicity of initial and terminal objects.

Examples of categories

Many mathematical structures, such as sets, groups, or partial orders, are particular cases of a category. Apart from these, we will also be interested in categories whose objects are known mathematical structures (the category of all groups, the category of all sets, and so on). The following are examples of how general the definition of a category is.

Discrete categories

Monoids, groups, groupoids

Posets

Category of Sets

Category of Groups

Category of Topological spaces

Functors and natural transformations

<<sec-functorsnatural>>

“Category” has been defined in order to define “functor” and “functor” has been defined in order to define “natural transformation”.

Saunders MacLane, Categories for the working mathematician, cite:maclane42.

Functors and natural transformations were defined for the first time by Eilenberg and MacLane in cite:maclane42 and cite:eilenberg45 while studying cohomology theory. While they were initially devised mainly as a language for this study, they have proven their foundational value with the passage of time. The notion of naturality will be a key element of our presentation of algebraic theories and categorical logic.

Functors

Definition of functor

Functors can be seen as homomorphisms of categories, preserving composition and identity arrows. A functor between two categories, $F : {\cal C} → {\cal D}$, is given by

  • an object function, $F : \mathrm{obj}({\cal C}) → \mathrm{obj}({\cal D})$;
  • and an arrow function, $F : hom(A, B) → hom(FA, FB)$, for any two objects $A,B$ of the category;

such that $F(\id) = \id$ and $F(f ∘ g) = Ff ∘ Fg$. We can arrange functors, at least in the case of small categories, into a category of categories.


Composition of functors

Full and faithful functors

We now consider certain properties of functors in this category. We say that a functor $F$ is full if every $g\colon FA → FB$ is of the form $Ff$ for some morphism $f \colon A → B$. We say $F$ is faithful if, for every two arrows $f_1,f_2\colon A → B$, $Ff_1 = Ff_2$ implies $f_1 = f_2$. It is easy to notice that the composition of faithful (respectively, full) functors is again a faithful (respectively, full) functor.

These notions are equivalent to injectivity and surjectivity of the arrow function between any two objects. Note, however, that a faithful functor need not be injective on objects nor on morphisms. In particular, if $A,A’,B,B’$ are four different objects, it could be the case that $FA = FA’$ and $FB = FB’$; and, if $f : A → B$ and $f’ : A’ → B’$ were two morphisms, it could be the case that $Ff = Ff’$. The following notion of isomorphism does require the complete functor to have an inverse.

Isomorphisms of categories

Unfortunately, this notion of isomorphism of categories is too strict in some cases. Sometimes, the two compositions $F ∘ G$ and $G ∘ F$ are not exactly the identity functor, but only isomorphic to it in some sense yet to be made precise. We develop weaker notions in the next section.

Natural transformations

Natural transformations

We have defined functors relating structures on different categories; now we may ask whether two given functors can be related in a canonical way, involving no arbitrary choices. An example is the relation between the identity functor and the double dual functor on finite-dimensional vector spaces. They are related by the canonical isomorphism $σ_V \colon V → V^{**}$ that sends each vector to the operator evaluating a functional on it, $σ_V(v)(φ) = φ(v)$; this isomorphism can be described without making any explicit reference to the particular space. On the other hand, the isomorphism relating a vector space to its dual depends on the choice of a basis. A family of canonical linear isomorphisms $σ_V \colon V → V^{*}$ translating between a space and its dual should be invariant under linear maps $μ \colon V → W$, so it should satisfy $σ_V(v)(w) = σ_W(μ(v))(μ(w))$; but this is not possible for all linear $μ$. The notion of natural transformation formalizes this intuitive idea.

A natural transformation between two functors $F$ and $G$ with the same domain and codomain, written as $α\colon F \tonat G$, is a family of morphisms parameterized by the objects of the domain category, $\{α_C\colon FC → GC\}_{C ∈ {\cal C}}$, such that $α_{C’} ∘ Ff = Gf ∘ α_C$ for every arrow $f \colon C → C’$. That is, the following diagram commutes. \[\begin{tikzcd} C \dar{f} & & FC \rar{α_C}\dar[swap]{Ff} & GC \dar{Gf} \\
C’ & & FC’ \rar{α_{C’}} & GC’ \end{tikzcd}\]

Sometimes, we also say that the family of morphisms $α$ is natural in its argument. This naturality property is what allows us to “translate” a commutative diagram from one functor to another, as in the following example. \[\begin{tikzcd} A\arrow{dd}{h}\drar{f} & & & F A\arrow{dd}{F h}\drar{F f} \arrow{rrr}{α_A} & & & G A\arrow{dd}{}\drar{G f} & \\
& B \dlar{g} & & & F B \dlar{F g} \arrow{rrr}{α_B} & & & G B \dlar{G g} \\ C & & & F C \arrow{rrr}{α_C} & & & G C & \\ \end{tikzcd}\]

Natural isomorphisms

Natural isomorphisms are natural transformations in which every component, every morphism of the parameterized family, is invertible. In this case, the inverses themselves form a new natural transformation, whose naturality follows from the naturality of the original transformation. We say that $F$ and $G$ are naturally isomorphic, $F ≅ G$, when there is a natural isomorphism between them.


Equivalence of categories

The notion of a natural isomorphism between functors allows us to weaken the condition of strict equality we imposed when talking about isomorphisms of categories (Definition def-isomorphismcategories). We say that two categories ${\cal C}$ and ${\cal D}$ are equivalent if there exist two functors $F \colon {\cal C} → {\cal D}$ and $G \colon {\cal D} → {\cal C}$ such that $G ∘ F ≅ \Id_{\cal C}$ and $F ∘ G ≅ \Id_{\cal D}$. The pair of functors, endowed with the two natural isomorphisms, is called an equivalence of categories.

Composition of natural transformations

Vertical and horizontal

The next reasonable step is to ask how natural transformations compose. If we draw natural transformations between functors as double arrows, as in the following diagram, \[\begin{tikzcd} {\cal C} \arrow[bend left=50]{r}[name=U,below]{}{F} \arrow[bend right=50]{r}[name=D]{}[swap]{G} & {\cal D} \arrow[Rightarrow,from=U,to=D]{}{α} \end{tikzcd}\] we notice that two different notions of composition arise; we have a vertical composition of natural transformations, which, diagrammatically, composes the two sequential natural transformations on the left side into a single transformation on the right side of the following diagram; \[\begin{tikzcd}[column sep=large] {\cal C} \arrow[bend left=80]{r}[name=U,below]{}{F} \arrow[bend right=80]{r}[name=D]{}[swap]{H} \arrow[]{r}[name=C]{}[swap,xshift=-1.5ex]{G} \arrow[]{r}[name=C2,below]{}[]{} & {\cal D} \arrow[Rightarrow,from=U,to=C]{}{α} \arrow[Rightarrow,from=C2,to=D]{}{β} & {\cal C} \arrow[bend left=50]{r}[name=U,below]{}{F} \arrow[bend right=50]{r}[name=D]{}[swap]{H} & {\cal D} \arrow[Rightarrow,from=U,to=D]{}{α\cdotβ} \end{tikzcd}\] and we have a horizontal composition of natural transformations, which composes the two parallel natural transformations on the left side into a single transformation on the right side of the following diagram. \[\begin{tikzcd}[column sep=large] {\cal C} \arrow[bend left=50]{r}[name=U,below]{}{F} \arrow[bend right=50]{r}[name=D]{}[swap]{G} & {\cal D} \arrow[Rightarrow,from=U,to=D]{}{α} \arrow[bend left=50]{r}[name=UU,below]{}{F’} \arrow[bend right=50]{r}[name=DD]{}[swap]{G’} & {\cal E} \arrow[Rightarrow,from=UU,to=DD]{}{α’} & {\cal C} \arrow[bend left=50]{r}[name=U,below]{}{F’ ∘ F} \arrow[bend right=50]{r}[name=D]{}[swap]{G’ ∘ G} & {\cal E} \arrow[Rightarrow,from=U,to=D]{}{α’ ∘ α} \end{tikzcd}\]

Vertical composition

Naturality of this family follows from the naturality of its two factors. Given any morphism $f \colon A → B$, the commutativity of the external square on the diagram below follows from the commutativity of the two internal squares. \[\begin{tikzcd} FA \rar{Ff}\dar{α_A}\arrow[swap,bend right=90]{dd}{(β ⋅ α)_A} & FB \dar{α_B} \arrow[bend left=90]{dd}{(β ⋅ α)_B} \\
GA \rar{Gf}\dar{β_A} & GB \dar{β_B} \\ HA \rar{Hf} & HB \end{tikzcd}\]

Horizontal composition

Naturality of this family follows again from the naturality of its two factors. Given any morphism $f \colon A → B$, the following diagram is commutative because it is the composition of two naturality squares given by the naturality of $F’α$ and $α’$. \[\begin{tikzcd} F’FA \rar{F’α} \dar{F’Ff} & F’GA \rar{α’} \dar{F’Gf} & G’GA \dar{G’Gf} \\
F’FB \rar{F’α} & F’GB \rar{α’} & G’GB \end{tikzcd}\]

Interchange law

Constructions on categories

<<section-constructions>> Before continuing our study of functors and universal properties, we provide some constructions and examples of categories we will need in the future. In particular, we consider some variations on the definition of functors and we use them to construct new categories.

Product categories

Product category

The product category (see cite:eilenberg45) of two categories ${\cal C}$ and ${\cal D}$, denoted by ${\cal C} × {\cal D}$, is their product object in the category $\mathsf{Cat}$. Explicitly, it is given by

  • objects of the form $\left\langle C,D \right\rangle$, where $C ∈ {\cal C}$ and $D ∈ {\cal D}$;
  • and morphisms of the form $\pair{f,g} : \pair{C,D} → \pair{C’,D’}$, where $f \colon C → C’$ and $g \colon D → D’$ are morphisms in their respective categories.

The identity morphism of any object $\pair{C,D}$ is $\pair{\id_C, \id_D}$, and composition is defined componentwise as $\pair{f’,g’} ∘ \pair{f,g} = \pair{f’ ∘ f, g’ ∘ g}$. The definition of the product also provides projection functors $P\colon {\cal C} × {\cal D} → {\cal C}$ and $Q : {\cal C} × {\cal D} → {\cal D}$, acting on arrows as $P\pair{f,g} = f$ and $Q\pair{f,g} = g$.

Product of functors

The product functor of two functors $F\colon {\cal C} → {\cal C}’$ and $G \colon {\cal D} → {\cal D}’$ is the unique functor $F × G \colon {\cal C} × {\cal D} → {\cal C}’ × {\cal D}’$ making the following diagram commute. \[\begin{tikzcd} {\cal C} \dar{F} & {\cal C} × {\cal D} \rar{Q}\lar[swap]{P} \dar[dashed]{F × G}& {\cal D} \dar{G} \\
{\cal C}’ & {\cal C}’ × {\cal D}’ \rar[swap]{Q’}\lar{P’}& {\cal D}’ \end{tikzcd}\] Explicitly, it acts on arrows as $(F × G)\pair{f,g} = \pair{Ff,Gg}$. In this sense, the $×$ operation can be seen as a functor with two arguments acting on objects and morphisms of the $\Cats$ category. We now formalize this idea of a functor in two variables.

Bifunctors

Bifunctors are functors from a product category, but they can also be regarded as functors on two variables. As we will show in the following proposition, they are completely determined by the two families of functors we obtain when we fix any of the arguments. We will also study a characterization of naturality between these functors.

Naturality for bifunctors

Opposite categories and contravariant functors

For many constructions in categories, it makes sense to consider how the process of reversing all the arrows yields a new, dual construction.

Opposite category

The opposite category ${\cal C}^{op}$ of a category ${\cal C}$ is a category with the same objects as ${\cal C}$ but with all its arrows reversed. That is, for each morphism $f : A → B$, there exists a morphism $f^{op} : B → A$ in ${\cal C}^{op}$. Composition is defined as $f^{op} ∘ g^{op} = (g ∘ f)^{op}$, exactly when the composite $g ∘ f$ is defined in ${\cal C}$.

Reversing all the arrows is a process that directly translates every property of the category into its dual property. A morphism $f$ is a monomorphism if and only if $f^{op}$ is an epimorphism; a terminal object in ${\cal C}$ is an initial object in ${\cal C}^{op}$; and a right inverse becomes a left inverse in the opposite category. This process is also an involution: $(f^{op})^{op}$ can be seen as $f$, and $({\cal C}^{op})^{op}$ is trivially isomorphic to ${\cal C}$.

Contravariant functors

Functor categories

Given two categories ${\cal B},{\cal C}$, the functor category ${\cal B}^{\cal C}$ has all functors from ${\cal C}$ to ${\cal B}$ as objects and natural transformations between them as morphisms. If we consider the category of small categories $\Cats$, there is a hom-functor $(-)^{(-)} \colon \Cats^{op} × \Cats → \Cats$ sending any two categories ${\cal C}$ and ${\cal B}$ to the functor category ${\cal B}^{\cal C}$. Many mathematical constructions are examples of functor categories.

Category of graphs

In cite:lawvere09, multiple examples of usual mathematical constructions in terms of functor categories can be found. Graphs, for instance, can be seen as functors; and graph homomorphisms as the natural transformations between them.

Dynamical systems and continuous dynamical systems

Universality and limits

<<section-universality>>

Universal arrows

A universal property is commonly given in mathematics by some conditions of existence and uniqueness on morphisms, representing some sort of natural isomorphism. They can be used to define certain constructions up to isomorphism and to operate with them in an abstract setting. We will formally introduce universal properties using universal arrows from an object $C$ to a functor $G$; the property of these arrows is that every arrow of the form $C → GD$ will factor uniquely through the universal arrow.


Universal arrow

A universal arrow from $C ∈ {\cal C}$ to $G \colon {\cal D} → {\cal C}$ is a morphism $u \colon C → GD_0$ such that, for every $g \colon C → GD$, there exists a unique morphism $f \colon D_0 → D$ making the following diagram commute. \[\begin{tikzcd} & GD & D \\
C \rar[swap]{u}\urar{g} & GD_0 \uar[swap,dashed]{Gf} & D_0 \uar[dashed,swap]{∃! f} \end{tikzcd}\] Note how a universal arrow is, equivalently, an initial object of the comma category $(C ↓ G)$. Thus, universal arrows must be unique up to isomorphism.

Universality in terms of hom-sets

Couniversal arrows

The categorical dual of a universal arrow from an object to a functor is the notion of universal arrow from a functor to an object. Note how, in this case, we avoid the name couniversal arrow, as both kinds of arrows represent what we usually call a universal property.

A dual universal arrow from $G$ to $C$ is a morphism $v \colon GD_0 → C$ such that, for every $g \colon GD → C$, there exists a unique morphism $f \colon D → D_0$ making the following diagram commute. \[\begin{tikzcd} D\dar[dashed,swap]{∃! f} & GD \dar[swap,dashed]{Gf}\drar{g} & \\
D_0 & GD_0 \rar[swap]{v} & C \end{tikzcd}\]

Representability

A representation of a functor from a locally small category to the category of sets, $K \colon {\cal D} → \Sets$, is a natural isomorphism $ψ\colon hom_{\cal D}(r,-) ≅ K$ making it isomorphic to a partially applied hom-functor. A functor is representable if it has a representation; the object $r$ used in the representation is called its representing object.
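A standard example, stated here only as an illustration: the forgetful functor $U \colon \mathsf{Grp} → \Sets$ is representable, with representing object the group of integers $\mathbb{Z}$. A group homomorphism $\mathbb{Z} → G$ is completely determined by the image of $1$, which gives a bijection $hom_{\mathsf{Grp}}(\mathbb{Z}, G) ≅ UG$, natural in $G$.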

Yoneda Lemma

Motivation

The Yoneda lemma is one of the first results in pure category theory. It allows us to embed any small category into a functor category over the category of sets. It will be very useful when studying presheaves and algebraic theories (see Section sec-lawveretheories) to create models of these theories.

Statement of the Yoneda Lemma

Natural transformations between representable functors

Naturality of the Yoneda Lemma

Yoneda functor

Limits

Motivation: products

In the definition of product, we chose two objects of the category, we considered all possible cones over them, and we picked the universal one. \[\begin{tikzcd}[column sep=tiny] &&&&& D \dar[dashed]{∃!} \ar[bend left,color=myred]{ddr}\ar[bend right,swap,color=myred]{ddl} & \\
& C \drar[color=myred]\dlar[color=myred] &&&& A × B \drar[color=myred]\dlar[color=myred] & \\
A && B &\phantom{gap}& A && B \end{tikzcd}\] In this commutative diagram, $C$ is a cone and $A × B$ is the universal one: every cone factorizes through it. In this particular case, the base of each cone is given by two objects; or, in other words, by the image of a functor from the discrete category with only two objects, called the index category.

We will be able to create new constructions on categories by formalizing the notion of cone and generalizing it to arbitrary bases, given as functors from arbitrary index categories. Constant functors are the first step towards formalizing the notion of cone.

Constant functors

We could say that $Δ c$ compresses the whole category ${\cal J}$ into $c$. A natural transformation from this functor to some other $F \colon {\cal J} → {\cal C}$ should be regarded as a /\red{cone}/ from the object $c$ to a copy of ${\cal J}$ inside the category ${\cal C}$; as the following diagram exemplifies. \[\begin{tikzcd}[column sep=tiny, row sep=tiny] &&& &&& & c \ar[color=myred]{dd}\ar[bend left, color=myred]{ddr}\ar[bend right, color=myred]{dddl}\ar[bend left, color=myred]{dddrr} && \\
&\phantom{gap}&& &&& & \phantom{gap} && \\ & ⋅ \ar{ld}\ar{r}\ar{drr} & ⋅ \ar{dr} & \ar[Rightarrow, shorten <= 3mm, shorten >= 14mm]{uurrrr}[xshift=-3mm]{Δ} &&& & ⋅ \ar{ld}\ar{r}\ar{drr} & ⋅ \ar{dr} & \\ ⋅ \arrow{rrr}[description, yshift=-5mm]{\big{\cal J}} &&& ⋅ \ar[Rightarrow, yshift=3mm,swap, shorten <= 2mm, shorten >= 2mm]{rrr}{F} &\phantom{gap}&& ⋅ \arrow{rrr}[description, yshift=-5mm]{\big{F \cal J}} & & & ⋅ \\
%&{\cal J}&& &&& &F{\cal J}&&
\end{tikzcd}\] The components of the natural transformation appear highlighted in the diagram. The naturality of the transformation implies that each triangle \[\begin{tikzcd}[column sep=tiny,row sep=small] &⋅ \drar[color=myred, shorten >= -1.5mm, shorten <= -2mm] \dlar[color=myred, shorten >= -1.5mm, shorten <= -2mm] & \\ ⋅ \ar[shorten >= -1mm, shorten <= -1mm]{rr} && ⋅ \end{tikzcd}\] on that cone must be commutative. Thus, natural transformations are a way to recover all the information of an arbitrary *index category* ${\cal J}$ that was encoded in $c$ by the constant functor. As we did with products, we want to find the cone that best encodes that information; a universal cone, such that every other cone factorizes through it. Diagrammatically, we look for an $r$ such that, for each $d$, \[\begin{tikzcd}[column sep=tiny, row sep=tiny] &d \ar[color=myred,bend right]{dddd} \ar[bend left, color=myred]{ddddr} \ar[bend right, color=myred]{dddddl} \ar[bend left, color=myred]{dddddrr} \ar[dashed]{dd}{∃!} && \\ &&\phantom{d}& \\ & r \ar[color=myred]{dd}\ar[bend left, color=myred]{ddr}\ar[bend right, color=myred]{dddl}\ar[bend left, color=myred]{dddrr} && \\ & \phantom{gap} && \\ & ⋅ \ar{ld}\ar{r}\ar{drr} & ⋅ \ar{dr} & \\ ⋅ \ar{rrr} & & & ⋅ \\ \end{tikzcd}\] That factorization will be represented in the formal definition of limit by a universal natural transformation between the two constant functors.

Definition of limit

That is, for every natural transformation $w \colon Δ d \tonat F$, there is a unique morphism $f \colon d → r$ such that \[\begin{tikzcd} d\dar[dashed,swap]{∃! f} & Δ d \dar[swap,dashed]{Δ f}\drar{w} & \\
r & Δ r \rar[swap]{v} & F \end{tikzcd}\] commutes. This reflects directly the universality of the cone we described earlier and proves that limits are unique up to isomorphism.

By choosing different index categories, we will be able to define multiple different constructions on categories as limits.

Examples of limits

Equalizers

For our first example, we will take the following category, called $\downdownarrows$, as index category, \[\begin{tikzcd} ⋅ \rar[bend left] \rar[bend right] & ⋅ \end{tikzcd}\] A functor $F \colon \downdownarrows → {\cal C}$ is a pair of parallel arrows in ${\cal C}$. Limits of functors from this category are called equalizers. With this definition, the equalizer of two parallel arrows $f,g \colon A → B$ is an object $\mathrm{eq}(f,g)$ with a morphism $e \colon \mathrm{eq}(f,g) → A$ such that $f ∘ e = g ∘ e$; and such that any other object with a similar morphism factorizes uniquely through it. \[\begin{tikzcd}[column sep=tiny] & D \dar[dashed]{∃!} \arrow[bend right, swap, color=myred]{ddl}{e’} \arrow[bend left, color=myred]{ddr} & \\
& \operatorname{eq}(f,g) \drar[color=myred] \dlar[swap, color=myred]{e} & \\ A \arrow[bend left=15]{rr}{f} \arrow[bend right=15, swap]{rr}{g} && B \end{tikzcd}\] Note how the right part of the cone is completely determined as $f ∘ e$. Because of this, equalizers can be written without specifying it, and the diagram can be simplified to the following one. \[\begin{tikzcd} \mathrm{eq}(f,g) \rar{e} & A \arrow[bend left=15]{r}{f} \arrow[bend right=15, swap]{r}{g} & B \\ D \ar[swap]{ur}{e’} \ar[dashed]{u}{∃!} &&& \end{tikzcd}\]

Equalizers in Sets

Kernels

Pullbacks

Pullbacks are defined as limits whose index category is $⋅ → ⋅ ← ⋅$. Any functor from that category is a pair of arrows with a common codomain; and the pullback is the universal cone over them. \[\begin{tikzcd} & D \arrow[bend right,swap,color=myred]{ddl}{p’} \arrow[bend left,color=myred]{ddr}{q’} \arrow[bend right,color=myred]{dd} \arrow[dashed]{d}{∃!} & \\
& A \drar[bend left,swap,color=myred]{q} \dlar[bend right,color=myred]{p} \arrow[color=myred]{d} & \\ X\rar[swap]{f} & Z & Y\lar{g} \\ \end{tikzcd}\] Again, the central arrow of the diagram is determined as $f ∘ p = g ∘ q$; so it can be omitted in the diagram. The usual definition of a pullback for two morphisms $f \colon X → Z$ and $g \colon Y → Z$ is the universal pair of morphisms $p \colon A → X$ and $q \colon A → Y$ such that $f ∘ p = g ∘ q$; that is, given any pair of morphisms $p’ \colon D → X$ and $q’ \colon D → Y$, there exists a unique $u \colon D → A$ making the diagram commute. Usually we write the pullback object as $X ×_Z Y$ and we write this property diagrammatically as follows. \[\begin{tikzcd} D \ar[dashed]{dr}{∃! u} \ar[bend left]{drr}{p’} \ar[swap,bend right]{ddr}{q’} && \\ & X ×_Z Y \dar[swap]{q} \rar{p} & X \dar{f} \\ & Y \rar[swap]{g} & Z \end{tikzcd}\] The square in this diagram is usually called a pullback square, and the pullback object is sometimes called a fibered product. Pullbacks will be a central notion when developing semantics for dependent types in Section sec-locallydependent.

Pullbacks in Sets

Colimits

Cocones

A colimit is the dual notion of a limit. We could consider cocones to be the dual of cones and pick the universal one. Once an index category ${\cal J}$ and a base category ${\cal C}$ are fixed, a cocone is a natural transformation from a functor on the base category to a constant functor. Diagrammatically, \[\begin{tikzcd}[column sep=tiny, row sep=tiny] &&& &&& & c && \\
&\phantom{gap}&& &&& & \phantom{gap} && \\ & ⋅ \ar{ld}\ar{r}\ar{drr} & ⋅ \ar{dr} & \ar[Rightarrow, shorten <= 3mm, shorten >= 14mm]{uurrrr}[xshift=-3mm]{Δ} &&& & ⋅ \ar{ld}\ar{r}\ar{drr}\ar[color=myblue]{uu} & ⋅ \ar{dr} \ar[color=myblue,bend right]{uul} & \\ ⋅ \arrow{rrr}[description, yshift=-5mm]{\big{\cal J}} &&& ⋅ \ar[Rightarrow, yshift=3mm,swap, shorten <= 2mm, shorten >= 2mm]{rrr}{F} &\phantom{gap}&& ⋅ \arrow[color=myblue, bend left]{uuur} \arrow{rrr}[description, yshift=-5mm]{\big{F \cal J}} &&& ⋅ \arrow[color=myblue, bend right]{uuull} \\ \end{tikzcd}\] is an example of a cocone. The universal one would be the $r$ such that, for each cocone $d$, \[\begin{tikzcd}[column sep=tiny, row sep=tiny] &d && \\ &&\phantom{d}& \\ & r \ar[dashed,swap]{uu}{∃!} && \\ & \phantom{gap} && \\ & ⋅ \ar{ld}\ar{r}\ar{drr} \ar[color=myblue,bend left]{uuuu} \ar[color=myblue]{uu}& ⋅ \ar{dr} \ar[bend right, color=myblue]{uuuul} \ar[color=myblue,bend right]{uul}& \\ ⋅ \ar{rrr} \ar[bend left, color=myblue]{uuuuur} \ar[bend left, color=myblue]{uuur} &&& ⋅ \ar[bend right, color=myblue]{uuull} \ar[bend right, color=myblue]{uuuuull} \\ \end{tikzcd}\] Naturality implies that each triangle of the following form commutes. \[\begin{tikzcd}[column sep=tiny,row sep=small] &⋅ & \\ ⋅ \ar[color=myblue, shorten >= -1.5mm, shorten <= -2mm]{ur} \ar[shorten >= -1mm, shorten <= -1mm]{rr} && ⋅ \ar[color=myblue, shorten >= -1.5mm, shorten <= -2mm]{ul} \end{tikzcd}\]

Definition of colimits

That is, for every natural transformation $w \colon F \tonat Δ d$, there is a unique morphism $f \colon r → d$ such that \[\begin{tikzcd} d & Δ d & \\
r\uar[dashed]{∃! f} & Δ r \uar[dashed]{Δ f} & F \lar{v}\ular[swap]{w} \end{tikzcd}\] commutes. This reflects directly the universality of the cocone we described earlier and proves that colimits are unique up to isomorphism.

Examples of colimits

Coequalizers

Coequalizers are the dual of equalizers; colimits of functors from $\downdownarrows$. The coequalizer of two parallel arrows is an object $\mathrm{coeq}(f,g)$ with a morphism $e \colon b → \mathrm{coeq}(f,g)$ such that $e ∘ f = e ∘ g$; and such that any other object with a similar morphism factorizes uniquely through it. \[\begin{tikzcd}[column sep=tiny] & d & \\
& \operatorname{coeq}(f,g) \uar[dashed]{∃!} & \\ a \ar[bend left, color=myblue, swap]{uur} \ar[color=myblue]{ur} \arrow[bend left=15]{rr}{f} \arrow[bend right=15, swap]{rr}{g} && b \ar[color=myblue, swap]{ul}{e} \ar[color=myblue, bend right, swap]{uul}{e’} \end{tikzcd}\] As the arrows on the right of the cocone are completely determined by the fact that the diagram must be commutative, we can simplify the diagram as follows. \[\begin{tikzcd} a \ar[bend left=15]{r}{f} \ar[bend right=15, swap]{r}{g} & b \ar{r}{e} \ar[swap]{dr}{e’}& \mathrm{coeq}(f,g) \ar[dashed]{d}{∃!} \\ && d \end{tikzcd}\]

Pushouts

Pushouts are the dual of pullbacks; colimits whose index category is $⋅ ← ⋅ → ⋅$, that is, the dual of the index category for pullbacks. \[\begin{tikzcd} & d & \\
& a \arrow[dashed,swap]{u}{∃!} & \\ x \urar[bend left,color=myblue]{p} \arrow[bend left,color=myblue]{uur}{p’} & z \lar{f} \rar[swap]{g} \arrow[color=myblue]{u} \arrow[bend left,color=myblue]{uu} & y \ular[bend right,swap,color=myblue]{q} \arrow[bend right,swap,color=myblue]{uul}{q’} \\ \end{tikzcd}\] We can define the pushout of two morphisms $f \colon z → x$ and $g \colon z → y$ as a pair of morphisms $p \colon x → a$ and $q \colon y → a$ such that $p ∘ f = q ∘ g$ and which are also universal; that is, given any pair of morphisms $p’ \colon x → d$ and $q’ \colon y → d$, there exists a unique $u \colon a → d$ making the following diagram commute. \[\begin{tikzcd} z \rar{g} \dar[swap]{f} & y \dar{q}\ar[bend left]{rdd}{q’} & \\ x \rar{p}\ar[bend right,swap]{drr}{p’} & x \amalg_z y \ar[dashed,swap]{dr}{∃! u} & \\ & & d \end{tikzcd}\] The square in this diagram is usually called a pushout square, and the pushout object is sometimes called a fibered coproduct.

Adjoints, monads and algebras

<<section-adjoints-monads>>

Adjunctions

Definition of adjunction

An adjunction from categories ${\cal X}$ to ${\cal Y}$ is a pair of functors $F\colon {\cal X} → {\cal Y}$, $G\colon {\cal Y} → {\cal X}$ with a bijection $\varphi \colon hom(FX,Y) ≅ hom(X,GY)$, natural in both $X∈ {\cal X}$ and $Y ∈ {\cal Y}$. We say that $F$ is left-adjoint to $G$ and that $G$ is right-adjoint to $F$ and we denote this as $F \dashv G$.

Naturality of $\varphi$, by Proposition prop-naturality-bifunctor, means that both \[\begin{tikzcd} hom(FX,Y) \rar{\varphi} \dar[swap]{- ∘ Fh} & hom(X,GY) \dar{- ∘ h} & hom(FX,Y) \rar{\varphi} \dar[swap]{k ∘ -} & hom(X,GY) \dar{Gk ∘ -} \\
hom(FX’,Y) \rar[swap]{\varphi} & hom(X’,GY) & hom(FX,Y’) \rar[swap]{\varphi} & hom(X,GY’) \end{tikzcd}\] commute for every $h \colon X’ → X$ and $k \colon Y → Y’$. That is, for every $f \colon FX → Y$, we have $\varphi(f) ∘ h = \varphi(f ∘ Fh)$ and $Gk ∘ \varphi(f) = \varphi(k ∘ f)$. Equivalently, $\varphi^{-1}$ is natural, and that means that, for every $g \colon X → GY$, we have $k ∘ \varphi^{-1}(g) = \varphi^{-1}(Gk ∘ g)$ and $\varphi^{-1}(g) ∘ Fh = \varphi^{-1}(g ∘ h)$.


Lawvere’s notation

A different and maybe more intuitive way to write adjunctions is by logical diagrams (see cite:lawvere309). An adjunction $F\dashv G$ can be written as

\begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] FX \rar{f}\& Y \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] X \rar{\varphi{(f)}}\& GY \end{tikzcd}} \end{prooftree} to emphasize that, for each morphism $f\colon FX → Y$, there exists a unique morphism $\varphi(f)\colon X → GY$; written in a way that resembles bidirectional logical inference rules. Naturality, in this setting, means that precomposition and postcomposition of arrows are preserved by the inference rule. Given morphisms $h \colon X’ → X$ and $k \colon Y → Y’$, we know by naturality that the composed arrows of the following diagrams are adjoint to one another. \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] FX’ \ar[bend left=45]{rr}{f ∘ Fh} \rar{Fh}\& FX \rar{f}\& Y \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] X’ \ar[bend right=45]{rr}[swap]{\varphi(f) ∘ h} \rar[swap]{h}\& X \rar[swap]{\varphi(f)}\& GY \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] FX \ar[bend left=45]{rr}{k ∘ f} \rar{f}\& Y \rar{k}\& Y’ \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] X \ar[bend right=45]{rr}[swap]{Gk ∘ \varphi(f)} \rar[swap]{\varphi(f)}\& GY \rar[swap]{Gk}\& GY’\end{tikzcd}} \noLine \BIC{} \end{prooftree} In other words, $\varphi(f) ∘ h = \varphi(f ∘ Fh)$ and $Gk ∘ \varphi(f) = \varphi(k ∘ f)$, as we wrote earlier.

Unit and counit of an adjunction

In the following two propositions, we will characterize all this information in terms of natural transformations made up of universal arrows. Given an adjunction $F \dashv G$, we define

  • the unit $η$ as the family of morphisms $η_X = \varphi(\id_{FX}) \colon X → GFX$, for each $X$;
  • the counit $ε$ as the family of morphisms $ε_Y = \varphi^{-1}(\id_{GY}) \colon FGY → Y$, for each $Y$.

Diagrammatically, they can be obtained by taking $Y$ to be $FX$ and $X$ to be $GY$, respectively, in the definition of adjunction.

\begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] FX \rar{\id}\& FX \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] X \rar{η_X}\& GFX \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] FGY \rar{ε_Y}\& Y \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] GY \rar{\id}\& GY \end{tikzcd}} \noLine \BIC{} \end{prooftree}

Units and counits are natural transformations

Characterization of adjunctions

Uniqueness

Composition of adjoints

Examples of adjoints

Product and coproduct

Free and forgetful functors

Monads

<<sec-monads>>

Motivation

The notion of monad is pervasive in category theory. A monad is a certain type of endofunctor that arises naturally when considering adjoints. Monads will be useful to model algebraic notions inside category theory and to model a variety of effects and contextual computations in functional programming languages. We will only prove here that each adjunction gives rise to a monad, but it is also true that there are multiple ways to turn a monad into an adjunction. With this result, we aim to illustrate how adjoints capture another fundamental concept of functional programming.


Definition of monads

A monad is a functor $T\colon {\cal X} → {\cal X}$ with natural transformations

  • $η\colon \Id \tonat T$, called unit; and
  • $μ \colon T^2 \tonat T$, called multiplication;

such that the following two diagrams commute. \[\begin{tikzcd} T^3 \rar{Tμ}\dar{μ T} & T^2\dar{μ} \\
T^2 \rar{μ} & T \end{tikzcd} \qquad \begin{tikzcd} \Id ∘ T \rar{η T}\drar[swap]{≅} & T^2\dar{μ} & \lar[swap]{Tη}\dlar{≅} T ∘ \Id \\ & T & \end{tikzcd}\]

The first diagram is encoding some form of associativity of the multiplication, while the second one is encoding the fact that $η$ creates a neutral element with respect to this multiplication. These statements will be made precise when we talk about algebraic theories. A comonad is the dual of a monad.
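A standard example, included here only as an illustration: the covariant powerset functor ${\cal P} \colon \Sets → \Sets$ is a monad whose unit forms singletons, $η_X(x) = \{x\}$, and whose multiplication flattens a set of sets by taking its union, $μ_X({\cal S}) = \bigcup {\cal S}$. Associativity of $μ$ amounts to the fact that flattening a set of sets of sets in either order yields the same union, and the unit laws hold because $\bigcup \{S\} = S = \bigcup \{\{x\} \mid x ∈ S\}$.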

Each adjunction gives rise to a monad

Algebras

<<sec-algebras>> In category theory, the word algebra denotes certain structures that can be easily described in terms of endofunctors, as universal fixed points of those functors. They model inductive constructions in terms of category theory.

Let $F \colon {\cal C} → {\cal C}$ be a functor. An $F\textbf{-algebra}$ is an object $X$ with a map $μ \colon FX → X$ called the structure map. A morphism between two $F\text{-algebras}$ $μ \colon FX → X$ and $ν \colon FY → Y$ is a map $h : X → Y$ such that the following diagram commutes. \[\begin{tikzcd} FX \rar{Fh}\dar{μ} & FY\dar{ν} \\
X \rar{h} & Y \end{tikzcd}\] This defines a category $\mathsf{Alg}_F({\cal C})$. The initial object of this category is called the initial algebra for $F$; it need not exist, but when it does, it is unique up to isomorphism.

Lambek’s theorem

$F\text{-algebras}$ are closely related to induction principles. The following theorem states that the initial algebra is a fixed point of the functor, and, by its initiality property, it maps into any other fixed point.

Natural numbers as an initial algebra

Lists or free monoids

Categorical logic

Presheaves

<<section-presheaves>>

Presheaves can be thought of as sets parameterized by a category, or as generalized sets. As we will study in the following chapters, categories of presheaves share a lot of categorical structure with the category of sets and can be used as non-standard models of lambda calculus.

Specifically, a presheaf on the category ${\cal C}$ is a functor $S \colon {\cal C}^{op} → \Sets$ from the opposite category to the category of sets. The category of presheaves on ${\cal C}$ is the functor category $\Sets^{{\cal C}^{op}}$. In particular, the examples we gave when talking about functor categories (Examples example-functor-graphs and example-dynamical) are presheaf categories. We see that they all share some structure with the category of sets; in the following sections, we will prove stronger results based on these.

All small limits exist in Sets

Pullbacks, and in particular products, always exist; thus all finite limits do

Cartesian closed categories and lambda calculus

<<sec-cartesianclosedlambdacalculus>>

Lawvere theories

<<sec-lawveretheories>>

Signature to category

The usual notion of algebraic theory is given by a set of $k$-ary operations for each $k ∈ \mathbb{N}$ and certain axioms between the terms that can be constructed inductively using free variables and these operations. For instance, the theory of groups is given by a binary operation $(⋅)$, a unary operation $(-)^{-1}$, and a constant or 0-ary operation $e$; satisfying the following axioms \[ x ⋅ x^{-1} = e, \quad x^{-1} ⋅ x = e, \quad (x ⋅ y) ⋅ z = x ⋅ (y ⋅ z), \quad x ⋅ e = x, \quad e ⋅ x = x, \] for any free variables $x,y,z$. The problem with this notion of algebraic theory is that it is not independent of its presentation: there may be multiple formulations of the same theory, with different but equivalent axioms. For example, cite:mccune91 discusses many single-equation axiomatizations of groups, such as \[ x / \big( (((x/x)/y)/z) / (((x/x)/x)/z) \big) = y \] with the binary operation $/$, related to the usual multiplication by $x / y = x ⋅ y^{-1}$. Our solution to this problem will be to capture all the algebraic information of a theory (all operations, constants and axioms) into a category. Differently presented but equivalent axioms will give rise to the same category.

Definition

An algebraic theory can be built from its operations and axioms as follows: objects represent natural numbers, $A^0,A^1,A^2,…$, and morphisms from $A^n$ to $A^m$ are given by a tuple of $m$ terms $t_1,…,t_m$ depending on $n$ free variables $x_1,…,x_n$, written as \[ (x_1… x_n \vdash \pair{t_1,…,t_m}) \colon A^n → A^m. \] Composition is defined as componentwise substitution of the terms of the first morphism into the variables of the second one; that is, given $(x_1… x_k \vdash \pair{t_1,…,t_m}) \colon A^k → A^m$ and $(x_1… x_m \vdash\pair{u_1,…,u_n}) \colon A^m → A^n$, their composition is $(x_1… x_k\vdash \pair{s_1,…,s_n})$, where $s_i = u_i[t_1,…,t_m / x_1,…,x_m]$. Two morphisms are considered equal, $(x_1… x_n \vdash \pair{t_1,…,t_m}) = (x_1… x_n \vdash \pair{t’_1,…,t’_m})$, when componentwise equality of terms, $t_i = t’_i$, follows from the axioms of the theory. Note that the identity is the morphism $(x_1… x_n \vdash \pair{x_1,…, x_n})$. The $k$-th projection from $A^n$ is the term $(x_1… x_n \vdash x_k)$, and it is easy to check that these projections make $A^n$ the $n$-fold product of $A^1$. We have built a category with the desired properties.
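As a worked example in the theory of groups, consider the duplication morphism $(x \vdash \pair{x,x}) \colon A^1 → A^2$ and the morphism $(x_1 x_2 \vdash \pair{x_1 ⋅ x_2^{-1}}) \colon A^2 → A^1$. Their composition substitutes $x$ for both $x_1$ and $x_2$, yielding $(x \vdash \pair{x ⋅ x^{-1}}) \colon A^1 → A^1$; and, by the first group axiom above, this morphism equals $(x \vdash \pair{e})$, the constant morphism at the identity element.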

Models

The category of models, $\Mod_{\cal C}(\mathbb{A})$, is the subcategory of the functor category ${\cal C}^{\mathbb{A}}$ containing the functors that preserve all finite products, with the usual natural transformations between them. We say that a category is algebraic if it is equivalent to a category of the form $\Mod_{\cal C}(\mathbb{A})$.

Examples


Completeness of algebraic theories

By construction we know that, if an equation can be proved from the axioms, it is valid in all models (our semantics are sound); but we would also like to prove that, if every model of the theory satisfies a particular equation, then it can actually be proved from the axioms of the theory (our semantics are complete). In general, we can actually prove a stronger, but maybe unsatisfying, result.

This proof feels rather disappointing because this trivial model is not even set-theoretic in general; but we can go further and assert the existence of a universal model in a presheaf category via the Yoneda embedding (Definition def-yoneda-functor).

Universal group

Cartesian closed categories

Definition

These adjoints are given by terminal, product and exponential objects respectively, written as follows. \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] ∗ \rar{}\& ∗ \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] C \rar{!}\& 1 \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] C,C \rar{f,g}\& A,B \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] C \rar{\pair{f,g}}\& A × B \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] C × A \rar{f}\& B \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] C \rar{\widetilde{f}}\& B^A \end{tikzcd}} \noLine \TIC{} \end{prooftree} In particular, exponentials are characterized by the family of evaluation morphisms $ε \colon B^A × A → B$, which form the counit of the adjunction. \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] C × A \ar[bend left=45]{rr}{f} \rar{\widetilde{f} × \id}\& B^A × A \rar{ε}\& B \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] C \ar[bend right=45]{rr}[swap]{\widetilde{f}} \rar[swap]{\widetilde{f}}\& B^A \rar[swap]{\id}\& B^A \end{tikzcd}} \end{prooftree}

Presheaves and sets are cartesian closed

In general, cartesian closed categories with functors preserving products and exponentials form a category, called $\mathsf{Ccc}$, that we show to be equivalent to a category of lambda theories in Theorem th-equivalence-ccctl.

Simply-typed λ-theories

Cartesian category

If we read $Γ \vdash a : A$ as a morphism from the context $Γ$ to the output type $A$, the rules of simply-typed lambda calculus with product and unit types match the adjoints that determine a cartesian closed category. \begin{prooftree} \AXC{} \UIC{$Γ \vdash ∗ : 1$} \AXC{$Γ \vdash a : A$} \AXC{$Γ \vdash b : B$} \BIC{$Γ \vdash \pair{a,b} : A × B$} \AXC{$Γ, a : A \vdash b : B$} \UIC{$Γ \vdash (λ a.b) : A → B$} \noLine \TIC{} \end{prooftree} Note that categorical projections from the product object correspond to the pair projections we called $\fst$ and $\snd$, and the evaluation morphism is simply function application in lambda calculus. This motivates the idea that simply typed λ-calculus is a manifestation of cartesian closed categories in some sense yet to be made explicit.


Lambda theories

A λ-theory $\mathbb{T}$ is the analog of a Lawvere algebraic theory (Definition def-algebraictheory) for cartesian closed categories. It is given by a set of basic types and constants over the simply-typed lambda calculus and a set of equality axioms, determining a definitional equality $≡$, an equivalence relation preserving the structure of the simply-typed lambda calculus; that is, \begin{center}\begin{array}{ll} t ≡ {∗}, & \mbox{ for each } t : 1; \\ \pair{a,b} ≡ \pair{a’,b’}, & \mbox{ for each } a ≡ a’,\ b ≡ b’; \\ \fst\pair{a,b} ≡ a,\ \snd\pair{a,b}≡ b, & \mbox{ for each } a : A,\ b : B; \\ \fst\ m ≡ \fst\ m’,\ \snd\ m ≡ \snd\ m’, & \mbox{ for each } m ≡ m’; \\ m ≡ \pair{\fst\ m,\snd\ m}, & \mbox{ for each } m : A × B; \\ f\ x ≡ f’\ x’, & \mbox{ for each } f ≡ f’,\ x ≡ x’; \\ (λ x.f\ x) ≡ f, & \mbox{ for each } f : A → B; \\ (λ x.m) ≡ (λ x.m’), & \mbox{ for each } m ≡ m’; \\ (λ x.m)\ n ≡ m[n/x], & \mbox{ for each } m : B,\ n : A. \end{array}\end{center} Two types are isomorphic, $A ≅ A’$, if there exist terms $f \colon A → A’$ and $g \colon A’ → A$ such that $g\ (f\ a) ≡ a$ for each $a {:} A$, and $f\ (g\ a’) ≡ a’$ for each $a’{:}A’$.

System T

Untyped lambda calculus as a theory

Translations

We consider the category $λ\mathsf{Thr}$ of λ-theories with translations as morphisms. Note that the identity is a translation and that the composition of two translations is again a translation. Our goal is to prove that this category is equivalent to that of cartesian closed categories with functors preserving products and exponentials.

Apart from the natural definition of isomorphism, we consider the weaker notion of equivalence of theories. Two theories with translations $τ \colon \mathbb{T} → \mathbb{U}$ and $σ \colon \mathbb{U} → \mathbb{T}$ are equivalent, $\mathbb{T} ≅ \mathbb{U}$, if there exist two families of type isomorphisms $σ(τ A) ≅ A$ and $τ(σ B) ≅ B$, parametrized over the types $A$ of $\mathbb{T}$ and $B$ of $\mathbb{U}$.

Syntactic categories and internal languages

Syntactic category

Internal language

Equivalence between CCC and STLC

Semantics

Thus, we can say that the simply-typed lambda calculus is the internal language of cartesian closed categories; each lambda theory represents a cartesian closed category.

Working in cartesian closed categories

Diagonal arguments

<<section-diagonalarguments>>

We can now talk internally about cartesian closed categories using lambda calculus. Note that each closed λ-term $\vdash a : A$ can also be seen as a morphism $1 → A$ from the terminal object. We use this language to provide a simple proof of a known theorem for cartesian closed categories by W. Lawvere (see cite:lawvere06 for details). The theorem will imply many known corollaries as particular cases when interpreted in different contexts.

Lawvere’s fixed point theorem

Consequences

We want to note here how abstracting a seemingly banal theorem has resulted in a myriad of deep results. Moreover, Lawvere’s original theorem can be replicated without exponentials in any category with all finite products, working directly with a morphism $d \colon A × A → B$ instead of its exponential transpose. The condition that $d$ be point-surjective can also be weakened: we only need, for every $g \colon A → B$, the existence of some $x : A$ such that $d\ x\ a ≡ g\ a$ for all $a : A$.
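For concreteness, the fixed point construction can be sketched directly in Agda (all names are ours), stating point-surjectivity in the weakened, pointwise form just described: from a point-surjective $d$ and any $f \colon B → B$ we produce a fixed point of $f$.

#+BEGIN_SRC agda
-- A sketch of Lawvere's fixed point argument; names are ours.
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

record Σ (A : Set) (B : A → Set) : Set where
  constructor _,_
  field
    fst : A
    snd : B fst
open Σ

-- d : A → (A → B) is point-surjective: every g : A → B
-- agrees pointwise with d x for some x : A.
PointSurjective : {A B : Set} → (A → A → B) → Set
PointSurjective {A} {B} d = (g : A → B) → Σ A (λ x → (a : A) → d x a ≡ g a)

-- Every endomap f : B → B has a fixed point, namely d x x for
-- the x representing the diagonal map a ↦ f (d a a).
fixpoint : {A B : Set} (d : A → A → B) → PointSurjective d
         → (f : B → B) → Σ B (λ b → b ≡ f b)
fixpoint {A} {B} d surj f = (d x x , snd (surj g) x)
  where
    g : A → B
    g a = f (d a a)
    x : A
    x = fst (surj g)
#+END_SRC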

Bicartesian closed categories

The rules of union and void types in simply-typed lambda calculus can be rewritten to match the structure of these adjoints. \begin{prooftree} \AXC{$$} \UIC{$Γ, z:0 \vdash \mathrm{abort} : C$} \AXC{$Γ, a:A \vdash c : C$} \AXC{$Γ, b:B \vdash c’ : C$} \BIC{$Γ, u : A+B \vdash \mathrm{case}\ u\ \mathrm{of}\ c;c’ : C$} \noLine \BIC{} \end{prooftree} In a similar way to how our previous fragment of simply-typed lambda calculus was the internal language of cartesian closed categories, the extended version of lambda calculus defined in Section sec-extendingstlc is the internal language of bicartesian closed categories.

Inductive types

The existence of an initial algebra in a cartesian closed category is equivalent to the existence of a type with an elimination rule representing an induction principle. Types generated by an inductive rule are called inductive types and are common constructs in functional programming languages. Inductive types and their properties can be encoded as initial algebras satisfying a certain universal property. In particular, we will see examples of inductive types and how all of them can be modeled as initial algebras over a certain class of functors, starting with the sketch below.
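As a first sketch (in Agda, with names ours), the natural numbers arise as the initial algebra of the functor $F(X) = 1 + X$, and initiality is exactly the usual fold, or recursion principle.

#+BEGIN_SRC agda
-- Natural numbers as the initial algebra of F(X) = 1 + X.
data ℕ : Set where
  zero : ℕ
  succ : ℕ → ℕ

-- An F-algebra on A is a point z : A together with a map s : A → A;
-- initiality of ℕ gives the unique algebra morphism, i.e. fold.
fold : {A : Set} → A → (A → A) → ℕ → A
fold z s zero     = z
fold z s (succ n) = s (fold z s n)

-- Addition, defined through the universal property.
_+_ : ℕ → ℕ → ℕ
n + m = fold m succ n
#+END_SRC

Binary trees and the other examples that follow fit the same pattern, replacing $F$ by another polynomial functor.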

Binary trees

Semantics of System T

Polynomial endofunctors and well-founded trees

Locally cartesian closed categories and dependent types

<<sec-locallydependent>>

Quantifiers and subsets

Logical formulas as subsets

The motivation for this section is the interpretation of logical formulas as subsets. Every predicate $P$ on a set $A$ can be seen as a subset $\left\{ a ∈ A \mid P(a) \right\}\hookrightarrow A$ determined by the inclusion monomorphism. Under this interpretation, logical implication between two propositions, $P → Q$, can be seen as a morphism that commutes with the two inclusions; that is, \[\begin{tikzcd}[column sep=tiny] \left\{ a∈ A\mid P(a) \right\} \ar{rr}\drar[hook] & & \left\{ a∈ A\mid Q(a) \right\}\dlar[hook] \\ & A & \end{tikzcd}\] $P$ implies $Q$ if and only if each $a ∈ A$ such that $P(a)$ is also in the subset of elements such that $Q(a)$. Note how we are working in a subcategory of the slice category $\Set/A$.

Substitution

Given a function $f \colon A → B$, any property in $B$ induces a property on $A$ via substitution in the relevant logical formula. This substitution can be encoded categorically as a pullback: the pullback of a proposition along a function is the proposition induced by substitution. \[\begin{tikzcd} \left\{a ∈ A \mid P(f(a)) \right\} \rar{} \dar[swap,hook]{} & \left\{ b ∈ B \mid P(b) \right\} \dar[hook]{} \\ A \rar{f} & B \end{tikzcd}\]

Weakening

A particular case of a substitution is logical weakening: a proposition on the set $A$ can be seen as a proposition on $A × B$ where we simply discard the $B$ component of a pair. \[\begin{tikzcd} \left\{(a,b) ∈ A × B \mid P(π(a,b)) \right\} \rar{} \dar[swap,hook]{} & \left\{ a ∈ A \mid P(a) \right\} \dar[hook]{} \\ A × B \rar{π} & A \end{tikzcd}\] Although it seems like an uninteresting particular case, once we formalize this operation as a functor, existential and universal quantifiers can be obtained as adjoints to weakening.

The pullback functor

Quantifiers as adjoints in sets

In $\Set$, we can find two adjoints to the weakening functor $π \colon \Set/A → \Set/(A × B)$, given by pullback along the projection. These two adjoints are $∃ \dashv π \dashv ∀$ because

  • proving the implication $P(π(a,b)) → Q(a,b)$ for each pair $(a,b)$ amounts to proving that $P(a) → \left(∀ b ∈ B,\ Q(a,b)\right)$ for each $a$; \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&, row sep=tiny] \& A × B \& \\ \left\{ (a,b) \mid P(a) \right\} \urar[hook] \ar{rr} \&\& \left\{ (a,b) \mid Q(a,b) \right\} \ular[hook] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,row sep=tiny] \left\{ a \mid P(a) \right\} \drar[hook] \ar{rr} \&\& \left\{ a \mid ∀ b∈ B, Q(a,b) \right\} \dlar[hook] \\ \& A \& \end{tikzcd}} \end{prooftree}
  • and proving that $(∃ b∈ B,\ P(a,b)) → Q(a)$ for each $a$ is the same as proving that $P(a,b) → Q(π(a,b))$ for each pair $(a,b)$. \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&, row sep=tiny] \& A × B \& \\ \left\{ (a,b) \mid P(a,b) \right\} \urar[hook] \ar{rr} \&\& \left\{ (a,b) \mid Q(a) \right\} \ular[hook] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,row sep=tiny] \left\{ a \mid ∃ b∈ B, P(a,b) \right\} \drar[hook] \ar{rr} \&\& \left\{ a \mid Q(a) \right\} \dlar[hook] \\ \& A \& \end{tikzcd}} \end{prooftree}

Note how, in this case, we are considering adjunction diagrams in the slice category. A generalization of this idea to other categories will extend our categorical logic with quantifiers.

Locally cartesian closed categories

Left adjoint

Locally cartesian closed category

We define a locally cartesian closed category as a category ${\cal C}$ with a terminal object and pullbacks such that each pullback functor also has a right adjoint $Π_f \colon {\cal C}/A → {\cal C}/B$. The rationale for this name becomes apparent in the following characterization.

Characterization by slices

Presheaf categories are locally cartesian closed

We proved earlier that generalized sets, or presheaves, were cartesian closed (Proposition proposition-presheaves-ccc); we will now prove that they are actually locally cartesian closed.

Dependent types

Dependent types

In the same way that cartesian closed categories model the types of the simply typed lambda calculus, locally cartesian closed categories model dependent types, that is, types that can depend on elements of another type. Each dependent type $B$ depending on values of type $A$ can also be seen as a family of types parameterized over the other type, $\{B(a)\}_{a : A}$. This extension of type theory forces us to revisit the notion of typing context.

Contexts

Typing contexts for dependent type theory are given as a list of variables \[ Γ = (a_1 : A_1, a_2 : A_2, …, a_n : A_n) \] where each type $A_i$ can depend on the variables $a_1,…,a_{i-1}$. The core syntax of dependent type theory can be expressed in terms of substitutions between contexts. A substitution from a context $Δ$ to $Γ$ is written as $σ \colon Δ → Γ$, and is given by a list of terms $(t_1,…,t_n)$ such that \[ Δ \vdash t_1 : A_1,\quad Δ \vdash t_2 : A_2[t_1/a_1],\quad …,\quad Δ \vdash t_n : A_n[t_1,…,t_{n-1}/a_1,…,a_{n-1}], \] that is, a context can be substituted into another if the list of terms of the second one can be built from the first one.

Categorical interpretation

The interpretation of a dependent type theory as a category takes contexts $Γ$ as objects $\intr{Γ}$ and substitutions as morphisms. Note how there exists an identity substitution $σ_{\mathrm{id}} \colon Γ → Γ$ that simply lists the variables of the context and how any two substitutions $τ \colon Γ → Φ$ and $σ \colon Δ → Γ$ can be composed into $τ ∘ σ \colon Δ → Φ$, which creates the terms of $Γ$ from $Δ$ following $σ$ and then uses these terms to create the terms of $Φ$ following $τ$.

Display maps and projections

A particular kind of substitution is given by display maps. If a term can be constructed in a given context, $Γ \vdash a : A$, the context can be extended with that term to $Γ, a : A$. Display maps are substitutions of the form $π_A \colon \intr{Γ, a : A} → \intr{Γ}$ that simply list the variables of $Γ$ and discard $a : A$.

This way, each type $A$ in a context $Γ$ is represented by the object $π_A \colon \intr{Γ, a:A} → \intr{Γ}$ of the slice category ${\cal C}/\intr{Γ}$; and each term of the type, $Γ \vdash a : A$, is represented by a morphism from $\id \colon \intr{Γ} → \intr{Γ}$, which is the terminal object of ${\cal C}/\intr{Γ}$, as in the following diagram. \[\begin{tikzcd} \intr{Γ} \rar{a} \drar[equal] & \intr{Γ, a:A} \dar{π_A} \\ & \intr{Γ} \end{tikzcd}\]

Dependent pairs

The locally cartesian closed structure of a category induces new type constructors in the type theory: dependent pairs and dependent functions. Their meaning under the Curry-Howard interpretation is that of the existential and universal quantifiers, respectively.

Dependent pair types, or Σ-types, can be seen as a generalized version of product types. Given a family of types parameterized by another type, $\{B(a)\}_{a : A}$, the elements of $∑_{a:A}B(a)$ are pairs $\pair{a,b}$ with a first element $a : A$ and a second element $b : B(a)$; that is, the type of the second component depends on the first component. This type is often written as $Σ (a:A), B(a)$ and it corresponds to the intuitionistic existential quantifier under the propositions-as-types interpretation. That is, a proof of $∃ (a:A), B(a)$ must be seen as a pair given by an element $a$ and a proof of $B(a)$.

In a locally cartesian closed category, a type $B$ depending on $Γ, a:A$ can be written as $π_B \colon \intr{Γ, a:A, b:B} → \intr{Γ, a:A}$; then, the type $∑_{a:A}B$ is given by the object \[ Σ_{π_A}π_B \colon \intr{Γ, a:A, b:B} → \intr{Γ}. \] That is, the sigma type over a type is left adjoint to the weakening that determines that type; this left adjoint, as we proved in Proposition prop-leftadjointweak, is given by postcomposition with $π_A$. Thus, categorically, the type $∑_{a:A} B(a)$ is given in the empty context by the composition of the projections that give rise to the type $A$ and to the type $B$ in the context of $A$. \[\begin{tikzcd} \intr{x : A, y : B} \rar{π_B} & \intr{x : A} \rar{π_A} & 1 \end{tikzcd}\] Elements of this type can be built categorically from an element $a : A$, using a pullback to create the context $\intr{B(a)}$ and then providing an element $b : B(a)$. \[\begin{tikzcd} \intr{Γ} \rar{b} \drar[equal] \ar[bend left]{rr}{\pair{a,b}} & \intr{Γ,y : B(a)} \dar\rar & \intr{Γ,x:A,y:B} \dar{π_B} \ar[bend left=70]{dd}{π_Σ} \\ & \intr{Γ} \rar{a}\drar[equal] & \intr{Γ, x : A} \dar{π_A} \\ && \intr{Γ} \end{tikzcd}\] This can be rewritten as the following introduction rule. \begin{prooftree} \AXC{$Γ \vdash a : A$} \AXC{$Γ \vdash b : B[a/x]$} \BIC{$Γ \vdash \pair{a,b} : ∑_{x:A}B$} \end{prooftree}

The adjunction in the slice category can then be particularized in the following two cases, taking $δ \colon \intr{Γ,A} → \intr{Γ,A,A}$ to be the substitution that simply duplicates the $A$. \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \& \intr{Γ,A} \& \\ \intr{Γ,A,B} \ar{rr}{δ \circ π_B} \urar[bend left] \&\& \intr{Γ,A,A} \ular[bend right] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \intr{Γ, ∑_{a:A}B} \ar{rr}{\mathrm{fst}} \drar[bend right] \&\& \intr{Γ,A} \dlar[bend left] \\ \& \intr{Γ} \& \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \& \intr{Γ,A} \& \\ \intr{Γ, A, B} \ar{rr}{\id} \urar[bend left] \&\& \intr{Γ, A, B} \ular[bend right] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \intr{Γ, ∑_{a:A}B} \ar{rr}{\mathrm{snd}} \drar[bend right] \&\& \intr{Γ, A, B} \dlar[bend left] \\ \& \intr{Γ} \& \end{tikzcd}} \noLine \BIC{} \end{prooftree} These two equalities represent two elimination rules. \begin{prooftree} \AXC{$Γ \vdash m : ∑_{x:A} C$} \UIC{$Γ \vdash \fst(m) : A$} \AXC{$Γ \vdash m : ∑_{x:A} C$} \UIC{$Γ \vdash \snd(m) : C[\fst(m)/x]$} \noLine \BIC{} \end{prooftree} We have two beta rules, $\fst\pair{a,b} ≡ a$ and $\snd\pair{a,b} ≡ b$, and a uniqueness rule, $m ≡ \pair{\fst(m), \snd(m)}$.

Dependent functions

Dependent function types, or Π-types, can be seen as a generalized version of function types. Given a family of types parameterized by another type, $\{B(a)\}_{a : A}$, the elements of $∏_{x:A} B(x)$ are functions with the type $A$ as domain and a changing codomain $B(x)$, depending on the specific element $x$ to which the function is applied. This type is often also written as $Π(x:A), B(x)$ to resemble the universal quantifier; under the propositions-as-types interpretation, it corresponds to a proof that a proposition $B$ holds for any $x : A$, that is, $∀ (x : A), B(x)$.

In a locally cartesian closed category, given a type $A$ in a context $Γ$, as $π_A \colon \intr{Γ, a:A} → \intr{Γ}$, and a type $B$ depending on the context $Γ, a:A$, as $π_B \colon \intr{Γ, a:A, b:B} → \intr{Γ, a:A}$, the type $∏_{a:A}B$ is given by the object \[ Π_{π_A} π_B \colon \intr{Γ, a:A, b:B} → \intr{Γ}. \] That is, the dependent function type over a type is right adjoint to the weakening that determines that type.

Thus, categorically, the type $∏_{a:A}B(a)$ is given in the empty context by applying to the morphism representing the type $B$ the right adjoint in the adjunction $π_A^{*} \dashv Π_{π_A}$. Elements of this type can be built applying the adjunction on the diagram of any term of type $B$ that assumes $a : A$ in the context. \[\begin{tikzcd} \intr{Γ, a : A} \rar{b} \drar[equal] & \intr{Γ, a : A, b : B} \dar{π_B} & \intr{Γ} \rar{(λ a.b)} \drar[equal] & \intr{Γ, z : ∏_{a:A} B} \dar{π} \\ & \intr{Γ, a : A} & & \intr{Γ} \end{tikzcd}\]

That is, we have the following adjunction and counit in the slice category, \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \& \intr{Γ,A} \& \\ \intr{Γ,A} \ar{rr}{b} \urar[bend left] \&\& \intr{Γ,A,B} \ular[bend right] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \intr{Γ} \ar{rr}{(λ a.b)} \drar[bend right] \&\& \intr{Γ, ∏_{a:A}B} \dlar[bend left] \\ \& \intr{Γ} \& \end{tikzcd}} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \& \intr{Γ,A} \& \\ \intr{Γ,A,∏_{a:A}B} \ar{rr}{\mathrm{app}} \urar[bend left] \&\& \intr{Γ,A,B} \ular[bend right] \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&,column sep=tiny,row sep=small] \intr{Γ, ∏_{a:A}B} \ar{rr}{\mathrm{id}} \drar[bend right] \&\& \intr{Γ, ∏_{a:A}B} \dlar[bend left] \\ \& \intr{Γ} \& \end{tikzcd}} \noLine \BIC{} \end{prooftree} which can be rewritten as introduction and elimination rules. \begin{prooftree} \AXC{$Γ, a : A \vdash b : B$} \UIC{$Γ \vdash (λ a.b) : ∏_{a:A}B$} \AXC{$Γ \vdash a : A$} \AXC{$Γ \vdash f : ∏_{a:A}B$} \BIC{$Γ \vdash f\ a : B(a)$} \noLine \BIC{} \end{prooftree} We have the equalities $(λ a.b)\ a’ ≡ b[a’/a]$ and $(λ a.f\ a) ≡ f$ as beta and eta rules.

Working in locally cartesian closed categories

Examples of dependent types

<<sec-examples-dependent-types>>

Vectors

Theorem of choice

Surprisingly, the theorem we have just proved reads classically as

“If for all $a:A$ there exists a $b:B$ such that $R(a,b)$, then there exists a function $g : A → B$ such that for each $a:A$, $R(a,g(a))$.”

and this is a form of the axiom of choice. Have we just proved the axiom of choice? The key insight here is that $Σ$ must not be read as the classical “there exists”, but as a constructive existential quantifier that demands not merely a proof that something exists but an explicit construction of it. A more accurate reading would be

“If for all $a:A$ we have a rule to explicitly construct a $b:B$ such that $R(a,b)$, then we can use that rule to define a function $g : A → B$ such that for each $a:A$, $R(a,g(a))$.”

and this trivial-sounding theorem is known as the theorem of choice. The moral of this example is that, when we work in type theory, we are working inside constructive mathematics, where the existential quantifier has a fundamentally different meaning. Later, we will see a technique that will allow us to recreate the classical existential quantifier.
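In fact, the theorem of choice is a one-line program. A minimal Agda sketch (names ours) could read:

#+BEGIN_SRC agda
record Σ (A : Set) (B : A → Set) : Set where
  constructor _,_
  field
    fst : A
    snd : B fst
open Σ

-- A family of explicit witnesses is already a choice function
-- together with the proof that it chooses correctly.
choice : {A B : Set} {R : A → B → Set}
       → ((a : A) → Σ B (λ b → R a b))
       → Σ (A → B) (λ g → (a : A) → R a (g a))
choice h = ((λ a → fst (h a)) , (λ a → snd (h a)))
#+END_SRC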

Equality types

<<sec-equalitytypes>> Equality in our theory comes from an adjunction, as was first proposed in cite:lawvere70. The equality type between elements of type $A$ will be represented by the diagonal morphism $\vartriangle \colon A → A × A$, seen as an object of ${\cal C}/(A × A)$. It is thus a type parameterized over two elements $x,y:A$, written as $(x = y)$. An element of an equality type, $p : x = y$, must be read as a proof that $x$ and $y$ are equal.

In general we have that, for any two objects in a slice category, $f \colon B → A$ and $g\colon C → A$, morphisms $f → g$ correspond naturally to sections of the pullback $f^{*}(g) \colon f^{*}C → B$; as in the following diagram. \[\begin{array}{lll}\begin{tikzcd} f^{*}C \rar\dar & C\dar{g} \\ B\rar{f}\uar[dashed,bend left]{\widetilde k} & A \end{tikzcd} &\qquad& \begin{tikzcd}[column sep=tiny] B \drar[swap]{f}\ar[dashed]{rr}{k} && C\dlar{g} \\ & A & \end{tikzcd}\end{array}\] Given any $k$ in the diagram above, we can construct $\widetilde k$ using the universal property of the pullback; conversely, the fact that $\widetilde k$ is a section gives us a $k$ by composition with $f^{*}C → C$.

In particular, let $π_C \colon C → A × A$ be a family of types $C(x,y)$ parameterized by two elements $x,y:A$. We have that any morphism from the equality type to $C$ corresponds to a section of $\vartriangle^{*}\! C$, which is, by substitution, the family of types $C(x,x)$. \[\begin{array}{lll}\begin{tikzcd} \vartriangle^{*}\! C \rar\dar & C\dar{π} \\ A\rar{\vartriangle}\uar[dashed,bend left]{\widetilde k} & A× A \end{tikzcd} &\qquad& \begin{tikzcd}[column sep=tiny] A \drar[swap]{\vartriangle}\ar[dashed]{rr}{k} && C\dlar{π} \\ & A× A & \end{tikzcd}\end{array}\] A section $\widetilde k$ of this form is precisely a term $x :A \vdash c : C(x,x)$, while a map $k$ is a term $x:A,y:A,p:x=y \vdash c:C(x,y)$. Thus, we have the following elimination rule for equality types, called the J-eliminator in the type theory literature. See cite:shulman17 for details. \begin{prooftree} \AXC{$Γ \vdash a : A$} \noLine \UIC{$Γ, x : A \vdash c:C(x,x)$} \AXC{$Γ \vdash b : A$} \noLine \UIC{$Γ \vdash p : a = b$} \BIC{$Γ \vdash \J_C(c,p) : C(a,b)$} \end{prooftree} The rule informally says that, if we want to prove some property $C(a,b)$ for $a,b : A$ such that $a = b$, we only need to prove $C(x,x)$ for each $x:A$. Moreover, if we consider the unit of the adjunction, as shown in the following diagram, \[\begin{array}{lll}\begin{tikzcd} \vartriangle^{*}\! A \rar\dar & A\dar{\vartriangle} \\ A\rar{\vartriangle}\uar[dashed,bend left]{\refl} & A× A \end{tikzcd} &\qquad& \begin{tikzcd}[column sep=tiny] A \drar[swap]{\vartriangle}\ar[dashed]{rr}{\id} && A\dlar{\vartriangle} \\ & A× A & \end{tikzcd}\end{array}\] we have a section $\refl \colon A → \vartriangle^{*}\!A$ that expresses reflexivity, $x = x$ for each $x:A$, and corresponds to the following introduction rule. \begin{prooftree} \AXC{$Γ \vdash a : A$} \UIC{$Γ \vdash \refl_a : a = a$} \end{prooftree} Introduction and elimination rules are related by $\J_C(c,\refl_a) ≡ c[a/x]$; that is, the J-eliminator can simply return $c$ in the reflexivity case.

We can further generalize the rule to allow $C$ to be a family of types also parameterized over $p$, of the form $C(x,y,p)$. \begin{prooftree} \AXC{$Γ \vdash a : A$} \noLine \UIC{$Γ, x : A \vdash c:C(x,x,\refl_x)$} \AXC{$Γ \vdash b : A$} \noLine \UIC{$Γ \vdash p : a = b$} \BIC{$Γ \vdash \J_C(c,p) : C(a,b,p)$} \end{prooftree} The corresponding computation rule is $\J_C(c,\refl_a) ≡ c[a/x]$.
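A minimal Agda sketch of the identity type and this generalized J-eliminator (names ours; pattern matching on refl implements exactly this rule) could read:

#+BEGIN_SRC agda
-- The identity type, inductively generated by reflexivity.
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

-- To prove C a b p for every p : a ≡ b, it suffices to
-- prove the diagonal case C x x refl.
J : {A : Set} (C : (x y : A) → x ≡ y → Set)
  → ((x : A) → C x x refl)
  → {a b : A} (p : a ≡ b) → C a b p
J C c {a} refl = c a

-- The computation rule J C c refl ≡ c a holds definitionally.
#+END_SRC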

Subobject classifier and propositions

<<sec-subobject-propositions>> A subobject classifier is an object $Ω$ with a monomorphism $\mathrm{true} \colon 1 → Ω$ such that, for every monomorphism $m : S → X$, there exists a unique $χ$ such that \[\begin{tikzcd} S\rar{} \dar[swap, hook]{m} & 1 \dar{\mathrm{true}} \\ X\rar[dashed]{χ} & Ω \end{tikzcd}\] is a pullback square.

Now, if we have a type given by a monomorphism, $\intr{Γ , x : P} → \intr{Γ}$, by the defining property of the subobject classifier, we have a unique characteristic morphism $χ_P \colon \intr{Γ} → \intr{Ω}$, which can be read as a term $Γ \vdash P \colon Ω$, meaning that $Ω$ is a type whose elements are themselves types (see cite:shulman17). \[\begin{tikzcd} \intr{Γ , x : P}\rar{} \dar[swap, hook]{} & 1 \dar{\mathrm{true}} \\ \intr{Γ} \rar[dashed]{χ_P} & \intr{Ω} \end{tikzcd}\] A type $P$ determined by a monomorphism must, by definition, have any two of its elements equal. Thus, the elements of $Ω$ are the types with at most one element; these types are usually called propositions in type theory. This can be expressed by the following rule. \begin{prooftree} \AXC{$Γ \vdash P : Ω$} \AXC{$Γ \vdash a : P$} \AXC{$Γ \vdash b : P$} \TIC{$Γ \vdash \mathrm{isProp}_P(a,b) : a = b$} \end{prooftree} Any two proofs of a proposition (in the sense of type theory) must be equal, and thus propositions allow us to reintroduce the notion of proof irrelevance.
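Internally, this notion can be sketched in Agda as follows (a minimal sketch; names ours):

#+BEGIN_SRC agda
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

-- A type is a proposition when any two of its elements are equal.
isProp : Set → Set
isProp P = (a b : P) → a ≡ b

-- Example: the unit type is a proposition (by eta for records,
-- any two of its elements are definitionally equal).
record ⊤ : Set where
  constructor tt

⊤-isProp : isProp ⊤
⊤-isProp a b = refl
#+END_SRC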

Propositional truncation

<<sec-proptrunc>> Propositions are types, but not all types are propositions; a type $A$ may have multiple distinct elements, witnessing different proofs of the same fact. We could, however, truncate a type into a proposition $\|A\|$ by postulating that any two of its proofs $p,q : \|A\|$ should be equal, $p = q$. In this case, to provide an element of $A$ would mean to explicitly construct a proof of $A$; whereas to provide an element of $\|A\|$ would mean to witness that $A$ can be proved, without constructing any proof.

For instance, there are four distinct elements of \[ ∑_{(n,m) : \mathbb{N} × \mathbb{N}}\ (n + m = 3), \] each one of them providing an ordered pair of natural numbers that add up to $3$. In contrast, there is a unique element of \[ \left\|∑_{(n,m) : \mathbb{N} × \mathbb{N}}\ (n + m = 3) \right\|, \] which simply witnesses the existence of some pair of naturals that add up to $3$, without explicitly pointing to it. In this sense, the truncated version resembles more the existential quantifier of classical logic, while the untruncated version demands the explicit construction of an example.

How should we represent truncations inside our theory? It can be proved (see cite:awodey04) that propositional truncation is the left adjoint to the inclusion of propositions into general types; and, if we assume the existence of this adjoint in the category we are working in, we can use propositional truncations. That is, if $P$ is a proposition and $A$ is an arbitrary type, we have the following adjunction \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] A \rar{}\& P \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] \|A\| \rar{}\& P \end{tikzcd}} \end{prooftree} which in practical terms means that, if we want to prove $\|A\| → P$, where $P$ is a proposition, it suffices to prove $A → P$, that is, to assume a particular proof of $A$. Note that this corresponds to usual mathematical practice, where a proof of existence can be used as if we had explicitly constructed the element in order to prove a different proposition, with the condition that we cannot refer later to the explicit element we used during the proof.
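In a system where truncations are not primitive, they can be sketched by postulates; the recursion principle =∥∥-rec= below is precisely this adjunction (a hedged sketch, with names ours):

#+BEGIN_SRC agda
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

isProp : Set → Set
isProp P = (a b : P) → a ≡ b

-- Propositional truncation, postulated: a proposition ∥ A ∥
-- together with the adjunction (A → P) vs. (∥ A ∥ → P).
postulate
  ∥_∥       : Set → Set
  ∣_∣       : {A : Set} → A → ∥ A ∥
  ∥∥-isProp : {A : Set} → isProp (∥ A ∥)
  ∥∥-rec    : {A P : Set} → isProp P → (A → P) → ∥ A ∥ → P
#+END_SRC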

Topoi

<<section-topoi>>

Motivation

Topoi (singular: topos) are category-theoretic models of constructive mathematics; we can reason in their internal logic and rebuild large parts of mathematics inside their structure. Each topos is thus a universe of mathematics with different axioms and interpretations cite:bauer2017five; for example,

  • inside Hyland’s realizability topos, every function is computable and we can study Kleene’s realizability theory (see cite:vanoosten08);
  • the Dubuc topos provides a non-paradoxical formalization of the notion of “infinitesimal” used by Newton and Leibniz, and we can study synthetic differential geometry inside it (see cite:dubuc89);
  • inside Johnstone’s topological topos, we can reason with topological spaces and continuous functions between them (see cite:johnstone79), and under certain hypotheses, we can assume that all functions we can build are automatically continuous (see cite:escardo15).

Topoi are defined as locally cartesian closed categories with a subobject classifier in which every finite limit exists. Usually, we will be interested in W-topoi, or topoi with a natural numbers object (in the sense of Example example-naturalnumbersobj) cite:leinster10.

The study of any of these theories is beyond the scope of this text. However, as we have been relating type theory to the internal language of locally cartesian closed categories with enough structure, we know that type theory (depending on what constructions we assume) will have models in categories like these. We only describe a particular case, developed by W. Lawvere while trying to provide an axiomatization of the category of sets.

An Elementary Theory of the Category of Sets

Lawvere’s Elementary Theory of the Category of Sets cite:lawvere64 provides a foundation for mathematics based on category theory. It describes the category $\Set$ in an abstract setting using eight axioms; the atomic, undefined notions of the theory are not sets and membership, but morphisms and composition. The original article shows some examples of how to do set theory inside the category.

Using the notation we have developed so far, Lawvere’s axioms can be reduced to the following definition. A model of the Elementary Theory of the Category of Sets is

  • a topos, that is, a locally cartesian closed category with all finite limits and a subobject classifier,
  • which is /well-pointed/, meaning that any two morphisms $f,g \colon A → B$ are equal if and only if $f ∘ a = g ∘ a$ for each $a \colon 1 → A$; morphisms from the terminal object are called global elements, and this property can be thought of as function extensionality;
  • which has a natural numbers object, in the sense of Example example-naturalnumbersobj;
  • which satisfies the Axiom of Choice, meaning that point-surjective morphisms have a section; in terms of our previous Example example-type-axiom-of-choice, we could translate this internally as

    \[ \left(∏_{(a:A)} \left\|∑_{(b : B)}\ f(b) = a \right\|\right) → \left\| ∑_{(g : A → B)}∏_{(a:A)}\ f(g(a)) = a \right\|, \]

    for any $f \colon B → A$.

Note that the category $\Set$, under the usual axioms, is a model of this theory. This can be seen, then, as an abstraction of set theory. As we will see later, the Axiom of Choice implies the Law of Excluded Middle, so we have finally recovered a classical foundation of mathematics from category theory.

Type theory

Martin-Löf type theory

<<sec-mltt>> In this chapter, we will exclusively work internally in dependent type theory and use it as a constructive foundation of mathematics. This has the added benefit that every formal proof in the system is a closed lambda term, so checking that a proof is correct amounts to typechecking the term. Explicitly, the type theory we have been describing in the previous sections corresponds to Martin-Löf type theory cite:nordstrom90. Two libraries of formalized mathematics, corresponding to two different foundational theories, have been built, containing a verified version of all the theorems of this chapter. See Section sec-verified for links to the formalized proofs.

Programming in Martin-Löf type theory

Martin-Löf type theory and the internal language we have been describing so far can also be regarded as a programming language in which it is possible to formally specify and prove theorems about the code itself. Formal proofs in this theory can be written in any language with a powerful enough type system; examples include

  • Agda, a programming language that implements a variant of Martin-Löf type theory;
  • Coq cite:coq04, first developed in 1984 at INRIA, implements the Calculus of Constructions and was used, for example, to check a proof of the Four Color Theorem;
  • Idris, a programming language implementing dependent types and using a slightly modified version of intensional equality types;
  • NuPRL cite:nuprl86, a proof assistant implementing Martin-Löf extensional Type Theory, which is different from the intensional theory in how it defines equality types;
  • Cubical cite:cohen16 and RedPRL cite:redprl provide experimental implementations of Cubical Type Theory, a different variant of type theory.

In this text, we will use Agda to write mathematics, taking ideas from cite:Conor17. We will check proofs in Martin-Löf type theory using its type system.

Translation between categories and types

Dependent products

We can translate the categorical structure inside Agda as follows. Dependent products exist naturally as dependent functions of the language.

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \begin{prooftree} \AXC{$Γ, a : A \vdash b : B$} \UIC{$Γ \vdash (λ a.b) : ∏a:AB$} \end{prooftree} \vfill \columnbreak \vfill \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{mlttproduct} \vfill \end{multicols}

Dependent sums

Dependent sums must be explicitly specified in the form of records: data structures in which the type of every element can depend on the previous ones.

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \vfill \begin{prooftree} \AXC{$Γ \vdash a : A$} \AXC{$Γ \vdash b : B[a/x]$} \BIC{$Γ \vdash \pair{a,b} : ∑x:AB$} \end{prooftree} \columnbreak \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{mlttsum} \end{multicols}

Naturals

Naturals, and initial algebras in general, can be defined as inductive types with the data keyword. Functions over these types can be constructed using their universal property, as we did in Example example-naturalnumbersobj.

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \vfill \begin{tikzcd}[fragile] 1+\mathbb{N}\rar{} \dar[swap]{\pair{0,\mathrm{succ}}} & 1+hom(\mathbb{N},\mathbb{N}) \dar{\pair{\id, \mathrm{succ}\,∘\, -}} \\ \mathbb{N}\rar{+} & hom(\mathbb{N},\mathbb{N}) \end{tikzcd} \columnbreak \vfill \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{mlttnat} \end{multicols}

Equalities

Equality types are a particular case of an inductive family of types. The induction principle over equalities is the J-eliminator we described in Section sec-equalitytypes.

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \vfill \begin{prooftree} \AXC{$Γ \vdash a : A$} \UIC{$Γ \vdash \refl_a : a = a$} \end{prooftree} \columnbreak \vfill \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{mltteq} \end{multicols}

Examples

Using all this machinery, we can prove facts about equality by induction and facts about the natural numbers, as in the following example, where we prove that $a = b$ implies $f(a) = f(b)$ and then use this fact to prove that $n + 0 = n$ for any $n : \mathbb{N}$ by induction.
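A self-contained sketch of this example (names ours) could read:

#+BEGIN_SRC agda
data ℕ : Set where
  zero : ℕ
  succ : ℕ → ℕ

data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

_+_ : ℕ → ℕ → ℕ
zero   + m = m
succ n + m = succ (n + m)

-- Equality is a congruence: a ≡ b implies f a ≡ f b.
ap : {A B : Set} (f : A → B) {a b : A} → a ≡ b → f a ≡ f b
ap f refl = refl

-- By induction on n, using ap in the successor case.
+-zero : (n : ℕ) → (n + zero) ≡ n
+-zero zero     = refl
+-zero (succ n) = ap succ (+-zero n)
#+END_SRC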

Propositions

Finally, we can define propositions as those types where any two elements are equal and postulate that truncations are propositions. The induction principle for truncated types represents the adjunction we described in Section sec-proptrunc.

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \vfill \begin{prooftree} \AXC{$Γ \vdash P : Ω$} \AXC{$Γ \vdash a : P$} \AXC{$Γ \vdash b : P$} \TIC{$Γ \vdash \mathrm{isProp}_P(a,b) : a = b$} \end{prooftree} \columnbreak \vfill \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{isprop} \end{multicols}

\setlength{\columnseprule}{0.4pt} \begin{multicols}{2} \vfill \begin{prooftree} \AXC{\begin{tikzcd}[fragile,ampersand replacement=\&] A \rar{}\& P \end{tikzcd}} \doubleLine \UIC{\begin{tikzcd}[fragile,ampersand replacement=\&] \|A\| \rar{}\& P \end{tikzcd}} \end{prooftree} \columnbreak \vfill \centering\ExecuteMetaData[agda-mltt/latex/Snippets.tex]{truncrec} \end{multicols}

Excluded middle and constructivism

Using categories, we have strengthened the propositional logic we described in Section sec-naturaldeduction to a fully-fledged higher-order logic in which we can construct mathematical proofs. However, during all this process, we have not accepted the Law of Excluded Middle; we do not assume that $P ∨ ¬ P$ for an arbitrary proposition $P$. Note, however, that we do not negate it either: that would cause a contradiction, as we actually proved $¬\neg(P ∨ ¬ P)$ in general in Section section-proof-lem. We are agnostic regarding it. This was the subject of a philosophical dispute between Hilbert and Brouwer on whether nonconstructive methods were justified in the foundations of mathematics cite:martinlof08.

This has deep effects on the mathematics we are working with. We are working at a great level of generality, where many results are just independent of our theory. Independence of statements like the following may be shocking to a classical mathematician: we cannot prove that the subset of a finite set is finite; we cannot prove that every real is equal or distinct from zero; we cannot prove that every ideal is contained in a maximal ideal; we cannot prove that every vector space has a basis (see cite:bauer2017five for more examples).

Why are we not simply accepting the excluded middle, as is common practice? The Law of Excluded Middle would affect the computational properties of the theory. For instance, whenever we create a natural number in our theory, we expect it to be a numeral of the form $\mathsf{succ}(\mathsf{succ}(… \mathsf{zero} …))$ for some number of $\mathsf{succ}$ applications. Inductive definitions can only compute with natural numbers when they are in this form. However, if we were to postulate the Law of Excluded Middle, $\mathsf{LEM}\colon P ∨ ¬ P$, we would have to accept the existence of numbers such as \[ \mathsf{if}\ \mathsf{LEM}(\mathrm{RiemannHypothesis})\ \mathsf{then}\ 0\ \mathsf{else}\ 1,\] but we would not be able to compute them into a numeral. We say that a type system has the *canonicity property* if every inhabitant of the $\mathbb{N}$ type is a numeral.

In any case, classical mathematics is a particular case of constructive mathematics. We can choose to work inside classical mathematics, although we would lose the computational content. When we choose not to, we are working in constructive mathematics with a programming language, as it was first proposed by Errett Bishop cite:bishop67.

Extensionality and Diaconescu’s theorem

<<sec-diaconescuth>> If we want to recover classical mathematics, we should try to model the axioms of the Elementary Theory of the Category of Sets in our locally cartesian closed categories. We already have natural numbers in Martin-Löf type theory; and well-pointedness, for instance, is called function extensionality in the type theory literature.

We postulate this in Agda as follows.
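A minimal sketch of the postulate (the statement in the library may differ, for instance in universe levels) could read:

#+BEGIN_SRC agda
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

-- Function extensionality: pointwise equal functions are equal.
postulate
  funext : {A : Set} {B : A → Set} {f g : (a : A) → B a}
         → ((a : A) → f a ≡ g a) → f ≡ g
#+END_SRC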

The last ingredient is the Axiom of Choice. This, like the Law of Excluded Middle, will make us lose the computational content of the theory. In fact, we can prove that, in general, the Axiom of Choice implies the Law of Excluded Middle. This is the statement of Diaconescu’s theorem, for which we provide both a natural proof and a computer-verified proof.

This proof has been implemented as follows. Note that we have only assumed the Axiom of Choice (and therefore, the Law of Excluded Middle) during this particular section of the text. In the next section, we return to constructive mathematics, working with the real numbers and extracting algorithms from proofs.

Dedekind reals

<<sec-dedekindreals>> In Reals.agda, we provide a formalized construction of the Dedekind positive reals in intensional Martin-Löf type theory, implemented in Agda. This implementation constructs reals as Dedekind cuts over the positive dyadic rationals $D$; that is, over the rationals of the form $a / 2^b$ for some $a,b : \mathbb{N}$. We also provide a library with all the necessary lemmas proving that our constructions are well-defined.

Natural numbers are constructed as initial algebras (as described in Example example-naturalnumbersobj). Dyadic rationals are constructed as pairs of naturals endowed with a normalization property: a fraction $a/2^b$ is normalized if $a$ is an odd number or if $b$ is exactly zero. \[ D = ∑_{(a,b) : \mathbb{N} × \mathbb{N}} \left\|\mathrm{odd}(a) + \mathrm{isZero}(b) \right\| \] This technique ensures that each dyadic rational will be uniquely represented by a term of type $D$. Finally, a real number $r : \mathbb{R}^+$ is constructed as the sum of the following data.

  • A Dedekind cut, $\mathrm{cut}_r : D → Ω$, representing a proposition parameterized over the positive dyadic rationals. Given $q : D$, the proposition $\mathrm{cut}_r(q)$ is true if and only if $r < q$.
  • A proof $\left\| ∑_{q : D}\ \mathrm{cut}_r(q) \right\|$ witnessing that the Dedekind cut is inhabited, that is, that there exists some $q$ such that $r < q$, providing an upper bound on the real number.
  • Two functions that witness that the cut is round. That is, a dyadic $q : D$ is in the cut if and only if there is a smaller $p : D$ also in the cut. Symbolically, \small \[ \left( ∏_{q : D}\ \mathrm{cut}_r(q) → \left\| ∑_{p : D}\ (p < q) × \mathrm{cut}_r(p) \right\| \right) × \left( ∏_{q : D}\ \left\| ∑_{p : D}\ (p < q) × \mathrm{cut}_r(p) \right\| → \mathrm{cut}_r(q) \right). \] \normalsize

The following code shows the core definition of the Agda implementation.

Note that, under the intuitionistic interpretation, it is not the case in general that $∏_{q : D}\ \mathrm{cut}_r(q) + ¬\, \mathrm{cut}_r(q)$ for an arbitrary $r : \mathbb{R}^+$; in fact, numbers that are neither equal to nor distinct from zero could exist! As we mentioned earlier, a formalization of infinitesimals is possible using this property; see cite:bauer13. However, even if we cannot prove it in general, we can prove $\mathrm{cut}_r(q) + ¬\, \mathrm{cut}_r(q)$ for some particular real numbers. We call these numbers located; and, for these numbers, the computational nature of the proof provides an algorithm that produces an infinite stream with the digits of the number.

As a proof of concept, we define square roots, $\sqrt{-} : \mathbb{R}^+ → \mathbb{R}^+$, as \[ \mathrm{cut}_{\sqrt{r}}(q) = \left\| ∑_{p : D}\ \mathrm{cut}_r(p) × \left(p < q^2\right) \right\|. \] We prove they are well-defined using several lemmas about the dyadic numbers and we prove that $\sqrt{2}$ is located. The Agda compiler is then able to produce the first binary digits of $\sqrt{2} = 1.01101010000…$. Note that this construction is thus formalized in any locally cartesian closed category with enough structure. Explicitly, we use the following Agda code; it calls already defined lemmas on the natural numbers when necessary and it shows how existential elimination is written inside the language to prove the necessary properties of the Dedekind cut.

Dedekind cuts are not the only path we can take in order to define the real numbers in type theory. Reals can also be defined in terms of Cauchy sequences, but the resulting construction is not in general equivalent to ours. Only when we assume excluded middle do Dedekind and Cauchy reals become equivalent, and the classical notion of real number is recovered. A detailed discussion can be found in the last chapters of cite:hottbook.

Homotopy type theory

<<sec-hott>>

And it soon became clear that the only long-term solution was somehow to make it possible for me to use computers to verify my abstract, logical, and mathematical constructions. When I first started to explore the possibility, computer proof verification was almost a forbidden subject among mathematicians. The primary challenge that needed to be addressed was that the foundations of mathematics were unprepared for the requirements of the task.

Vladimir Voevodsky, cite:voevodsky14.

Homotopy type theory I: Equality

We have already discussed how, given any $x,y:A$, we can interpret the equality type $(x = y)$, whose terms are witnesses of the equality. For each $x : A$, there is a reflexivity element, $\refl : x = x$; and the J-eliminator is interpreted as the following induction principle over the type. \[ ∏_{\left(C : ∏_{(x,y : A)} (x=y) → {\cal U}\right)} \left( \left( ∏_{a:A}\ C(a,a,\mathsf{refl}) \right) → ∏_{(x,y : A)}∏_{(p:x=y)}\ C(x,y,p) \right). \] This induction principle allows us to prove symmetry, transitivity and other properties of equality (we explicitly do so in Base.agda and Equality.agda). However, it is not possible to prove by path induction that every path is equal to the reflexivity path; that is, \[ ∏_{(x : A)}∏_{(p : x = x)}\ (p = \refl) \] is not derivable from the induction principle. This statement is equivalent to the principle of uniqueness of identity proofs; it is also equivalent to Streicher’s Axiom K cite:streicher93 and it is independent from Martin-Löf type theory.

If we do not assume this axiom, we open the possibility of the existence of multiple different proofs of the same equality. The structure of these proofs, endowed with symmetry and transitivity, can be modeled as a groupoid; and this idea allows us to construct models of type theory where the principle of uniqueness of identity proofs does not hold (see cite:hofmann98). If we also consider equalities between proofs of equality, and equalities between proofs of equalities between equalities, and so on, we get a weak ω-groupoid structure cite:berg11. Following Grothendieck’s Homotopy Hypothesis, groupoids can be regarded as homotopical spaces, where equalities are paths and equalities between equalities are homotopies between paths (some work on this hypothesis can be read in cite:tamsamani96).

In any case, as the existence of this non-trivial structure is independent of the theory, we need to introduce new axioms or new types with nontrivial equalities if we want to profit from this interpretation. The introduction of Voevodsky’s Univalence Axiom leads to Homotopy Type Theory, an extension of Martin-Löf type theory where we can work with this groupoid structure. Under the identification of higher groupoids and homotopical types, the new axiom allows us to reason in a sort of synthetic homotopy theory, where paths and homotopies are primitive notions. For instance, we can define the fundamental group (fundamental groupoid, if we also want to consider higher structure) of a type $A$ at a point $a : A$ as the type of loops $π_1(A,a) :≡ (a = a)$, endowed with reflexivity and transitivity. The circle can be defined as the freely generated type with a single element $\mathsf{p} : \mathbb{S}^1$ and a nontrivial equality $\mathsf{loop} : \mathsf{p} = \mathsf{p}$. Because it is freely generated, we can apply symmetry to get a different proof of $\mathsf{p} = \mathsf{p}$ which we will call $\mathsf{loop}^{-1}$; moreover, we can apply transitivity $n$ times to $\mathsf{loop}$ for an arbitrary $n$ to get a new proof of $\mathsf{p} = \mathsf{p}$ which we will call $\mathsf{loop}^n$; these are the elements of its fundamental group $π_1(\mathbb{S}^1, \mathsf{p})$.
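Lacking primitive higher inductive types, the generators of the circle can be postulated in Agda (a hedged sketch, with names ours; the full development in FundGroupCircle.agda also requires elimination principles):

#+BEGIN_SRC agda
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

-- Transitivity and symmetry of equality proofs.
_·_ : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
refl · q = q

sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym refl = refl

-- The circle, postulated: one point and one nontrivial loop.
postulate
  S¹   : Set
  p    : S¹
  loop : p ≡ p

-- Elements of the loop space p ≡ p: loop⁻¹, loop · loop, …
loop⁻¹ : p ≡ p
loop⁻¹ = sym loop

loop² : p ≡ p
loop² = loop · loop
#+END_SRC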

In this setting, results such as the Van Kampen theorem or the construction of Eilenberg-MacLane spaces have been formalized (see cite:hottbook).

Homotopy type theory II: Univalence

We say that there is an equivalence between two types $A$ and $B$, written $(A ≅ B)$, if there is a biinvertible map between them. Explicitly, \small \[ (A ≅ B) = ∑_{f : A → B} \left( \left( ∑_{g : B → A}∏_{a : A}\ g(f(a)) = a \right) × \left( ∑_{g : B → A}∏_{b : B}\ f(g(b)) = b \right) \right). \] \normalsize It can be shown that, for any pair of types $A$ and $B$, there exists a function of type $\mathrm{idtoeqv} : (A =_{\cal U} B) → (A ≅ B)$ that uses the identity function to construct an equivalence from an equality. The Univalence Axiom states that this function is itself an equivalence.
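Unfolded in Agda (a sketch with names ours, defining non-dependent pairs from Σ), the definition of biinvertible map reads:

#+BEGIN_SRC agda
data _≡_ {A : Set} (a : A) : A → Set where
  refl : a ≡ a

record Σ (A : Set) (B : A → Set) : Set where
  constructor _,_
  field
    fst : A
    snd : B fst

-- Non-dependent pairs as a special case of Σ.
_×_ : Set → Set → Set
A × B = Σ A (λ _ → B)

-- A biinvertible map: a map with a left and a right inverse.
_≃_ : Set → Set → Set
A ≃ B = Σ (A → B) (λ f →
          (Σ (B → A) (λ g → (a : A) → g (f a) ≡ a))
        × (Σ (B → A) (λ g → (b : B) → f (g b) ≡ b)))
#+END_SRC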

In practical terms, this implies that isomorphic structures can be identified. For example, we can consider the type of the integers $\mathbb{Z}$ with the successor and predecessor functions and create an equivalence $\mathbb{Z} ≅ \mathbb{Z}$, which can be turned into a nontrivial equality $\mathbb{Z} = \mathbb{Z}$ via the Univalence Axiom, representing that the integers, as a type, are equivalent to themselves after shifting them by the successor function. In the file FundGroupCircle.agda, we use this fact to prove, inside type theory, that $\mathbb{Z} = π_1(\mathbb{S}^1)$ (following cite:hottbook and cite:licata13). This result is a proof, inside this synthetic homotopical setting, of the fact that the fundamental group of the circle is $\mathbb{Z}$.

Note on the proof

We could think that this only proves that $\mathbb{Z}$ is the free group over a single generator; after all, this is what the circle is when interpreted as a groupoid. The main problem with that view is that the groupoid structure is not present inside the theory: it is only visible from the outside. When we prove this result, we need univalence to internalize the groupoid structure inside the theory, using $\mathbb{Z}$ as a bridge.

Verified formal proofs

<<sec-verified>> All definitions and theorems from Sections sec-mltt and sec-hott have been formalized into two Agda libraries whose complete code is presented in navigable HTML format at the following links. Note that each function name is a link to its definition.

On the Agda-mltt library, we define the basic types for Martin-Löf type theory. We have chosen to add propositions and propositional truncations, as described in Section sec-subobject-propositions, because of their expressive power, even if they are not part of the usual presentations of Martin-Löf type theory. In particular, they allow us to build the type of Dedekind positive reals (see Section sec-dedekindreals). We also prove Diaconescu’s theorem and show how a classical setting can be recovered (see Section sec-diaconescuth). More than a hundred propositions about the natural numbers and the dyadic rationals have been formalized in order to construct the real numbers, each one being a function in dependently typed lambda calculus.

This first library assumes uniqueness of identity proofs (Axiom axiom-uip), which eases some proofs but which is incompatible with Univalence (Axiom axiom-univalence). This is why we will need a second library in order to work with the Univalence axiom; this second library will not assume uniqueness of identity proofs. We also assume well-pointedness in the form of function extensionality (Axiom axiom-funext) and, only when working with Diaconescu’s theorem (Theorem th-diaconescu), we assume the Axiom of Choice.

On the Agda-hott library, we define again basic types for Martin-Löf type theory and we again assume function extensionality (Axiom axiom-funext), but this time we also assume the Univalence Axiom. A more detailed study of the notion of equality and equivalence follows, defining multiple equivalent notions of equivalence. We again define truncations and relations, the natural numbers, the integers and some algebraic structures. All this machinery is used while constructing the higher-inductive type of the circle to prove that its fundamental group is $\mathbb{Z}$.

These two libraries are original developments. They both follow ideas mainly from the Homotopy Type Theory book cite:hottbook, but also from the Univalent Foundations library written with the Coq proof assistant cite:unimath and the Coq homotopy theory library cite:bauer17. The definition of the type of propositions follows cite:escardoagda. Our second library compares to the much more complete and more technical cite:hott-in:agda by presenting only the small subset of type theory necessary to compute the fundamental group of the circle, thus simplifying the whole structure of the library.

Conclusions and further work

Categories are an exceptionally powerful and natural language in which to talk about mathematics. I feel that the knowledge about logic and foundations of mathematics I have gained while writing this text has been amplified by the fact that categories have always provided a comfortable framework to work with logic while avoiding delving excessively into more philosophical concerns. In addition, and perhaps surprisingly, categories give rise to very reasonable programming languages! While this may be a bit of an oversimplification, it is the practical lesson we can learn from both Mikrokosmos and Agda. When I started to write this text, I was aware of some form of unspecified connection between logic, programming and categories, and I wanted to make the statements in my mind precise. In any case, I was not expecting to find such a fruitful and deeply interesting topic of study right at the intersection between mathematics and computation.

Categories must be taken seriously cite:lawvere1986taking. Many examples in this text revisit basic notions of mathematics from the perspective of category theory. Rather than seeing them as trivial identifications, it is sometimes illuminating to take this perspective. Induction, natural numbers and more complex initial algebras are an example of an idea that becomes considerably clearer when we speak in the internal language of a category.

Mikrokosmos was born as a didactic project to teach lambda calculus. While functional programming languages are based on lambda calculus, its essence gets blurred among the many constructions that a modern programming language needs in order to be of any utility, even when these lack a strong theoretical basis. Many lambda calculus interpreters only aim to provide useful languages showing that programming in lambda calculus can be done; but they also sometimes blend together many different aspects, and it can be difficult for the user to determine whether a particular construction can actually be translated back to categories. We could say that we have succeeded in the task of providing a clear interpreter teaching how to translate between lambda calculus and logic. I have had the opportunity to conduct successful seminars on the lambda calculus using Mikrokosmos as a tool, and the interpreter (only in its command-line version) has been downloaded more than a thousand times from the Hackage platform, exceeding my initial expectations and confirming that there was a need for this kind of purely didactic experiment.

While proving on the order of a hundred lemmas about the natural numbers with only the help of the Agda type checker, the need for automation tools quickly arises. These are available in the form of tactic languages in more specialized proof assistants (see, for instance, LTac in Coq, cite:delahaye00); but it would be cleaner to use the same language both to prove and to automate. Reflection is an example of a technique that would help to leverage the power of Agda as a programming language to automate Agda as a proof checker; an example of its usage is described in cite:vanderwalt12. Perhaps, using these techniques, it would be possible in Agda to prove theorems of real analysis or to encode more complex theories such as synthetic differential geometry (cite:kock06).

In this text, we have discussed how intuitionistic logic provides a useful generalization of classical logic with semantics in cartesian closed categories and a computational interpretation in simply-typed lambda calculus. It seems plausible that a similar discussion on monoidal categories and linear logic could have been made. Linear logic can be interpreted as a logic of resources where we can make a distinction between, for example, two types of conjunctions: $A ⊗ B$ means to have both $A$ and $B$, while $A \& B$ means to have one or the other at our choice (see cite:lafont95). Monoidal categories provide semantics for this logic (see cite:mellies09), we have many monoidal graphical languages that would be worth exploring (see cite:selinger10), constructive mathematics can be done with linear logic, and it has an interpretation in quantum computing (see cite:schreiber14 or cite:lago12). A different path would be to follow the study of $∞\mathrm{-categories}$ and $∞\mathrm{-groupoids}$ as more expressive models of type theory (see cite:riehl2017type).

It is conceivable that, in the future, mathematicians will be able to write and check their proofs directly into a computer as part of their routine. Reaching that stage will require a massive effort into the study of foundations and implementation of languages, coupled with the creation of tools and software easing the task for the working mathematician.

Appendices

Code

The complete code that generates this text, along with the code here presented can be found at https://github.com/mroman42/ctlc.

The Mikrokosmos lambda interpreter has its documented code published under a GNU General Public License v3.0 at https://github.com/mroman42/mikrokosmos. Code for a previous stable version on the Hackage platform can be found at https://hackage.haskell.org/package/mikrokosmos. An HTML version of the documented code can be accessed at https://mroman42.github.io/mikrokosmos/haddock/.

Mikrokosmos has been released along with the following references.

The code for the Agda-mltt and Agda-hott libraries can be downloaded from the main repository. The easily-navigable HTML versions can be found at

Acknowledgments

\small The opportunity of devoting my bachelor’s thesis to this fascinating subject has been made possible by Professor Pedro García-Sánchez and Professor Manuel Bullejos. They have provided me with corrections and useful guidelines for the text and they have gone beyond their obligation in their efforts to instruct me on how to expose these ideas clearly. Any deviation from this goal must be attributed to my own inexperience.

I would like to thank the LibreIM community in general and Ignacio Cordón, David Charte, Marta Andrés, José Carlos Entrena, Pablo Baeyens, Antonio Checa, Daniel Pozo, and Sofía Almeida in particular, for testing the interpreter and providing useful discussion and questions on the topics of this text. Professor Luis Merino, Professor Juan Julián Merelo and Braulio Valdivielso have made possible and participated on the organization of meetings and workshops on these ideas.

Finally, I would like to express my gratitude to Benedikt Ahrens and the organizers of the School and Workshop on Univalent Mathematics at the University of Birmingham, whose effort has given me the opportunity to learn the rudiments of type theory and the Univalent Foundations program.

This document has been written with Emacs26 and org-mode 9, using the org file format and LaTeX as intermediate format. The document takes some configurations from the classicthesis template by André Miede and modifications by Adrián Ranea, Alejandro García and David Charte. The minted package has been used for code listings and the tikzcd package has been used for commutative diagrams. The document is released under a Creative Commons BY-SA 3.0 license, while the source code can be redistributed under the terms of the GNU General Public License. \normalsize

Bibliography

bibliographystyle:alpha bibliography:Bibliography.bib