Type Theory (Laurea Magistrale in Matematica)

Introduction

This is a brief and very informal presentation of the course.

Type theory is a discipline which straddles mathematics and computing science. In mathematics, type theory has strong roots in mathematical logic, while recent research (since 2009) has found important links with geometry. In computing science, type theory had (and still has!) a huge influence on the design of programming languages.

This course mainly focuses on the theoretical / mathematical aspects of types, especially in functional programming. Some practical activity is also planned. The only strong prerequisite is a general mathematical mindset (abstract reasoning skills). See ESSE3 for the complete syllabus.

Some Informal Intuition about Types

To start, we can ask ourselves if $$5 = \langle 1,2 \rangle$$ holds. The question may appear strange at first, and without a more precise context, it could be answered in two ways.

One might answer that such equality does not hold: $$5$$ is a number, and not a pair. Indeed, according to the de facto standard set theory axioms, it does not hold. However, using the same set theory, we might find out that the equation $$5 = \{ 0,1,2,3,4 \}$$ instead does hold, since $$5$$ is defined in precisely that way. So, $$5$$ is not a pair but it is a set, which might be just as puzzling. In practice, one should regard this fact as an incident, an artifact of the "low-level" definition of $$5$$. As such, after having proved the fundamental properties of naturals, mathematical proofs should avoid exploiting such low-level artifacts, relying instead on the higher-level algebraic structure of $$\mathbb{N}$$.

There is however another answer to $$5 = \langle 1,2 \rangle$$. One could say that this equation is not just false, it is meaningless. We could in fact restrict ourselves to study only those equations of the form $$x = y$$ where $$x,y$$ are objects of the same "nature". Precisely formalizing the requirement is not trivial. One might be tempted to require that $$x,y \in A$$ for the same set $$A$$, but unfortunately set theory allows us to construct the set $$A = \mathbb{N} \cup \mathbb{N}^2$$, which contains both numbers and pairs. We want a different notion: we want to require that $$x,y$$ have the same type.

In programming languages, we sometimes face similar issues. Consider the following (quite contrived) pseudo-code:

1) a,b = factorize(93274875782361);
2) print("6" + 7);
3) print(a);

In line 1, assume that factorization takes a long time to execute, producing two non-trivial factors when possible. In line 3, one such factor is printed. The real question is: what happens in line 2?

Line 2 adds a string and a number. Real-world programming languages differ widely on the semantics of "6" + 7. Below, we discuss the behaviour of several languages.

Some of them (Python, Ruby) will execute line 1 and, after the long computation, report a type error in line 2. Here, the program is aborted before it prints the result in line 3.

Others (Java, Scala, C#, JavaScript/ECMAScript) will instead automatically convert the number 7 to a string, concatenate the two strings, and print 67 in line 2.

Others (Perl 5, PHP) will instead automatically convert the string "6" to a number, add the two numbers, and print 13 in line 2.

Others (C, C++) will instead automatically convert the string "6" to a pointer-to-char and add 7 to the pointer. Roughly, this would have the effect of discarding the first 7 characters of the string, had it been long enough. Since it is not, the behaviour of the program is undefined. According to the ISO standards, executing such a program can do absolutely anything, with no restrictions whatsoever. In most implementations, the program crashes or prints some random characters.

Others (Rust, Pascal, OCaml, Haskell) instead detect the type error before the program even starts to run. This is done by exploiting a set of typing rules which check the syntax of the program (without running it). When this check succeeds, we know that the program will never perform any ill-typed operation at run time. A famous slogan here is "well-typed programs cannot go wrong".
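As a tiny, runnable illustration, here is the JavaScript/ECMAScript behaviour described above, written in TypeScript (whose static type system accepts the implicit conversion):

```typescript
// In JavaScript/ECMAScript (and hence TypeScript), + on a string and a
// number converts the number to a string and concatenates:
const result: string = "6" + 7;
console.log(result); // prints 67
```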

When types are checked during the program execution, we speak about dynamic type checking. When they are checked before the program starts, we speak about static type checking. Type theory focuses on the static aspects of types, including how the typing rules should be designed. Given a set of such rules, a few important problems arise: type checking (is a given expression of the given type?), type inference (find a type for a given expression), term inference (find an expression for the given type). These problems may be trivial when the allowed types are trivial, as in the basic example above (where we only have strings and integers), but quickly become more complex once we allow more complex types (see below for a few examples).

A final remark: the focus of the course is not the in-depth study of all the languages above. Indeed, real-world languages have many, many features and examining them would take a very long time. Still, when discussing the theory underlying types, we will mention a few specific type-related features in such languages, also involving some practice.

A Few Glimpses inside the Course

Below, we provide a few sample formulas, to hopefully whet the appetite of prospective students. These are not really explained here (that's what the course is for!), but having a peek at them might still be interesting.

Selected Types

$\begin{array}{ll} \mathbb{B} & \mbox{booleans} \\ \mathbb{N} & \mbox{natural numbers} \\ \mathbb{S} & \mbox{strings} \\ \ldots \\ 0 & \mbox{empty} \\ 1 & \mbox{unit (has a single element)} \\ A + B & \mbox{sum, or coproduct, or disjoint union}\\ A \times B & \mbox{product, or cartesian product}\\ B^A \quad\mbox{or}\quad (A \rightarrow B) & \mbox{exponential, or function space}\\ \sum_{x:A} B(x) & \mbox{dependent sum} \\ \prod_{x:A} B(x) & \mbox{dependent product} \\ \end{array}$
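For readers who already know a typed programming language, some of these types can be transcribed directly. A possible sketch in TypeScript (the encodings below are our choices, not the only ones):

```typescript
// Possible TypeScript encodings of some of the types above.
type Empty = never;                       // 0: no values
type Unit = null;                         // 1: a single value
type Sum<A, B> =                          // A + B: tagged disjoint union
  | { tag: "inl"; value: A }
  | { tag: "inr"; value: B };
type Prod<A, B> = [A, B];                 // A × B: a pair
type Fun<A, B> = (a: A) => B;             // B^A, i.e. A → B

// An element of ℕ + 𝕊, built with the left injection:
const n: Sum<number, string> = { tag: "inl", value: 5 };
```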

Some "Algebraic" Isomorphisms between Types

$\begin{array}{l} \mathbb{B} \simeq 1 + 1 \\ A \times B \simeq B \times A \\ A \times (B + C) \simeq A \times B + A \times C \\ A^B \times A^C \simeq A^{B \times C} \\ A \times A \simeq A^{\mathbb{B}}\\ \sum_{x:A} B \simeq A \times B \\ \prod_{x:A} B \simeq B^A = (A \rightarrow B) \\ \end{array}$
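As an example, the isomorphism $$A \times A \simeq A^{\mathbb{B}}$$ is witnessed by two mutually inverse programs. A sketch in TypeScript (function names are ours):

```typescript
// Witnesses for the isomorphism A × A ≅ A^𝔹: a pair of A's is the same
// thing as a function from booleans to A.
function pairToFun<A>(p: [A, A]): (b: boolean) => A {
  return (b) => (b ? p[0] : p[1]);
}

function funToPair<A>(f: (b: boolean) => A): [A, A] {
  return [f(true), f(false)];
}

// Round trip: starting from [1, 2] we get [1, 2] back.
const roundTrip = funToPair(pairToFun([1, 2]));
```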

Selected Typing Rules

Read $$\Gamma \vdash e:T$$ as "assuming $$\Gamma$$, the expression $$e$$ has type $$T$$". $\begin{array}{ll} \dfrac{ \Gamma \vdash a : A \qquad \Gamma \vdash b : B }{ \Gamma \vdash \langle a, b \rangle : A \times B } & \dfrac{ \Gamma \vdash e : A \times B }{ \Gamma \vdash \pi_1(e) : A } \\[5mm] \dfrac{ \Gamma, x : A \vdash e : B }{ \Gamma \vdash (\lambda x : A.\ e) : A \rightarrow B } & \dfrac{ \Gamma \vdash f : A \rightarrow B \qquad \Gamma \vdash a : A }{ \Gamma \vdash (f\; a) : B } \\ \end{array}$
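These four rules are already enough to drive a small type checker. The following TypeScript sketch (a toy, with names of our choosing) implements exactly the rules above, solving type inference for this tiny language:

```typescript
// A toy checker for the four rules above: types are base types,
// products (A × B) and arrows (A → B); expressions are variables,
// pairs, first projections, typed lambdas and applications.
type Ty =
  | { tag: "base"; name: string }
  | { tag: "prod"; fst: Ty; snd: Ty }
  | { tag: "arrow"; from: Ty; to: Ty };

type Expr =
  | { tag: "var"; name: string }
  | { tag: "pair"; fst: Expr; snd: Expr }
  | { tag: "proj1"; arg: Expr }
  | { tag: "lam"; name: string; ty: Ty; body: Expr }
  | { tag: "app"; fun: Expr; arg: Expr };

type Ctx = Record<string, Ty>; // the context Γ: variable names to types

function eqTy(a: Ty, b: Ty): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// infer(Γ, e) returns the T such that Γ ⊢ e : T, or throws if e is ill-typed.
function infer(ctx: Ctx, e: Expr): Ty {
  switch (e.tag) {
    case "var": {
      const t: Ty | undefined = ctx[e.name];
      if (t === undefined) throw new Error(`unbound variable ${e.name}`);
      return t;
    }
    case "pair": // Γ ⊢ a : A and Γ ⊢ b : B give Γ ⊢ ⟨a,b⟩ : A × B
      return { tag: "prod", fst: infer(ctx, e.fst), snd: infer(ctx, e.snd) };
    case "proj1": { // Γ ⊢ e : A × B gives Γ ⊢ π₁(e) : A
      const t = infer(ctx, e.arg);
      if (t.tag !== "prod") throw new Error("π₁ applied to a non-pair");
      return t.fst;
    }
    case "lam": { // Γ, x:A ⊢ e : B gives Γ ⊢ (λx:A. e) : A → B
      const bodyTy = infer({ ...ctx, [e.name]: e.ty }, e.body);
      return { tag: "arrow", from: e.ty, to: bodyTy };
    }
    case "app": { // Γ ⊢ f : A → B and Γ ⊢ a : A give Γ ⊢ (f a) : B
      const funTy = infer(ctx, e.fun);
      if (funTy.tag !== "arrow") throw new Error("applying a non-function");
      if (!eqTy(funTy.from, infer(ctx, e.arg)))
        throw new Error("argument type mismatch");
      return funTy.to;
    }
  }
}

// Example: ⊢ (λx:ℕ. x) : ℕ → ℕ
const natTy: Ty = { tag: "base", name: "Nat" };
const idTy = infer({}, {
  tag: "lam", name: "x", ty: natTy, body: { tag: "var", name: "x" },
});
```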

Relationships with Logic

Given the type $$A \times B \rightarrow A$$, we can construct a program having that type (the program which projects the first component). Given the logical formula $$A \land B \rightarrow A$$, we can construct a proof for it (it's a theorem).

Given the type $$A \times B \rightarrow C$$, we can NOT construct a program having that type (how would we construct a value in $$C$$?). Given the logical formula $$A \land B \rightarrow C$$, we can NOT construct a proof for it.

Given the type $$(A \times (A \rightarrow B)) \rightarrow B$$, we can construct a program having that type (given $$\langle a,f \rangle$$, return $$f(a)$$). Given the logical formula $$(A \land (A \rightarrow B)) \rightarrow B$$, we can construct a proof for it (it's a theorem).
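The two inhabited types above can be transcribed directly as programs. A sketch in TypeScript (names are ours):

```typescript
// A program of type A × B → A: project the first component.
// Under the correspondence, a proof of A ∧ B → A.
function projFst<A, B>(p: [A, B]): A {
  return p[0];
}

// A program of type (A × (A → B)) → B: apply the function to the value.
// Under the correspondence, a proof of modus ponens.
function modusPonens<A, B>(p: [A, (a: A) => B]): B {
  return p[1](p[0]);
}
```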

This is not a coincidence: it is the Curry-Howard isomorphism!

Recursion and Types

Recursion is a common tool when writing programs. For instance,

factorial n = if n == 0 then 1 else n * factorial (n - 1)

is a recursive factorial program, which follows the mathematical definition. Although it is less common, recursion can also be used on types. Below, we recursively define the types of naturals, of "lists of A", of "binary trees of A" (each node has two children), and of "infinitary trees of A" (each node has infinitely many children). $\begin{array}{l} \mathbb{N} \simeq 1 + \mathbb{N} \\ ListA \simeq 1 + (A \times ListA) \\ BinTreeA \simeq 1 + (A \times BinTreeA \times BinTreeA) \\ InfTreeA \simeq 1 + (A \times (\mathbb{N} \rightarrow InfTreeA)) \\ \end{array}$ Each recursive type has an associated induction principle.
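The recursive equation for $$ListA$$ can be transcribed almost literally in a language with recursive types. A sketch in TypeScript, where null encodes $$1$$ and a pair encodes $$A \times ListA$$ (names are ours):

```typescript
// ListA ≅ 1 + (A × ListA): null encodes 1 (the empty list),
// a pair [head, tail] encodes A × ListA.
type List<A> = null | [A, List<A>];

// The recursion principle associated with the type: to consume a list,
// provide a result for the 1 case and a step for the A × ListA case.
function fold<A, B>(nil: B, cons: (head: A, rest: B) => B, l: List<A>): B {
  return l === null ? nil : cons(l[0], fold(nil, cons, l[1]));
}

const xs: List<number> = [1, [2, [3, null]]];
const sum = fold(0, (h: number, r: number) => h + r, xs); // 1 + 2 + 3 = 6
```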

Polymorphic Types

In a language without (static) types, we can easily define a function for projecting out the second component of a pair: $f(x,y) = y$ What about in typed languages? There, we need to associate a type with such an $f$. If the language has only a basic, primitive type system (as in C), we might have to resort to a family of functions: $\begin{array}{l} f_1 \ : \ \mathbb{N} \times \mathbb{B} \rightarrow \mathbb{B} \\ f_1 (x,y) = y \\ f_2 \ : \ \mathbb{N}^2 \times \mathbb{N} \rightarrow \mathbb{N} \\ f_2 (x,y) = y \\ f_3 \ : \ \mathbb{S} \times \mathbb{N}^3 \rightarrow \mathbb{N}^3 \\ f_3 (x,y) = y \\ \ldots \end{array}$ This is pretty inconvenient: we need to replicate the function for all the type combinations we are interested in. Further, if the set of types is infinite, this will only cover a finite number of cases. A richer type system can instead allow for polymorphic types, with universal quantification over the type variables $$A,B$$: $\begin{array}{l} f \ : \ \forall A,B.\ A \times B \rightarrow B \\ f (x,y) = y \\ \end{array}$ Alternatively, the two "type-valued" arguments can be explicitly passed, making the function a "dependent function": $\begin{array}{l} f \ : \ \prod_{A,B:{\sf Type}} A \times B \rightarrow B \\ f (A,B,x,y) = y \\ \end{array}$
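In a language with generics (parametric polymorphism), the whole family collapses into a single program. A sketch in TypeScript (names are ours):

```typescript
// A single polymorphic function replaces the whole family f₁, f₂, f₃, …
// The type variables A and B are implicitly universally quantified.
function snd<A, B>(x: A, y: B): B {
  return y;
}

const a = snd(42, true);        // a : boolean
const b = snd("hello", [1, 2]); // b : number[]
```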

Parametricity

We have seen above that, given the type $$\forall A,B.\ A \times B \rightarrow B$$ we can construct a program having that type: the program which projects the second component. Is there any non-equivalent program sharing the same type? One could propose a polymorphic function like $f(a,b) = \begin{cases} b+1 & \mbox{if } B = \mathbb{N} \\ b & \mbox{otherwise} \end{cases}$ The above is called "ad hoc polymorphism". What if we instead forbid run-time type equality checks such as $$B = \mathbb{N}$$, so that we get a "parametric" form of polymorphism? Is there any other program?

Is there any program of type $$\forall A.\ A \rightarrow A$$ besides the identity program?

Let $$[A]$$ stand for the type "finite lists of elements of $$A$$". Is it true that for any program of type $$\forall A.\ [A] \rightarrow [A]$$, when run in the case $$A = \mathbb{N}$$, doubling the numbers in the input list causes the numbers in the output list to be doubled?

The answer to all these questions comes at a very low price: parametricity yields such results "for free", without even inspecting the programs.
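As a concrete instance of the last question, the parametricity ("free") theorem for $$\forall A.\ [A] \rightarrow [A]$$ states that any such program commutes with mapping a function over the list. A sketch in TypeScript, using list reversal as an example of a parametric program (names are ours):

```typescript
// rev is parametric: it rearranges the elements of the list without
// inspecting them, so it commutes with mapping any function over the list.
function rev<A>(xs: A[]): A[] {
  return [...xs].reverse();
}

const double = (n: number) => 2 * n;
const input = [1, 2, 3];

const lhs = rev(input.map(double)); // double first, then rev: [6, 4, 2]
const rhs = rev(input).map(double); // rev first, then double: [6, 4, 2]
```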

Devil's Advocate: Has type theory really influenced real-world industry-relevant programming languages?

Claim: Functional programming and the lambda calculus are only theoretical toys, with no practical relevance.

Counter-argument: A few decades ago, the claim might have been true. However, lambda-abstractions have now found their way into many imperative languages: Java 8, Scala, C#, C++, JavaScript/ECMAScript, Python, Rust, ... Of course, functional languages have always had lambdas: Scheme, Lisp, OCaml, Haskell, ... Many companies now look for functional programming skills, and even influential recruiters strongly recommend that functional programming be taught in courses.

Claim: Sophisticated type systems are purely theoretical.

Counter-argument: The type system of the Girard-Reynolds polymorphic lambda calculus is the basis for the so-called "generics", which are now found in Java, Scala, C#, Rust, ... (and Haskell, OCaml). The extended System $$F\omega$$ type system is the basis for the "higher kinds" of Scala and Haskell. Dependent types (as e.g. in Martin-Löf's Intuitionistic Type Theory, or Coquand's Calculus of Constructions) are more complex and not yet widely used beyond research languages (Coq, Agda). Some simpler types, inspired by dependent types, have begun to appear (e.g. GADTs in Haskell).