CakeML

About

CakeML is a functional programming language and an ecosystem of proofs and tools built around the language. The ecosystem includes a proven-correct compiler that can bootstrap itself.

The CakeML project consists of the following components, all of which are free software.

Language definition. The CakeML language is based on a substantial subset of Standard ML. Its formal semantics is specified in higher-order logic (HOL) in a functional big-step style. The core of the language (its syntax and semantics) is quite stable, but the standard basis library is still undergoing development. Contributions are welcome!

Compiler backend. The CakeML compiler has many parts. The most significant part is the verified compiler backend, which transforms an untyped AST to concrete machine code for one of 6 target architectures. The compiler backend has been proved to only produce machine code that is compatible with the behaviours of the source programs. The backend passes through several intermediate languages (as this diagram illustrates) and performs some optimisations.

Compiler frontend 1. There are two frontends to the compiler. The first one is a proof-producing synthesis tool (called the translator). It generates CakeML AST from ML-like functions in HOL and proves that the generated AST has the same behaviour as the HOL function. The original version of this tool produced only pure CakeML code, but more recent versions can produce code that performs I/O and uses state, including local state.

Compiler frontend 2. The second compiler frontend consists of a traditional parser followed by a type inferencer. Both of these have been proved sound and complete with respect to declarative specifications. For the parser, this means that our PEG parser implementation finds a correct parse tree if there exists one according to a traditional grammar for CakeML concrete syntax (SML). Soundness and completeness of the type inferencer means that, if the program can be typed, then the inferencer will find a type (which is the most general type).

Compiler bootstrapping. The CakeML compiler has been bootstrapped inside HOL. By bootstrapped we mean that the compiler has compiled itself. This was achieved by noticing that frontend 2 combined with the backend is a HOL function which we can feed into the tool-chain consisting of frontend 1 and the backend. The result is a verified binary that provably implements the compiler itself (with frontend 2). Recent built bootstrapped binary are here as tar.gz files. The bootstrapping is described here and here.

Post-hoc verification of CakeML programs. We have adapted Charguéraud's CFML verification framework to CakeML. Usually, we recommend that verified CakeML code is produced via synthesis using frontend 1. However, in some cases it is more convenient to do Hoare-style reasoning in the separation logic of CFML. CakeML's version of CFML supports reasoning about references, arrays, exceptions and I/O, and is used for verification of parts of the CakeML basis library.

Verified applications built using CakeML. The CakeML tools are geared towards production of verified applications using proof-producing synthesis (frontend 1) and compilation inside HOL (in-logic evaluation of the compiler backend). To date, the largest case study is the bootstrapped CakeML compiler. Other end-to-end verified applications that have been produced using the CakeML tools are:

a word frequency counter (a tutorial example)
Unix-like tools such as grep, sort, cat, diff, and patch
an OpenTheory article checker
a certificate checker for floating-point error bounds
checkers for SAT proofs (DRAT/LPR) and many similar formats, read more here
Candle: a verified implementation of HOL Light running on CakeML

Verified compilers built on top of the CakeML compiler or using parts of it. The CakeML compiler has been used to build the following verified compilers for other programming languages:

Kalas: A verified, end-to-end compiler for a choreographic language
Pancake: a verified compiler for a systems programming language
PureCake: a verified compiler for a lazy Haskell-style functional language

This website contains a list of news about CakeML (below), material that help newcomers get started, and further reading about various CakeML-related projects.

News

Mar 2024: This year, CakeML related papers have been accepted by AAAI, ESOP, CAV and IJCAR:
- AAAI'24: End-to-end verification for subgraph solving
- ESOP'24: Verified Inlining and Specialisation for PureCake
- CAV'24: Formally Certified Approximate Model Counting
- IJCAR'24: Certified MaxSAT Preprocessing
Jan 2024: At POPL 2024, our POPL 2014 paper CakeML: A verified implementation of ML received the Most Influential POPL Paper Award. We received a glass trophy each too.
Jun 2023: Two CakeML papers are part of PLDI 2023 and one in ITP 2023:
- PLDI'23: PureCake: A verified compiler for a lazy functional language
- PLDI'23: Cakes that bake cakes: Dynamic computation in CakeML
- ITP'23: Fast, verified computation for Candle

People

The CakeML project strives to be open, welcoming and fun; we are always on the look out for new contributors and new collaborators. The best way to get started is to read this page and say hi to us on Zulip or email one of the active developers.

Currently most active developers:

Contributors:

Oskar Abrahamsson,
Rob Arthan,
Heiko Becker,
Matthew Brecknell,
Eric Carlsson,
Fanny Canivet,
Connor Cashman,
Mauricio Chimento,
Quentin Corradi,
Nicholas Coughlin,
Gregorio Curello,
Eva Darulova,
Hugo Férée,
Anthony Fox,
Arve Gengelbach,
Sofia Giljegård,
Alejandro Gómez-Londoño,
Mike Gordon,
Armaël Guéneau,
Rikard Hjort,
Son Ho,
Jakob Holmgren,
Lars Hupel,
Felix Kam,
Stephen Kell,
Ramana Kumar,
Quentin Ladeveze,
Théo Laurent,
John Lind,
Andreas Lööw,
Alexander Mihajlovic,
Nebojsa Mihajlovic,
Dominic Mulligan,
Prashanth Mundkur,
Oskar Nyberg,
Scott Owens,
Christian Persson,
Christopher Pulte,
Matthieu Rodet,
Li Rönning,
Henrik Rostedt,
Adam Sandberg Eriksson,
Thomas Sewell,
Konrad Slind,
Michael Sproul,
Partha Susarla,
Hira Syeda,
Timotej Tomandl,
Trần Tiến Dũng (logo design),
Samuel Vivien,
Théophile Wallez,
Johan Wennerbeck,
Freek Wiedijk,
James Wood.

Acknowledgements

Zulip has sponsored this community by hosting its chat.
This work is partly supported by REMS: Rigorous Engineering of Mainstream Systems, EPSRC Programme Grant EP/K008528/1.
This work is partly supported by EPSRC grant EP/N028759/1 on Trustworthy Refactoring.
This work is partly supported by the Swedish Foundation for Strategic Research.
This work is partly supported by the Swedish Research Council.

Code and publications

The code for this project is hosted on GitHub. Our publications describe ideas behind the code.

Compiler binary and how-to

Download a binary for a recent CakeML compiler here as a tar.gz file. This how-to file explains the CakeML language and how to use the binary.

Candle prover

Candle is a verified version of HOL Light.

PureCake compiler

PureCake is a verified compiler for a Haskell-like language.

Get involved

Join the Zulip!

Latest compiler

The latest verified CakeML compiler passes through 8 intermediate languages and targets machine code for 6 architectures.