We are always on the lookout for collaborators to work with us on CakeML.
These are potential projects that range in scope and difficulty suitable for internship or undergraduate dissertation projects, to masters projects, to the start of a PhD. If you are interested in working on one of these projects, or a CakeML project of your own design, contact one of the CakeML developers (listed on the main CakeML page) or write to the CakeML User's mailing list.
The listing on this page is roughly categorised into smaller and larger projects, though many can be scaled to fit.
Implement and verify a compiler optimisation
The design of the CakeML compiler allows for various optimisation passes to be implemented as local source-to-source transformations over one of the compiler's intermediate languages. Implementing and verifying one such optimisation is a good way to improve the CakeML compiler while gaining experience with theorem proving and compiler writing. Examples include Issues #13, #104, #114, #129, #130, and #131. The projects on pattern-match exhaustiveness and dead-code elimination are also examples.
This project is suitable for an internship; it can be scaled up for more involved optimisations.
Implement pointer equality
Pointer equality is a primitive operation in a programming language allowing a test of whether two objects are stored at the same location in memory. It is a useful optimisation to avoid unnecessarily traversing two apparently different objects to test whether they are structurally equal. Such a primitive is found in many languages: ‘==’ in OCaml, ‘eq?’ in Scheme, ‘PolyML.pointerEq’ in PolyML, etc.
A plan for this project is described at Issue #90. Partial progress has been made
This project is suitable for an internship.
Verify a more sophisticated pattern-match exhaustiveness checker
In CakeML, as in Standard ML, the semantics of a pattern-matching expression when none of the cases match is to raise a Match exception. In one of the CakeML compiler's intermediate languages, these Match exceptions must be raised explicitly after a catch-all pattern. However, pattern matches that are already exhaustive do not require the catch-all case, and leaving it out produces better code.
The compiler uses an exhaustiveness checker to check whether it can leave out the catch-all case. This checker is currently incomplete (it does not detect all exhaustive patterns), and based on a very simple heuristic. The project here is to improve the exhaustiveness checker, and ideally to make it complete.
This project is suitable for an internship. There is also a larger scale project to improve pattern-match compilation generally.
Improve the implementation of strings
The CakeML compiler currently implements strings in a space-inefficient way: every character of the string takes a full machine word on the heap, even though only 8 bits are required. There has been some discussion on the design for a better implementation at Issue #100.
Additionally, CakeML currently does not support a rich string library. Implementing the necessary primitive operations and/or a richer library of useful functions over strings could be an additional or alternative version of this project. It would probably benefit from coordination with work on the basis library
This project is suitable for an internship, and with more work on library design could be scaled up to an undergraduate or Masters thesis.
Implement and verify dead-code elimination
Statically detecting code that will never be run and removing it can decrease the size of the generated code and increase the performance of the compiler. This also can have a cascading effect, enabling and improving other optimisations.
This project can be scaled to suit many levels, depending on the how sophisticated an analysis is done.
Design and implement support for type annotations
Standard ML (and other functional programming languages) supports programmer-written type annotations that are taken as additional constraints for the type checker to satisfy. This can lead to more readable code, where the programmer's intent is made explicit, and can rule out certain errors due to unwanted polymorphism. CakeML's type inferencer does not currently support annotations.
This project would involve developing syntax for annotations, updating the parser and its proofs, and updating the type inferencer and its soundness and completeness proofs. Some discussion and context can be found at Issue #84.
This project is suitable for an internship or an undergraduate thesis. It is partly related to the larger project on improving the module system.
Add expression-level let-polymorphism
CakeML currently supports polymorphism for top-level declarations only. In other words, if you want to use a function at two different types, it must be a top-level function. Certain programs that would be valid in Standard ML do not type check in CakeML as a result. To support polymorphism for local declarations within expressions, one would need to tweak the type system accordingly, and then update the type soundness proof, and the type inferencer proofs.
Partial progress has been made on this (primarily by Yong Kiam Tan) on this branch.
This project is suitable for an internship.
Implement a Malfunction to CakeML compiler
Malfunction is a “high-performance, low-level untyped program representation, designed as a target for compilers of functional programming languages”. Typically, a higher-level language will target Malfunction in order to re-use the high-performance, well-tested OCaml compiler backend. This project is to make the verified CakeML backend an alternative target for Malfunction (and therefore for other languages that target it), by writing a translation from Malfunction to CakeML (or one of the CakeML compiler's intermediate languages). (Translation in the other direction might also be of interest.)
This project is suitable for an internship.
Add HipSpec support to HOL and CakeML
Add an interface to the HOL4 theorem prover that allows HipSpec theorems to be imported into HOL4 (and then used about CakeML programs). HipSpec proofs would need to be reconstructed in HOL4. It would be nice to be able to invoke HipSpec from within HOL4, i.e. the HOL4 user shouldn't need to know about HipSpec related details.
This project is suitable for an experienced intern, or for an undergraduate or Masters thesis.
Implement a front-end for CakeML that accepts OCaml or Haskell syntax
CakeML has both abstract and concrete syntax. The abstract syntax (that is, the datatype of syntax trees) is the primary vehicle for specifying the type system and the operational semantics and is what is produced by the HOL-to-CakeML translator. The concrete syntax is the primary vehicle for writing CakeML programs directly, and is what is accepted by the parser (which then produces abstract syntax) in the front-end of the CakeML compiler.
Because we use abstract syntax, the particular details of the concrete syntax are isolated in the parser. This means the rest of CakeML (semantics, translator, compiler) do not need changing to support changed or alternative concrete syntax. This is interesting because many programs are already written in either OCaml or Haskell syntax, and there are also improvements to CakeML (or Standard ML) syntax worth exploring.
An involved version of this project would be to write a new verified parser for an alternative syntax. A simpler version would be to re-target the parser in the OCaml compiler to produce CakeML abstract syntax, and treat this as an executable specification. Similarly for any Haskell compiler that is amenable to such treatment.
This project is suitable for an internship or a large undergraduate assignment.
Implement an unverified CakeML compiler
This implementation, in a functional language (OCaml, SML, Haskell, Scheme, etc.), would allow us to quickly and easily evaluate different optimisations and compilation techniques. This could inform future decisions about which optimisations to verify and which are not worth the effort.
This project is suitable for an intern with extensive programming experience.
Implement user-defined infixes
The SML language allows users to define their own infix symbols at varying precedence levels. It would be nice to provide this facility for CakeML.
Design and implement reflection for Candle
Candle is a verified implementation of the HOL Light theorem prover built on top of CakeML. The inference rules are proved sound against the semantics of higher-order logic, and the implementation of the inference kernel is verified to implement the inference rules.
Traditionally, the argument for soundness for LCF-based systems like HOL Light is that they have a small inference kernel which can be manually inspected, and the rest of the system's soundness depends only on the correctness of the inference kernel. Indeed, the small size of HOL Light's inference kernel made it amenable to verification. However, the additional evidence for Candle's soundness (namely, that it is verified), opens the possibility for adding more verified code to the kernel without reducing the soundness argument.
A particularly interesting addition to the Candle kernel would be support for computational reflection using the underlying verified compiler. This would be an inference rule in the verified kernel that takes an expression to evaluate and produces an equality between that expression and its result by compiling and running the expression. Such rules are implemented in various theorem provers (e.g., Isabelle and Coq), and are very convenient, but typically require departure from the strongest soundness guarantees of the LCF-style inference kernel. In theory, Candle should support such a rule without a reduction in its soundness argument.
Another very interesting idea in this vein is to add a procedure for extending the inference kernel dynamically (this is also sometimes called a reflection rule). Such a procedure would take the verification of a new inference rule as input, and either return a new verified kernel containing the new rule (plus all the old ones), or as a side-effect update the running verified kernel in place. Such a facility would allow one to bootstrap a theorem prover, in the style of Milawa. This could also be used for implementing decision procedures or other complicated tactics that can be more efficient than usual since they are not required to ultimately push their proofs through the small LCF-style kernel.
This project can be scaled between Honours/Masters thesis level and PhD thesis level.
Improve and extend the CakeML module system
The module system of a programming language provides namespace management and large-scale abstraction features to the programmer. Currently, CakeML has a very simplified version of the Standard ML module system. At least the following features from Standard ML are missing in CakeML.
- Nested modules. CakeML modules cannot currently contain other module.
- Named signatures, so the same signature can be used more than once without writing it out again.
- Including named signatures within other signatures.
- Opening structures, or some other convenience feature for bringing multiple exported names into scope unqualified.
- Sharing types between signatures.
- Functors, that is, modules parameterised on other modules.
Additionlly, the HOL-to-CakeML translator's support of modules is limited. It only supports translation into a single module, and does not support re-use of code previously translated into another module. Extending the translator's module support could be an alternative version of or an addition to this project.
This project can be scaled between internship size and undergraduate/Honours thesis-project size. Significant research into module systems and verified implementations thereof could also form the basis for a PhD.
Design and implement type classes
A type class is a set of methods over a type that are implemented differently according to the type's main type operator. For example, there is a type class for equality providing the equality-test method whose implementation depends on the type of things being compared. Type classes are one way to support a form of overloading (or ad hoc polymorphism), in which the same method can be used over multiple different types with a different implementation at each type. Other examples include ordering (with comparison methods), and printing (with a conversion-to-string method).
Standard ML has exactly one type class: equality. CakeML currently has none (the CakeML type system allows equality tests that are ruled out in Standard ML). However, the Standard ML module system is expressive enough to encode type classes, as described in this paper by Dreyer, Harper, Chakravarty, and Keller. It would be interesting to implement the idea of type classes as a use of modules in CakeML. Another approach, as followed by Haskell, is to include type classes as a fundamental feature of the type system.
This project can be scaled to suit any level between an undergraduate thesis and (part of) a PhD. It may benefit with coordination on the project to improve the module system.
Verify a more sophisticated pattern-match compiler
Pattern matching is an important feature of typed functional programming languages like CakeML, enabling programmers to write clear code that follows the structure of an explicit characterisation of the types of data the program is expecting. Pattern matching is a high-level feature that needs to be compiled down to the simple tests (if expressions) and jumps supported by lower-level languages.
The CakeML compiler currently does a few simple optimisations but essentially compiles pattern-matching to nested ifs following the structure of the original case expression. A more sophisticated approach might use automata or decision trees (see, for example, Maranget) or more aggressive optimisations. To our knowledge, formal verification of a sophisticated pattern-match compiler has not been done before.
This project is suitable for an undergraduate or Honours thesis, and could be scaled between internship and Masters level.
Add principled data refinement to the HOL-to-CakeML translator
When verifying algorithms, it is convenient to use abstract, mathematical types like functions and sets. For executable code, however, it is better to use efficient implementations of these concepts, such as red-black trees or hash tables. The move from abstract to concrete datatypes is called data refinement. The HOL-to-CakeML translator produces executable CakeML code from algorithms in HOL, but currently has an ad-hoc approach to data-refinement requiring manual work from the user in order to use efficient data structures in the generated code.
This project is to develop a more principled and automated framework for data refinement in the translator. There is prior work on data refinement in Isabelle/HOL by Haftmann, Krauss, and others, that would serve as a source of good ideas, e.g., this paper.
This can be scaled between Honours/Masters thesis level and PhD thesis level.
Design and implement automated verification of CakeML programs
The current approach to verifying a CakeML program involves verifying the algorithm in HOL using interactive proof, and then using CakeML infrastructure to produce the verified implementation. The interactive theorem proving part is typically difficult and labour-intensive, and may require expertise in theorem proving that is independent of a programmer's understanding of why the algorithm is correct.
For certain programs and specifications, however, it should be possible to automate the verification of the algorithm using at most hints from the programmer.
This project is to define a constrained subset of both CakeML and HOL in such a way that programs and properties staying within these subsets can be verified fully automatically. By remaining embedded within the larger languages, we would hope to get automation where possible without preventing manual verification of the harder bits.
This project is suitable for a Masters or PhD thesis.
Improve the speed and/or maintainability of CakeML proofs
CakeML is a substantial formal development (over 100,000 lines of proof script) that is starting to reach the size where scalability issues become relevant. The discipline of proof engineering, which is to proof development as software engineering is to large-scale software development, may be applicable.
Although there is endless engineering work to be done here in tweaking and refactoring, there are also many opportunties for creative ideas about how to structure and manage large formal developments, or for particular approaches to generating certain definitions or proofs automatically.
This project can be scaled between internship level (e.g., doing a substantial cleanup or refactor of part of the code base) and PhD level (making a research contribution to proof engineering techniques, with CakeML as an example).
Design and develop the CakeML basis library
Much of the usefulness of a programming language comes from its library of pre-defined useful functions. Compared to more widely used languages, Standard ML (and other functional programming languages) have relatively sparse standard libraries. However, CakeML currently has almost none. This project is to build up a standard library for CakeML.
Since CakeML has a formal semantics, and is designed for producing verified programs, there is an interesting opportunity here to ensure that the CakeML standard library consists of verified code. One possibility is to produce most of the verified library using the HOL-to-CakeML translator, as in Issue #99.
This project is suitable for a long internship or an undegraduate thesis, and could be scaled up to a Masters thesis. Significant work on designing a large library of verified code could be the basis for a PhD project.
Design and implement a systems programming dialect of CakeML
CakeML is a high-level functional programming language designed to run with automatic memory management (that is, a garbage collector), and to abstract low-level data representation details away from the programmer. However, there are some applications, particularly in systems programming, where manual memory management and access to low-level details are desirable. Fortunately, the CakeML compiler's intermediate languages include languages where low-level details are explicit, and could form the basis for an alternative compiler backend supporting low-level programming.
Some thoughts on this project have been outlined at Issue #111.
This project can be scaled between Masters and PhD level.
Improve the design of CakeML intermediate language specifications
Implement technology for specifying the CakeML compiler's intermediate languages without excessive boilerplate. The specifications currently contain redundant text that is almost copy-pasted. It would be nice to instead specify one intermediate language as a “readable diff” of the previous language, in a similar vein to the Nanopass Framework.
This project is suitable for a Masters thesis, and could scale up to PhD level.
Add floating-point arithmetic to CakeML
Floating-point numbers are used for efficient, approximated computations over real numbers. Their semantics are specified by an IEEE standard, and this has been partially formalised in HOL. This project is to add floating-point primitives to CakeML, and (optionally) to develop infrastructure for reasoning about programs using them.
Some initial discussion can be found at Issue #79.
This project is suitable for at least a Masters thesis; when combined with techniques for verifying programs using floats it scales to PhD level.
Another related option here is to target WebAssembly instead.
This project can be scaled between an internship (e.g., just implementing the target) and PhD level.
Investigate the implementation of verified cryptographic primitives and protocols in CakeML
Cryptographic algorithms are interesting candidates for formal verification because of their importance and their relatively small size. This project is to look into how CakeML might be used for such verification. Whereas CakeML provides a direct route to functional correctness, there are other non-functional properties of cryptographic algorithms of interest including performance and security properties like information flow and side-channel attack resistance. Therefore part of the investigation will be about building evidence for such properties on top of functional correctness.
Possible starting points include this paper on the SHA-256 cryptographic hash function by Andrew Appel, and this one on private key encryption algorithms by Duan, Hurd, Li, Owens, Slind, and Zhang.
This project can be scaled between undergraduate and PhD level.
Connect CakeML to the MicroVM
The MicroVM project aims to implement an efficient language-neutral back-end for client languages.
Past and Ongoing Projects
- Characteristic Formulae for CakeML (ongoing). (Armaël Guéneau)
- A generational garbage collector for CakeML (ongoing). (Adam Sandberg Eriksson)
- Implement a front-end (unverified) for CakeML that accepts OCaml syntax. (James Wood)
- Prove that the type inference algorithm is complete with respect to the declarative type system. (Yong Kiam Tan)
- Implement and verify a register allocator for CakeML. (Yong Kiam Tan)
- The Compiler Explorer, a web service that pretty-prints the verified compiler's transformations. (Yong Kiam Tan)
- Implement CakeML Bytecode on FPGA, and use this to complete an end-to-end verified chain down to hardware (stalled). (David Greaves, Jonathan Kimmitt)
- Define a lazy version of the CakeML semantics. (Guoqiang Liang)
- Extend the HOL-to-CakeML proof-producing translator with better support for monadic stateful code (targeting CakeML's references and exceptions). (Christopher Pulte)
- The MAC in the Box Project (stalled). (Robert Künnemann, Mike Gordon)