CakeML: A Verified Implementation of ML

About

CakeML is a functional programming language and an ecosystem of proofs and tools built around the language. The ecosystem includes a proven-correct compiler that can bootstrap itself.

The CakeML project consists of the following components, all of which are free software.

Language definition. The CakeML language is based on a substantial subset of Standard ML. Its formal semantics is specified in higher-order logic (HOL) in a functional big-step style. The core of the language (its syntax and semantics) is quite stable, but the standard basis library is still undergoing development. Contributions are welcome!

Compiler backend. The CakeML compiler has many parts. The most significant part is the verified compiler backend, which transforms an untyped AST to concrete machine code for one of 6 target architectures. The compiler backend has been proved to only produce machine code that is compatible with the behaviours of the source programs. The backend passes through several intermediate languages (as this diagram illustrates) and performs some optimisations.

Compiler frontend 1. There are two frontends to the compiler. The first one is a proof-producing synthesis tool (called the translator). It generates CakeML AST from ML-like functions in HOL and proves that the generated AST has the same behaviour as the HOL function. The original version of this tool produced only pure CakeML code, but more recent versions can produce code that performs I/O and uses state, including local state.

Compiler frontend 2. The second compiler frontend consists of a traditional parser followed by a type inferencer. Both of these have been proved sound and complete with respect to declarative specifications. For the parser, this means that our PEG parser implementation finds a correct parse tree if there exists one according to a traditional grammar for CakeML concrete syntax (SML). Soundness and completeness of the type inferencer means that, if the program can be typed, then the inferencer will find a type (which is the most general type).

Compiler bootstrapping. The CakeML compiler has been bootstrapped inside HOL. By bootstrapped we mean that the compiler has compiled itself. This was achieved by noticing that frontend 2 combined with the backend is a HOL function which we can feed into the tool-chain consisting of frontend 1 and the backend. The result is a verified binary that provably implements the compiler itself (with frontend 2). Recent built bootstrapped binary are here as tar.gz files. The bootstrapping is described here and here.

Post-hoc verification of CakeML programs. We have adapted Charguéraud's CFML verification framework to CakeML. Usually, we recommend that verified CakeML code is produced via synthesis using frontend 1. However, in some cases it is more convenient to do Hoare-style reasoning in the separation logic of CFML. CakeML's version of CFML supports reasoning about references, arrays, exceptions and I/O, and is used for verification of parts of the CakeML basis library.

Verified applications built using CakeML. The CakeML tools are geared towards production of verified applications using proof-producing synthesis (frontend 1) and compilation inside HOL (in-logic evaluation of the compiler backend). To date, the largest case study is the bootstrapped CakeML compiler. Other end-to-end verified applications that have been produced using the CakeML tools are:

This website provides further details of the CakeML project, links to papers, courseware, auxiliary tools and suggestions for masters and doctoral level projects.

News

  • We continue the series of online talks that was started last year:

    • 28 Jan: Arthur Charguéraud
      Characteristic Formulae Revisited
    • 03 Feb: Michael Norrish
      HOL Tips and Tricks
    • 11 Feb: Riccardo Zanetti
      Towards a formally verified non-strict language compiler
    • 16 Mar: Yong Kiam Tan
      Efficient SAT Proof Checking with CakeML
    • 31 Mar: Heiko Becker
      Tactic Language Development with Lassie (Demo)
    • 13 Apr: Andreas Lööw
      Differences between verifying a (software) compiler and a (hardware) synthesis tool
    • 27 Apr: Leonid Merkin, Ruslan Rezin, Nikolai Vasiliev
      InnoChain: A formally-verified distributed ledger system based on CakeML and HOL4
    • 03 Jun: Sofia Giljegård, Johan Wennerbeck
      Puzzle Solving with Proof — Writing a Verified SAT Encoding Chain in HOL4
    • 09 Jun: Minchao Wu
      TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

    Everyone is welcome to these talks! For more information about upcoming talks, join our Discord (with this link) and see the #announcements channel.

  • There are four HOL4 based papers at CPP'21:

    The links include videos of the talks.

  • Check out our OOPSLA'20 paper on a verified space cost semantics for CakeML.

  • This year (2020) we will have a series of online talks instead of our usual in-person developers meeting. The following are the held/scheduled talks:

    • 21 Aug: Thomas Sewell
      Adding an Eval Primitive to the CakeML Language
    • 27 Aug: Alejandro Gómez-Londoño
      Choreographies and Cost Semantics for Reliable Communicating Systemspdf
    • 02 Sep: Oskar Abrahamsson
      Verified Proof Checking for Higher-Order Logicpdf
    • 08 Sep: Michael Norrish
      HOL Tips and Tricks
    • 11 Sep: Andreas Lööw
      Lutsig: A Verified Verilog Compiler for Verified Circuit Development
    • 22 Sep: Arve Gengelbach
      Model-Theoretic Conservative Extension of HOL with Ad-hoc Overloadingread more here
    • 29 Sep: Konrad Slind
      Supporting Security-Enhancing Architectural Transformations with CakeML
    • 06 Oct: Thibault Gauthier
      An introduction to TacticToe and HolyHammer
    • 20 Oct: Johannes Åman Pohjola
      seL4 + CakeML = Very Verified System Initialisation
    • 30 Nov: Hrutvik Kanabar
      A Semantic Type System for CakeML

    Everyone is welcome to these talks! For more information about upcoming talks, join our Discord (with this link) and see the #announcements channel.

  • CakeML's program logic has been updated to be able to prove liveness properties for non-terminating programs, see our ITP'19 paper for details.

  • We have a CAV'19 paper on fast-math style optimisations for floating-point programs.

  • This year's CakeML Developers Meeting will take place on 13-14 May 2019 at Chalmers, Sweden.

  • Our paper on Verified Compilation on a Verified Processor will be presented at PLDI'19.

  • Our new journal paper, The Verified CakeML Compiler Backend, describes version 2 of the CakeML compiler, Journal of Functional Programming, Volume 29, 2019.

  • The CakeML team at Chalmers has an open postdoc position. Application deadline: 28 Feb 2019.

  • The following FMCAD'18 paper uses CakeML: A Verified Certificate Checker for Finite-Precision Error Bounds in Coq and HOL4

  • The following CakeML-related papers were presented at FLoC 2018:

  • There is a PhD position open at Kent on CakeML. Application deadline: 21 May 2018.

  • There is a two-year postdoc position open at Chalmers on topics related to CakeML. Application deadline: 18 May 2018.

  • Hupel and Nipkow have a paper at ESOP'18 on A Verified Compiler from Isabelle/HOL to CakeML

  • The following paper at PLDI'18 uses CakeML: VeriPhy: Verified Controller Executables from Verified Cyber-Physical System Models

  • There will be a talk about CakeML at the DeepSpec workshop at PLDI'18

  • Data61 is hiring proof engineers again in Sydney, Australia. Application deadline: 15 April 2018.

  • There was a CakeML developers meeting on 29 May 2018 at Chalmers.

  • Data61 is hiring proof engineers and a research scientist in Sydney, Australia. CakeML experience/interest would be very desirable for both. Application deadline: November 21 2017.

  • Our paper on Verifying Efficient Function Calls in CakeML will be presented at ICFP'17.

  • Our paper on A Verified Generational Garbage Collector for CakeML will be presented at ITP'17.

  • We have bootstrapped the CakeML compiler for RISC-V. The result of bootstrapping in the logic is available (as it is for x86-64 also).

  • Ramana Kumar received the John C. Reynolds Doctoral Dissertation Award 2017 for his PhD thesis on Self-compilation and Self-verification.

  • There will be CakeML tutorials at PLDI'17 (full-day tutorial) and ICFP'17 (half-day tutorial). These tutorials will let participants try out the tools of the CakeML ecosystem and use them to produce end-to-end verified binaries. The tutorials are a good way to get involved in CakeML.
    CakeML ecosystem

  • We have released some previews of CakeML version 2.

  • There are two open PhD positions related to CakeML at Chalmers University of Technology, Sweden. Application deadline: 25 April 2017

  • ESOP'17 has accepted our paper (preprint) on verified characteristic formulae (CF) for CakeML.

  • CPP'17 has accepted our paper (preprint) on the target-specific aspects of compiling and assembling CakeML to real machine code.

  • Our IFL'15 paper (preprint) about CakeML's type system and inference algorithm has been awarded the Peter Landin prize for best paper.

  • ICFP'16 has accepted our paper (preprint) on the new CakeML compiler, A New Verified Compiler Backend for CakeML. A diagram of the new compiler can be seen on the right.

  • Our ESOP'16 paper (preprint) describes and advocates the use of functional big-step semantics for both reasoning about programming languages and compiler verification.

  • CakeML together with our formalisation of HOL in HOL is being used in research supported by the Future of Life Institute on AI safety, in particular on improving our understanding of reflective reasoning. Our ITP'15 paper on reflection precedes and supports the work on this project.

  • A new representation for pattern matching in HOL, as described in this ITP'15 paper on pattern matching, has been incorporated into the HOL-to-CakeML translator.

  • We gave a guest lecture in the Advanced Topics in Software Verification course at UNSW, about bootstrapping the verified CakeML compiler. (See the slides for Week 10, 2016, or Week 12, 2015.)

  • The beta (see item below) now also supports character literals.

  • A beta version of the verified CakeML implementation (version 1) is available here. To build and run the code: unzip, then make, and ./cake.

    Note that this is a beta version with a few known issues: (1) stack overflow is checked by the OS, leads to a segfault; (2) compiling on Mac OS produces some compiler warnings. If you stumble upon other issues, let us know on users@cakeml.org.

    This is verified software. Why might it have issues? The short answer is that we might not have checked our assumptions thoroughly enough. Is our model of the x86-64 machine accurate enough? Are we interacting with the unverified C code correctly? The assumptions have been tested, but one can never be 100% sure that testing caught all possible issues.

  • A preliminary version of the CakeML Compiler Explorer is available. This web service lets you run the verified compiler and see the results of various stages of compilation.

  • Our ITP'14 paper describes work on formalising the semantics and soundness of higher-order logic, including support for its principles of definition, and producing a verified implementation of the kernel of a theorem prover in CakeML. The paper describes a method for supporting definitions via translation into Stateless HOL.

    We have since developed a direct semantics for all of HOL (supporting both definitions and non-definitional context extensions including new axioms). This version was presented in our ITP'14 talk. It is described in more detail in our subsequent JAR paper.

  • We gave an MPhil course at Cambridge on specifying, implementing, and verifying functional programming languages.

  • Our POPL'14 paper describes CakeML and a verified x86-64 implementation of a read-eval-print loop for CakeML, using the bootstrapped compiler below, a verified translation from bytecode to x86-64, and a mostly-synthesised verified runtime.

  • We have written, verified, and bootstrapped a compiler (including lexing, parsing, type inference, and code generation) from CakeML to CakeML Bytecode. The correctness theorem covers both terminating and diverging programs, and says that the generated code behaves according to the semantics in either case. The compiler is written in HOL; we use the translator, mentioned below, to generate a verified CakeML implementation, and then evaluate the compiler on this implementation (bootstrap) to generate verified bytecode.

  • One of our initial target case studies is to construct a verified CakeML version of the HOL Light theorem prover. For this case study, we extended the translation tool, mentioned below, to be able to translate into stateful CakeML code. Initial results are described in our ITP'13 short paper.

  • We have developed a tool which translates functions from higher-order logic into CakeML. This tool is proof producing: for each translation it proves a theorem which states that the generated CakeML code correctly implements the original HOL function, according to the operational semantics of CakeML. See our ICFP'12 paper, and subsequent JFP paper, for more details.

People

Currently most active developers:
Contributors:
  • Oskar Abrahamsson,
  • Johannes Åman Pohjola,
  • Rob Arthan,
  • Heiko Becker,
  • Matthew Brecknell,
  • Eric Carlsson,
  • Fanny Canivet,
  • Connor Cashman,
  • Mauricio Chimento,
  • Quentin Corradi,
  • Nicholas Coughlin,
  • Gregorio Curello,
  • Eva Darulova,
  • Hugo Férée,
  • Anthony Fox,
  • Arve Gengelbach,
  • Sofia Giljegård,
  • Alejandro Gómez-Londoño,
  • Mike Gordon,
  • Armaël Guéneau,
  • Rikard Hjort,
  • Son Ho,
  • Jakob Holmgren,
  • Lars Hupel,
  • Felix Kam,
  • Stephen Kell,
  • Ramana Kumar,
  • Quentin Ladeveze,
  • Théo Laurent,
  • John Lind,
  • Andreas Lööw,
  • Alexander Mihajlovic,
  • Nebojsa Mihajlovic,
  • Dominic Mulligan,
  • Prashanth Mundkur,
  • Oskar Nyberg,
  • Scott Owens,
  • Christian Persson,
  • Christopher Pulte,
  • Matthieu Rodet,
  • Li Rönning,
  • Henrik Rostedt,
  • Adam Sandberg Eriksson,
  • Konrad Slind,
  • Michael Sproul,
  • Partha Susarla,
  • Hira Syeda,
  • Timotej Tomandl,
  • Trần Tiến Dũng (logo design),
  • Samuel Vivien,
  • Théophile Wallez,
  • Johan Wennerbeck,
  • Freek Wiedijk,
  • James Wood.

Acknowledgements

  • This work is partly supported by REMS: Rigorous Engineering of Mainstream Systems, EPSRC Programme Grant EP/K008528/1.
  • This work is partly supported by EPSRC grant EP/N028759/1 on Trustworthy Refactoring.
  • This work is partly supported by the Swedish Foundation for Strategic Research.
  • This work is partly supported by the Swedish Research Council.

Code and publications

The code for this project is hosted on GitHub. Our publications describe ideas behind the code.

Compiler binary and how-to

Download a binary for a recent CakeML compiler here as a tar.gz file. This how-to file explains the CakeML language and how to use the binary.

Candle prover

Candle is a verified version of HOL Light.

PureCake compiler

PureCake is a verified compiler for a Haskell-like language.

Get involved

Join the Discord!

Latest compiler

The latest verified CakeML compiler passes through 8 intermediate languages and targets machine code for 6 architectures. The verified CakeML compiler, version 2