2009-05-15 – Use your bonce

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.

I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.

bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an LD_PRELOAD hack that intercepts system calls, e.g. open(O_RDONLY) is a dependency and open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.

bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look into git repositories to get SHA-1 hashes and avoid computing them.

It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

⇐ 2009-05-14 ⇐ Define SCM ⇐ ⇒ CRSIDs and email addresses ⇒ 2009-06-02 ⇒