
Design decisions I do not regret

Posted on January 11, 2021 by Troels Henriksen

I have written several posts on things that we got wrong in Futhark, why the compiler goes wrong, as well as posts about complicated features where it is still unclear whether the feature works well enough. However, there are also some parts of Futhark’s design that I feel pretty good about! This post will discuss some of them. I will focus on very concrete details, rather than grand observations such as “bulk-parallel functional programming is a good paradigm for numerical programming”. Such doctrinal statements are more of an entire thesis, and require much richer and more qualified answers than just “this turned out pretty well”. Everything to follow is what I don’t regret, so it is necessarily subjective, but maybe other language designers will find it useful or amusing.

Short names

Naming things is one of the two most difficult problems in computer science, and in contrast to the other two (cache invalidation and off-by-one errors), there is no tooling or theory to tell you when you pick a bad name. Matters are even graver when it comes to names that are part of public interfaces. You may well receive acclaim for fixing a subtle cache invalidation bug in old and widely used code, but do not expect similar praise if you also change the naming convention of the public API from underscores to CamelCasing along the way.

Futhark exposes many names, both in its command line interface and its built-in libraries. Many of the names have been changed over the years. When we initially fleshed out the primitive type system, I agonised over how to name the new types. Should 32-bit signed integers be Int32 as in Haskell, or i32 as in Rust? I worried the latter was too short. Yet in the end, that was the scheme we went with: i8/i16/i32/i64 for the signed types, u8/u16/u32/u64 for the unsigned, and f32/f64 for floats.
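To give a feel for how the terse type names read in practice, here is a minimal sketch. The function names dotprod and count_byte are made up for this illustration, and the definitions use top-level let as in the rest of this post (recent compiler releases prefer the keyword def for top-level definitions).

```futhark
-- Dot product of two single-precision vectors of the same length n.
let dotprod [n] (xs: [n]f32) (ys: [n]f32): f32 =
  reduce (+) 0f32 (map2 (*) xs ys)

-- Count how many bytes in bs are equal to b, with a 64-bit result.
let count_byte [n] (b: u8) (bs: [n]u8): i64 =
  reduce (+) 0i64 (map (\x -> if x == b then 1i64 else 0i64) bs)
```

The type names also double as literal suffixes (0f32, 1i64), which is one place where short names pay off.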

Incidentally, here is a piece of advice: if you are ever agonising over some design detail that is not core to what makes your language special, and all options seem equally reasonable, just go with whatever Rust does. Odds are that whatever Rust decided was preceded by significant debate, and that whatever they picked has few gotchas. Lexical syntax is one area where this is particularly good (and easy) advice to follow.

But back to naming things. The terse type names turned out to work fine. In fact, I realised that I have never regretted making a name too short. The names I still regret tend to be those that are too imprecise or too long - the negate function in the f32 and f64 modules should have been neg, for example. I am not quite sure why this is. When I code in Haskell, I have a preference for relatively verbose names for top-level definitions. In Futhark, short names just seem to fit better with the language aesthetics. Maybe that is just the nature of numerical code? I don’t know. But I definitely rely on this observation to guide future naming of things.

File orientation

While Futhark is not designed for large programs, we still want to support multi-file programs. Code in a file A must be allowed to reference definitions in some other file B. Languages differ significantly in how they go about this. Futhark went for a particularly simple model: the declaration import "foo" allows access to the definitions in the file foo.fut, looked up relative to the importing file. That is all there is to it. If you know how a hierarchical file system works, then you know how Futhark looks up files. This is not textual inclusion as in C - each file still has to be syntactically valid on its own. We are not barbarians.

The philosophical background is spelled out here, but the basic rationale is that nobody enjoys learning about language import mechanics, and in Futhark we can get away with doing something stupidly simple. Other languages, like Haskell and Java, put more effort into decoupling the notion of a “module” or “compilation unit” from that of a physical file. Even languages that reason in terms of files usually also provide support for a “search path”, where you provide the compiler with additional directories that are searched when including files.

The downside of this approach is that for larger programs with multiple directories of source files, a file has no unique way to be referenced (except for its absolute pathname, which is not practical), so you may need to use different relative paths in different files (e.g. "../lib/foo" and "../../lib/foo"). Another downside is that you cannot have a central repository of “system files” that are always checked (like /usr/include in Unix).
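The lookup rule and its main downside can be sketched with a small hypothetical project. The directory layout, file names, and function names below are all invented for illustration, and the three files are shown in a single listing only for brevity.

```futhark
-- Hypothetical layout:
--   lib/foo.fut
--   app/main.fut
--   app/kernels/blur.fut

-- ---- lib/foo.fut ----
let scale [n] (k: f32) (xs: [n]f32): [n]f32 =
  map (\x -> k * x) xs

-- ---- app/main.fut ----
-- The path is resolved relative to app/, where this file lives.
import "../lib/foo"
let main [n] (xs: [n]f32): [n]f32 = scale 2f32 xs

-- ---- app/kernels/blur.fut ----
-- Same target file, but one directory deeper, so the path differs.
import "../../lib/foo"
let dim [n] (xs: [n]f32): [n]f32 = scale 0.5f32 xs
```

Both importing files refer to the same lib/foo.fut, yet each has to spell the path differently.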

A major advantage of the design is that each file in a program can be processed as the “root”, without having to look for some kind of project definition file that contains necessary compiler directives such as include paths. This has allowed very lightweight zero-config tooling, such as a “go to definition” command in futhark-mode and showing the types of variables - something that most Emacs modes definitely do not have without additional configuration.

Clearly this solution wouldn’t work for all languages, but it has worked really well for Futhark. I was initially worried because this design is much more similar to bad languages (like shell script) than to good languages (like Haskell), but it hasn’t caused problems yet, and I don’t recall any Futhark programmers ever being confused about how import works. This is the right design for Futhark, and I don’t expect to regret it any time soon.

No block comments

Block comments remain a bad idea. Emacs has M-x comment-region and inferior editors likely have something similar. No regrets. In fact, my experience writing more semi-syntactic tooling has only strengthened my conviction. Line comments are easy to parse in an ad-hoc way, and it is particularly nice that there is only one way of writing comments.

Explicit binding

Suppose we have a record type:

type t = {foo: i32, bar: bool}

In a function definition we can bind variables a, b to the field contents as follows:

let f ({foo=a, bar=b}: t) = ...

And if we wish to bind variables with the same names as the fields, there is a shorthand syntax inspired by OCaml and Haskell:

let f ({foo, bar}: t) = ...

Haskell, through the RecordWildCards extension of GHC, also supports an even more concise form, which in Futhark-like syntax would be:

let f ({..}: t) = ...

This implicitly binds variables with the same names as the record fields. Futhark does not support this. The reason is a philosophy of never bringing names into scope unless those names are actually syntactically present at the binding site in some form. There are many things I enjoy about programming, but guessing where some variable comes from is not among them. I find C++’s policy of implicitly bringing class fields into scope inside method bodies to be particularly confusing.
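As a complete sketch of the two forms that Futhark does support, consider the following. The function names pick and pick_short are hypothetical, and the bodies are only there to make the definitions type check.

```futhark
type t = {foo: i32, bar: bool}

-- Explicit form: the variables a and b are spelled out in the pattern.
let pick ({foo=a, bar=b}: t): i32 =
  if b then a else 0

-- Shorthand form: the field names double as variable names, but
-- foo and bar still appear textually at the binding site.
let pick_short ({foo, bar}: t): i32 =
  if bar then foo else 0
```

In both cases, every variable used in the body can be found by reading the parameter pattern.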

My distaste for complex binding structure goes further: Futhark requires that all definitions occur before their first use. Originally this was just for implementation simplicity, but I have come to enjoy the restriction. It’s nice not to worry so much about the best definition order, because the compiler already ruled out most options. OCaml and SML have similar restrictions, so this is not just Futhark aping some 70s Pascal dialect.
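A minimal sketch of the ordering restriction, using two made-up functions:

```futhark
-- square has to come first, because sum_squares refers to it.
let square (x: i32): i32 = x * x

let sum_squares [n] (xs: [n]i32): i32 =
  reduce (+) 0i32 (map square xs)
```

Swapping the two definitions would make the compiler reject the program, since square would then be used before it is defined.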

The one exception to this explicitness principle is the open statement, which brings names from another module into scope. While a plain open is rare, it is used implicitly by import. There are also local open expressions, which allow us to write e.g. M.(x*y+z) with the names in module M in scope within the parenthesised expression. This is really convenient when writing advanced modularised numerical code, as the alternative is the much more clumsy x M.* y M.+ z. I am not sure how to reconcile this with our general principle, but maybe you just can’t be a fundamentalist all the time.
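To illustrate the difference, here is a sketch of a parametric module that works for any number representation. It assumes the real module type from the Futhark prelude, which f32 and f64 implement. The names mk_muladd, muladd, muladd_qualified, and muladd64 are invented for the example.

```futhark
-- A parametric module over any implementation R of the 'real' module type.
module mk_muladd (R: real) = {
  -- With a local open, * and + refer to the operators of R.
  let muladd (x: R.t) (y: R.t) (z: R.t): R.t =
    R.(x * y + z)

  -- The same computation written with qualified operators.
  let muladd_qualified (x: R.t) (y: R.t) (z: R.t): R.t =
    x R.* y R.+ z
}

-- Instantiate the parametric module for double precision.
module muladd64 = mk_muladd f64

let main (x: f64) (y: f64) (z: f64): f64 =
  muladd64.muladd x y z
```

The gap in readability grows quickly as the expressions get larger.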

No dependencies

Futhark is written in Haskell, and all of its code dependencies are either pure Haskell libraries or embed small amounts of C code. This means that it is quite easy to build the Futhark compiler - all you need is a standard Haskell build tool like stack or cabal, which is supported and documented on most operating systems. I do not recall anyone having trouble installing the compiler itself. Difficulties tend to be more in the direction of setting up GPU drivers on Linux (yet another of the great unsolved problems in computer science).

This policy is not all upside. For many years, we lugged around a home-brewed implementation of algebraic simplification because we were averse to native code dependencies. We also generate C code (compiled with the system C compiler) rather than interacting directly with LLVM the way Accelerate does. This means that there are some things we cannot efficiently express in our generated code. On the other hand, Accelerate’s dependency on LLVM can make it tricky to install, unless your system happens to have the right version of LLVM installed (and HPC systems and servers running mature versions of Red Hat Enterprise Linux tend not to). ISPC provides compiler binaries that embed a statically linked version of LLVM - this may be a direction we could go in, but I suspect statically linking LLVM with a Haskell program is trickier than linking it with C++. And frankly, I think I already spend too much time hacking on build systems and CI setups.

We have gone quite far with this focus on installation simplicity. For example, the Futhark package manager does not even use native code to download packages. Instead, it shells out to curl. This has allowed us to statically link the Futhark binary that we provide for releases, as well as not having to worry about certificate store configuration. If you can download with curl on the command line, then futhark pkg will work. And if your system requires you to configure a special proxy or whatnot, then whichever configuration files or environment variables curl respects will also work for futhark pkg.

I was initially hesitant about this approach, as I was always taught that shelling out is ugly, but it has worked smoothly for us so far.