Programming languages in configuration files

Justin Mason posted an interesting article against the use of programming languages in configuration files. I think he is mostly right but his arguments could be improved.

His first argument is provability, that is, ensuring that a configuration does what it is supposed to do. The problem here is that it is very easy to make a language which is unexpectedly Turing-equivalent. In particular, as soon as you have a combination of regex-based rewriting and iteration, you have a Turing machine. There have been practical demonstrations of this fact for Sendmail and Exim but it must also be true for Postfix (since it has regex email redirection and redirection iterates) and Apache (via mod_rewrite of course).

But my counter-argument is mostly irrelevant, since in practice these kinds of rewrite engines have strictly limited iteration counts - 50 or 100 is typical. And in practice you know you are getting onto thin ice when regular expressions start to proliferate.
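To make the iteration limit concrete, here is a minimal sketch of a regex rewrite loop with a hard cap. It is written in Python for illustration and is not how Sendmail, Exim, Postfix or mod_rewrite are actually implemented; the rule table, the example addresses, the cap of 50 and the names rewrite and RewriteLoopError are all assumptions.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs, tried in order.
RULES = [
    (re.compile(r"^(.*)@old\.example$"), r"\1@new.example"),
    (re.compile(r"^postmaster@new\.example$"), "admin@new.example"),
]

MAX_ITERATIONS = 50  # assumed cap, in the range the text calls typical


class RewriteLoopError(Exception):
    """Raised when rewriting has not settled within the iteration limit."""


def rewrite(address, rules=RULES, limit=MAX_ITERATIONS):
    """Apply the first rule that changes the address, then start again.

    Returns the address once no rule changes it any more, or raises
    RewriteLoopError after `limit` passes.
    """
    for _ in range(limit):
        for pattern, replacement in rules:
            new_address, count = pattern.subn(replacement, address, count=1)
            if count and new_address != address:
                address = new_address
                break  # something changed, so restart from the first rule
        else:
            return address  # no rule changed anything: a fixed point
    raise RewriteLoopError(f"gave up after {limit} rewrites: {address!r}")
```

Without the cap, a pair of rules that undo each other would spin forever; with it, the engine can only ever do a bounded amount of work per address, which is why the Turing-equivalence is mostly a curiosity.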

Justin's second argument is security. His example is SpamAssassin, where it is easy to accidentally get exponentially slow pattern matching out of a poorly-written regex. Exim provides some much better examples, since its string expansion operators can do arbitrarily complicated things (such as running a shell command) and, even worse, it is not obvious from its context in the configuration how privileged each string expansion is.

His final argument is usability. I think this really is the crux. Provability - being able to understand a configuration by inspection - is really a special case of usability. And security is a special case of provability.

The usability problem is a consequence of a program's growth. As it gains features, there is enormous pressure for the program's configuration language to get more complicated, and to expose those features to sysadmins without requiring them to learn how to write code that plugs into an API. But if the program's developers indulge these demands they end up growing what started out as a simple configuration language into a fully-fledged programming language - except that it has really bizarre syntax and poorly-defined semantics.

However, before your program gets bloaty, it is usability that leads you into this trap, since it is the reason you don't want to use a programming language for configuration in the first place. Imagine trying to configure your favourite complicated over-featured daemon using your favourite lightweight programming language. The result is almost certainly unsatisfactory - except perhaps if you are a Lisp or Smalltalk hacker, or (as Justin suggests) if you have drunk the Ruby-without-brackets kool-aid. So programmers start with something ad-hoc rather than using an embeddable programming language.

Except if they know Tcl. I think Tcl is massively under-rated and should be used a lot more than it is. Its syntax is so unobtrusive that it is the perfect substrate for a configuration language. And, because it comes with ready-made semantics and proper programming constructs, it can cope when your configuration requirements become more demanding. And the techniques for sandboxing and subsetting it are well known, so it can be secure.

There are lots of other embeddable programming languages. I'm a huge fan of Lua, and Guile (the GNU embeddable Scheme interpreter) has been around for ages, but both of them have a lot of syntax which will intrude into simple configuration tasks. The other dynamic languages are much worse, with their bloat, horrible embedding APIs, and intrusive syntaxes.

I think Tcl suffers because people treat it as a programming language, and it is not a very good one. It is the most "stringly typed" language there is, so it is prone to type safety bugs that don't exist in other dynamic languages. Its variable and data structure semantics are downright odd.

But if you treat it as a configuration framework that can grow with your users' demands, it rules. Note that it provides a framework in three ways: as a library for your program to use for reading its configuration; as a set of design patterns to use for your program's configuration language; and as a way to plug in extra functionality (since Tcl can load modules dynamically).
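The design pattern in question is roughly "every configuration line is a command word followed by arguments, and the host program decides which commands exist". The sketch below imitates that pattern in Python rather than Tcl, purely for illustration: it has none of Tcl's substitution, nesting or control flow, and the Config class and the set and listen commands are hypothetical.

```python
import shlex


class Config:
    """Reads a command-per-line configuration, dispatching to handlers."""

    def __init__(self):
        self.settings = {}
        # The host program chooses which commands the file may use.
        self.commands = {
            "set": self._cmd_set,
            "listen": self._cmd_listen,
        }

    def _cmd_set(self, name, value):
        self.settings[name] = value

    def _cmd_listen(self, host, port):
        self.settings["listen"] = (host, int(port))

    def load(self, text):
        for lineno, line in enumerate(text.splitlines(), start=1):
            # shlex handles quoting; comments=True drops '#' comments.
            words = shlex.split(line, comments=True)
            if not words:
                continue
            name, *args = words
            handler = self.commands.get(name)
            if handler is None:
                raise ValueError(f"line {lineno}: unknown command {name!r}")
            handler(*args)
        return self.settings
```

A file for this reader looks like `set hostname "mail.example"` followed by `listen 127.0.0.1 25`. Simple cases stay free of syntax, and a new feature is a new entry in the command table rather than a change to the grammar. What a real Tcl interpreter adds is that variables, conditionals and procedures are already there when users start asking for them.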

I do not mean this to be a Tcl advocacy piece. One consequence of using something like Tcl or Lua as your configuration language is that the tail can end up wagging the dog. There's an enormous temptation to move functionality up into the dynamic language - this is partly why Tcl is branded as a programming language rather than a configuration library. This means you have to treat your configuration language more like an API.

Perhaps this is the way to escape from Tcl and to add a concluding argument to Justin's article. Instead of adding endless flexibility to your configuration language, grow a proper plugin API instead. Keep your configuration language simple for those who want to plough the well-worn furrow; assume that advanced users can program and encourage them to do that properly. Sendmail's milter API is a successful example (despite its flaws).
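As a rough sketch of that split, the Python below keeps the configuration file down to plain key = value lines and puts anything cleverer behind a small hook registry. It is not modelled on milter or on any real daemon; hook, parse_config, handle_message, the on_message event and the max_size setting are all made up for the example.

```python
# Registry of plugin callbacks, keyed by event name (hypothetical API).
HOOKS = {"on_message": []}


def hook(event):
    """Decorator that registers a plugin function for an event."""
    def register(func):
        HOOKS[event].append(func)
        return func
    return register


def parse_config(text):
    """Deliberately dumb configuration: 'key = value' lines, '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def handle_message(message, settings):
    """Ask each plugin in turn; the first non-None verdict wins."""
    for func in HOOKS["on_message"]:
        verdict = func(message, settings)
        if verdict is not None:
            return verdict
    return "accept"


# An advanced user writes real code against the API instead of asking
# for another configuration directive.
@hook("on_message")
def reject_oversized(message, settings):
    if len(message) > int(settings.get("max_size", "1000")):
        return "reject"
    return None
```

The configuration language never needs conditionals or expressions here: policy that cannot be expressed as a simple setting is written as a plugin, in a language with proper semantics.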