.@ Tony Finch – blog


Tennent's correspondence principle is a powerful programming language design guideline. It says that the meaning of an expression or statement should not change when an extra level of block structure is added around it. Following this principle strictly leads to profound consequences, especially for control structures and closures.

There are some implications for variable scoping, too. If the extra level of block structure is a binding construct (such as let or for) then the new variable may shadow an outer variable and change the meaning of inner uses of that name. This is usually treated as a style warning rather than an error as Tennent would suggest.

Instead of forbidding shadowing, another way to follow Tennent might be to specify that local variables have function scope not block scope, as Javascript does. Then all uses of a variable name in a function refer to the same object. However this causes trouble if the language also has nested functions, because this brings back nested scopes and the shadowing problem. The combination of closures and function-scoped variables that look like block-scoped variables leads to a well known Javascript gotcha, so it's probably best to stick with block structure.

Tennent's principle gets more interesting when you look at control structures. Firstly, it says that C-style break and continue are a bad idea. If the language instead has labelled loops, so you write break label then the break still refers to the same loop even if you add an intermediate loop.

Strict adherents of structured programming say that you should follow Tennent by abolishing constructs like break and goto. However this is going too far: to code within this restriction you often have to add extra state variables to produce the control flow that you want, which is less readable and less efficient.

But this pragmatism blows up in your face if your language has nested functions. Ideally you would like users of the language to be able to define their own control structures by writing appropriate higher-order functions. Then the bodies of these control structures are nested functions. And within these control structures, break, goto, and return should work as they do with built-in control structures. The problem with this is it allows nested functions to capture their continuations. In fact, you can define call-with-current-continuation as follows (using a Lua-ish syntax with a named return statement analogous to labelled break):

  function callcc(f)
    local function k(r)
      callcc returns r
    end
    callcc returns f(k)
  end

First-class continuations cause enormous difficulties for language implementations and code that uses them is often extremely hard to understand. How can we escape from this trap?

There is the option of hair-shirt structured programming which bans early returns. This prevents inner functions from making non-local jumps to outer functions. There is the option of not supporting nested functions and higher-order programming. But neither of these are very interesting.

Continuations cause problems when they are not used in a stack-like manner. It is possible to keep nested functions to a stack discipline if you allow them to be used in only two ways: they can be called, and they can be passed as a function argument. They cannot be copied into variables in outer scopes or data structures on the heap, and they cannot be returned up the stack. You can lift these restrictions if the closure does not capture its continuation, which is easy to check statically by looking at its jumpy statements.

Smalltalk-style blocks are enjoying a renaissance at the moment, having been re-popularized by Ruby. Instead of making static restrictions, Smalltalk relies on a dynamic check that prevents callcc from working. Mozilla's Rust programming language implements has second-class "stack closures" (though it isn't clear to me if you can jump out of them) as well as first-class closures that cannot capture their continuation.

It seems to me that this approach is a good way to support expressive higher-order programming and embedded domain-specific languages with a conventional stack-based language implementation.