.@ Tony Finch – blog


One of the things that made the Bourne Shell notorious (apart from the obvious Bournegol) was its memory management. Its allocator just walked up the address space as needed. If it hit the end of its mapped memory, it would SEGV, drop into its signal handler, ask the kernel for more memory, and return. This caused inordinate problems later when Unix was ported to systems with less amenable signal handling. (See OpenSolaris for a slightly cleaned-up Bourne shell.)

Larry Wall’s 1987 IOCCC winner goes something like:

extern int fd[];
signal(SIGPIPE, start);
pipe(fd);
close(fd[0]);
for(;;)
write(fd[1],ptr,n);
The write serves only to trigger SIGPIPE, and the signal handler is repeatedly changed in order to implement the flow of control (which the judges called “amazing”).

I thought it would be amusing to combine these ideas.

extern char prog[];
extern int pc;
for(;;)
((void()(void))dlsym(NULL,prog[pc]))();
This looks like a pretty boring interpreter if you assume that the prog consists entirely of built-in function names. But what if it doesn’t? Then dlsym() will return NULL and the program will SEGV. The signal handler can then push a return address on the interpreter stack, change pc to point to the user-defined function definition, and longjmp() back to the loop.

(Aside: you can have symbolic builtins like +-*/ if you choose function names that have the same ELF hash as the symbols you want, then run sed over the object code to change their names to symbols.)

The problem is that dlsym() isn’t very portable - certainly not enough for the IOCCC. So we need a better way of constructing a symbol table, and preferably without having to mention the function name loads of times (e.g. as in Larry’s program). The problem is that you can’t interleave the array initializers with the function definitions (at least not without non-portable hacks like linker sets). But how about this:

#define BUILTIN(name, code) 
void name(void) code { #name, name },

BUILTIN(add,{
    int b = pop();
    int a = pop();
    push(a + b);
}

BUILTIN(sub,{
    int b = pop();
    int a = pop();
    push(a - b);
}

BUILTIN(exch,{
    int b = pop();
    int a = pop();
    push(b);
    push(a);
}

struct {
    const char *name;
    void (*code)(void);
} builtin[] = {
      )))
    { 0,0 }
};