C is Turing complete

Yesterday there was some discussion on the Orange Site about whether or not C is Turing complete. The consensus in the StackOverflow question is,

no, because the C abstract machine is a (large) finite state machine,
or maybe yes, if you believe that unaddressable local variables can exist outside the finite address space, and you can have an unbounded number of them, i.e. no.

My answer is definitely yes, if you include the standard IO library.

And using IO is much closer to Turing’s original model of a finite state machine working on unbounded peripheral storage.

C is a finite state machine

The C abstract machine limits the size of the state it can work on by limiting the size of pointers and the size of the objects that can be pointed to. There are some unaddressable objects but they are usually understood to be a small finite number of machine registers.

So the number of states in the C abstract machine is,

    ptr_bits = CHAR_BIT * sizeof(char *);
    memory_bytes = 1 << ptr_bits;
    total_bytes = memory_bytes + register_bytes;
    total_bits = CHAR_BIT * total_bytes;
    number_of_states = 1 << total_bits;

Typically about 2^(2^(2^(2*3)+3)) states. Which is a lot, but still finite.

is IO really unbounded?

So, for C to be Turing complete, it must support unbounded IO.

Traditionally, C stdio supports two kinds of stream:

unbounded streams, such as terminals;
seekable streams, such as files.

What we need is an unbounded seekable stream. It has to be seekable because we need to be able to move in both directions, and the only way to move backwards is to seek.

But aren’t seeks bounded? Well, no, it turns out. The declaration of fseek() is,

    int fseek(FILE *stream, long offset, int whence);

What is notable is that in many real-world implementations, file sizes have been measured with 64 bit numbers, but the fseek long offset has been only 32 bits. This works because the offset is not absolute: it’s relative to either the start or end of the file or the current file position indicator.

Edited to add… What about ftell()? Doesn’t the size of its long return type impose a limit on the size of a file? Well, there’s no requirement that ftell() works when fseek() works. Consider 32 bit long and 64 bit file sizes again: fseek() can work on terabyte-sized files even though ftell() can only return an error.

Therefore the size of a seek offset does not limit the size of a stdio file.

We can read and write within some finite distance relative to the file position indicator, and (so long as we treat its value as unknowable and incomparable) after a series of reads and writes and seeks the file position indicator can become arbitrarily large.

conclusion

So, in theory (after all, this is a theoretical model of computation) we can have unbounded seekable IO streams in C, so we can implement a Turing machine in C.

That’s the point of this post, so you can stop reading. But I worked out the details, trying to stay as close as possible to the classic Turing machine formalism, so read on if you like code. (The trick is in the read and write part.)

the symbols

A Turing machine has an alphabet of symbols or marks on the tape.

All of the tape is blank, except for a known region around the head of the machine. The known region covers the initial input (which is finite) and any part of the tape that has been visited by the head.

In a state transition, the machine can write a new symbol, erase the symbol, or leave it unchanged. To make things more uniform, we always write a new symbol: we write the blank symbol to erase, and write the same symbol to leave it unchanged.

As an implementation detail, we add a special symbol to represent the infinite_blanks to the left and right of the known region. The state transitions for infinite_blanks are the same as normal blanks. The Turing machine cannot write infinite_blanks; it must write one normal blank at a time.

    enum mark {
        infinite_blanks,
        blank,
        // ... other marks ...
        mark_count
    };

the state transitions

A Turing machine has a set of states, including a particular initial start state, and a final halting stop state.

In mathematical descriptions, the machine’s state transition function is described using a collection of 5-tuples:

the current state
the symbol under the head
the symbol to replace it with
which direction to move or not
the next state

The write happens before the move.

We represent it as a 2D array indexed by the current state and mark, containing instructions about what to do. There are no transitions for the stop state.

    enum state {
        start,
        // ... other states ...
        stop,
        state_count = stop
    };

    static struct transition {
        enum { move_left, move_not, move_right } move;
        enum mark mark;
        enum state state;
    } machine[state_count][mark_count] = {
        // ...
        // define your Turing machine here
        // ...
    };

the tape

We represent the tape using two files, one containing all the tape to the left of the head, and one containing all the tape to the right of the head. The cell of the tape under the head is represented separately, in memory.

The files contain a sequence of cells represented in binary as enum mark objects accessed using fread() and fwrite(). Each file is divided into three parts:

The first cell contains infinite_blanks. (No other cells contain infinite_blanks.) It is a sentinel that allows us to avoid trying to seek before the start of the file.
The cells between the first cell and the file position indicator are (the left or right parts of) the known region of the tape. The file position indicator is manipulated using fseek().
There can be garbage cells after the file position indicator. C does not give us a way to truncate the file to erase unused cells, so we just ignore them.

The files are in opposite orders: the left file reads left to right, and the right file reads right to left, from the beginning of each file to its file position indicator.

the machine

To start, we need to be given the initial contents of the tape.

There are three command line arguments: the left file, the cell under the head, and the right file.

After opening the files we need to move the file position indicator to the end of each file.

    int main(int argc, char *argv[])
    {
        FILE *left = fopen(argv[1], "wb+");
        fseek(left, 0, SEEK_END);

        enum mark mark = atoi(argv[2]);

        FILE *right = fopen(argv[3], "wb+");
        fseek(right, 0, SEEK_END);

The operation of the machine itself is sraightforward.

The read_write() function pushes a new cell onto the second file, representing the overwritten contents of the cell that used to be under the head; and it pops a cell from the first file that becomes the cell under the head, and returns its contents.

        enum state state = start;
        while(state != stop) {
            struct transition transition = machine[state][mark];
            if(transition.move == move_not)
                mark = transition.mark;
            if(transition.move == move_left)
                mark = read_write(left, right, transition.mark);
            if(transition.move == move_right)
                mark = read_write(right, left, transition.mark);
            state = transition.state;
        }

When the machine stops, we need to remove the garbage from the left and right files, and print the contents of the cell under the head, so that the caller can see the final contents of the tape.

        cleanup(left, argv[1]);
        cleanup(right, argv[3]);
        printf("%d\n", mark);
        return(0);
    }

read and write

This is the tricky part, where we make the files work as unbounded one-ended tapes.

Pushing a cell onto a file is easy: it’s just fwrite(), which extends the file as necessary (or overwrites garbage). This is where unbounded storage is created.

Popping a cell is more faff, because we have to read backwards. We have to move the file position indicator before the cell, read the cell – which moves the file position indicator after the cell, so we have to move the file position indicator before it again.

When we reach the start of the file, we skip the second seek, so that the infinite_blanks always remain outside the known region of the tape, before the file position indicator, and do not erroneously become garbage.

    #define MARKSIZE (long)sizeof(enum mark)

    static enum mark
    read_write(FILE *read, FILE *write, enum mark mark)
    {
        fwrite(&mark, sizeof(mark), 1, write);
        fseek(read, -MARKSIZE, SEEK_CUR);
        fread(&mark, sizeof(mark), 1, read);
        if(mark != infinite_blanks)
            fseek(read, -MARKSIZE, SEEK_CUR);
        return(mark);
    }

cleanup

I mentioned before that C does not give us a way to truncate a file to a particular size. In fact we need a way to truncate an unbounded file to an arbitrarily large size, so POSIX truncate() is no good either because off_t is finite and our file position indicator is unknowable.

To clean up the garbage, we copy the known region of the tape to a replacement file, in an incremental streaming manner. We read backwards towards the start of the file and write forwards extending the replacement file, which reverses the contents of the tape. So we have to copy each file twice.

    static void copy_and_reverse(FILE *read, FILE *write) {
        enum mark mark = infinite_blanks;
        do mark = read_write(read, write, mark);
        while(mark != infinite_blanks);
    }

    static void cleanup(FILE *fp, const char *name) {
        FILE *tmp = tmpfile();
        copy_and_reverse(fp, tmp);
        fclose(fp);
        fp = fopen(name, "wb"); // make it empty
        copy_and_reverse(tmp, fp);
        fclose(fp);
        fclose(tmp);
    }

done!

I have omitted any error checking from this code, since it’s a theoretical model, and in theory it cannot possibly go wrong.