
The Ramsey sweep



Some people have asked me about Norman Ramsey's technique for doing Mark&Sweep
in time proportional to the amount of live data, not proportional to the
total heap size.

This is a clever idea, with useful applicability in the "lightweight
languages" community, so it is worth spreading around. Tell your friends.
I'll describe it below; you should probably check Norman's web page at
Harvard or send him email for more details. (Norman's web page, btw,
is full of interesting bits and pieces.)

Norman's key idea is punting the freelist.

To keep things simple, let's assume that we are allocating only cons cells --
2-word blocks of memory. Since this is mark&sweep, let's assume we keep a
separate table/bitvector of mark bits, one bit for each cons cell in the heap.
(There are other ways, but let's fix on this one.)
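
To make the bookkeeping concrete, here's one way the layout might look in
C (the names, and the choice of C, are mine, not Norman's):

    #include <stddef.h>
    #include <stdint.h>

    #define HEAP_CELLS 4096                    /* number of cons cells (H/2) */

    typedef struct cons { uintptr_t car, cdr; } cons;  /* a 2-word block */

    static cons    heap[HEAP_CELLS];            /* the heap itself  */
    static uint8_t mark[(HEAP_CELLS + 7) / 8];  /* one bit per cell */

    /* Test, set, and clear the mark bit for cell number i. */
    static int  marked(size_t i)   { return mark[i/8] &  (1u << (i%8)); }
    static void set_mark(size_t i) {        mark[i/8] |=  (1u << (i%8)); }
    static void clr_mark(size_t i) {        mark[i/8] &= ~(1u << (i%8)); }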

Now, suppose we need to GC. Our heap size is H words, and at this moment we
have L words of live data in it.

We search the graph of live data, and mark all the live cells in the mark
bitvector. This costs O(L) work -- we only look at the live data as we search,
of course.
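
The mark phase itself is the usual depth-first walk. A minimal sketch,
assuming some is_heap_ptr tag test to tell heap pointers from immediate
values (how you tag your values is a separate question):

    /* Assumed: a tag test that tells heap pointers from immediate values. */
    extern int is_heap_ptr(uintptr_t v);

    /* Mark everything reachable from v.  Cost: O(L).
       (A real collector would use an explicit stack or pointer reversal
       instead of C recursion, but this shows the shape of the walk.) */
    static void mark_from(uintptr_t v)
    {
        while (is_heap_ptr(v)) {
            size_t i = (size_t)((cons *)v - heap);  /* cell number     */
            if (marked(i)) return;                  /* already visited */
            set_mark(i);
            mark_from(heap[i].car);                 /* recurse on the car... */
            v = heap[i].cdr;                        /* ...iterate on the cdr */
        }
    }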

In a traditional mark&sweep algorithm, we would then sweep the heap, clearing
mark bits and linking the free cons cells into a free list. This would cost
O(H) work -- bad idea, if we have a big heap and only a (proportionally) small
amount of live data.
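
For contrast, here is roughly what that traditional sweep looks like --
the O(H) loop the Ramsey trick gets rid of (using the car word as the
free-list link, which is just one convention):

    static cons *free_list;         /* illustrative: car doubles as the link */

    /* The classic sweep: touch every one of the H/2 cells.  Cost: O(H). */
    static void sweep(void)
    {
        free_list = NULL;
        for (size_t i = 0; i < HEAP_CELLS; i++) {
            if (marked(i)) {
                clr_mark(i);                         /* live: just unmark   */
            } else {
                heap[i].car = (uintptr_t)free_list;  /* dead: link onto the */
                free_list   = &heap[i];              /* free list           */
            }
        }
    }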

Instead, we *don't* sweep. Rather than keep a free list, we initialise an
"allocation pointer" ap to point to the first word in the heap. Then we
immediately return from the collector to continue execution of the main
program.

When the main computation needs a cons cell, we start scanning ap forward 
through the heap, checking the marks in the bitvector. 

  - If we find a marked cons cell, we can't allocate it (it's live, right?),
    so we skip past it. BUT... as we skip past it, we clear the mark.

    Even though we're doing this work at allocation time, let's charge this
    work up to the previous collection. How many clear&skip operations will
    we do? L of them -- one for each live cons cell. O(L) work. Life is good.

  - If we find an *unmarked* cons cell, we've found a free cell we can
    use for our allocation. So we stop the scan, leave ap pointing at
    this cell, and return it to the program.

    Let's charge this step to "allocation." How many of these operations will
    we do until we fill up the heap and have to GC again? We'll do H-L of them
    -- that's how many unmarked, free cons cells there are.

    But O(H-L) is exactly what the allocator would be paying in the traditional
    scheme, when it's grabbing cells off a free list.

When ap reaches the end of the heap, (1) we've filled up the heap (so we
need to GC again), and (2) we're all set up to do the GC, because we've
completely cleared out the mark vector. Rinse. Repeat.
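
Putting the pieces together, the whole allocator is one loop that clears
marks as it skips live cells. A sketch, with mark_roots standing in for
calling mark_from on each of the program's roots:

    static size_t ap;   /* allocation pointer: index of the next cell to try */

    /* Assumed: the mark phase over the program's roots. */
    extern void mark_roots(void);

    static cons *allocate(void)
    {
        for (;;) {
            /* Skip (and unmark) live cells -- work charged to the last GC. */
            while (ap < HEAP_CELLS && marked(ap))
                clr_mark(ap++);
            if (ap < HEAP_CELLS)
                return &heap[ap++];   /* unmarked cell: hand it out */
            /* ap fell off the end: the heap is full, and the mark vector
               is already clear, so we are set up to collect again. */
            mark_roots();             /* mark phase, O(L) */
            ap = 0;                   /* restart the scan at the bottom */
            /* (A real allocator would also detect a heap with no free
               cells left after GC, and grow the heap or fail.) */
        }
    }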

So, to collect the heap and then allocate all the free space in it costs
  - O(L) for collection
  - O(H-L) for allocation
which are *exactly* the costs of stop&copy.

Hence, mark&sweep people no longer have to cower in shame & humiliation when
stop&copy folks kick sand in their faces, algorithmically speaking.

Now, mark&sweep still has downsides. Stop&copy still has the beautiful
property that it has a blindingly fast allocate&initialise operation, and it
handles requests that *vary in size* very, very well. The awkward part of
mark&sweep is handling requests that vary in size -- this request needs 47
bytes, that request needs 805 bytes. If you handle this in a mark&sweep
collector by having n different heaps, where heap i holds objects of size 2^i,
and you round up requests to the nearest power of two, you get (at worst) a
50% space hit (see the sketch after this list).
  - But, hey, stop&copy *guarantees* a 50% space hit, remember? You have
    to split your heap into two semi-spaces.
  - But, actually, you only need both of those semi-spaces *while you collect.*
    You can hand the memory used for one of them back to the OS when you're
    done collecting. So it's a brief, peak demand... whereas stop&copy's
    internal fragmentation is always with you.
  - The big lose with the n heaps holding power-of-two-sized-blocks is that 
    if *any* of your n heaps runs out, you have to kick off a collection.
    That is, your cons-cell heap fills up, but you had tons of space available
    in the heaps that hold the 4, 8, 16, ..., 1Mb blocks. Bummer. With 
    stop&copy, you only have one heap, and you don't have to collect until
    you've handed out every byte of it.
  - Locality issues figure in here, too. 
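
To put a number on that worst case: a request of 2^i + 1 bytes rounds up
to 2^(i+1), wasting just under half the block. A hypothetical rounding
helper:

    /* Round a request up to the next power of two.
       (Hypothetical helper; n must be > 0.) */
    static size_t size_class(size_t n)
    {
        size_t s = 1;
        while (s < n)
            s <<= 1;
        return s;
    }

    /* size_class(47)  == 64    -- 17 of 64 bytes wasted           */
    /* size_class(805) == 1024  -- 219 of 1024 bytes wasted        */
    /* size_class(65)  == 128   -- 63 of 128 wasted: the ~50% case */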

So... there are cases where m&s could really work out great, and some 
cases that would be more dubious.

M&S would work out fine if you are managing a heap of objects that are
*all the same size*. It would also be a good choice if you had a truly fixed
amount of memory, and not much of it. (Perhaps a Palm Pilot...)

The key point of the Ramsey sweep is wiggling out of that painful O(H)
collection step by combining the unmarking and allocation into a single sweep.
That's very clever -- it completely changes the complexity of the algorithm.
People have been messing with mark&sweep for going on forty years without
figuring out that trick.

Norman Ramsey is a Jedi hacker.
    -Olin