
Mac OS X 10.6 Snow Leopard: the Ars Technica review

No new features.

Grand Central Dispatch

Apple's GCD branding: Railfan service

Snow Leopard's answer to the concurrency conundrum is called Grand Central Dispatch (GCD). As with QuickTime X, the name is extremely apt, though this is not entirely clear until you understand the technology.

The first thing to know about GCD is that it's not a new Cocoa framework or similar special-purpose frill off to the side. It's a plain C library baked into the lowest levels of Mac OS X. (It's in libSystem, which incorporates libc and the other code that sits at the very bottom of userspace.)

There's no need to link in a new library to use GCD in your program. Just #include <dispatch/dispatch.h> and you're off to the races. The fact that GCD is a C library means that it can be used from all of the C-derived languages supported on Mac OS X: Objective-C, C++, and Objective-C++.
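For illustration, here's a minimal sketch of what that looks like from plain C. The dispatch_* calls are the public GCD API; the message text and the use of exit() to end this toy program are just example choices.

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

/* An ordinary C function used as a GCD task. */
static void say_hello(void *context)
{
    printf("Hello from a GCD-managed thread: %s\n", (const char *)context);
    exit(0);  /* end this toy program once the task has run */
}

int main(void)
{
    /* Get one of GCD's built-in global queues; nothing extra to link. */
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* Hand the task to GCD; it decides which thread runs it, and when. */
    dispatch_async_f(queue, (void *)"no new frameworks required", say_hello);

    /* Give the main thread to GCD so queued work can proceed. */
    dispatch_main();
}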

Queues and threads

GCD is built on a few simple entities. Let's start with queues. A queue in GCD is just what it sounds like. Tasks are enqueued, and then dequeued in FIFO order. (That's "First In, First Out," just like the checkout line at the supermarket, for those who don't know.) Dequeuing a task means handing it off to a thread where it will execute and do its actual work.

Though GCD queues will hand tasks off to threads in FIFO order, several tasks from the same queue may be running in parallel at any given time. This animation demonstrates.

A Grand Central Dispatch queue in action

You'll notice that Task B completed before Task A. Though dequeuing is FIFO, task completion is not. Also note that even though there were three tasks enqueued, only two threads were used. This is an important feature of GCD which we'll discuss shortly.
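In code, the situation in the animation looks roughly like this (the task names and sleep durations are invented for illustration): three tasks are submitted to the same global queue, are handed to threads in FIFO order, and may finish in any order.

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <unistd.h>

static void task_a(void *ctx) { (void)ctx; sleep(3); printf("Task A done\n"); }
static void task_b(void *ctx) { (void)ctx; sleep(1); printf("Task B done\n"); }
static void task_c(void *ctx) { (void)ctx; sleep(2); printf("Task C done\n"); }

int main(void)
{
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* Dequeued in FIFO order: A, then B, then C... */
    dispatch_async_f(q, NULL, task_a);
    dispatch_async_f(q, NULL, task_b);
    dispatch_async_f(q, NULL, task_c);

    /* ...but they run concurrently, so B will likely finish first even
       though A was dequeued first. */
    sleep(5);  /* crude wait so the output appears before the program exits */
    return 0;
}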

But first, let's look at the other kind of queue. A serial queue works just like a normal queue, except that it only executes one task at a time. That means task completion in a serial queue is also FIFO. Serial queues can be created explicitly, just like normal queues, but each application also has an implicit "main queue" which is a serial queue that runs on the main thread.
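A sketch of both kinds, using hypothetical queue labels and task names: tasks on an explicitly created serial queue run (and therefore complete) strictly in order, and the implicit main queue behaves the same way on the main thread.

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <stdlib.h>

static void step(void *ctx) { printf("%s\n", (const char *)ctx); }

static void all_done(void *ctx)
{
    (void)ctx;
    printf("all steps finished\n");
    exit(0);  /* end the toy program from the main queue */
}

static void last_step(void *ctx)
{
    step(ctx);
    /* The serial work is finished; notify the main queue. */
    dispatch_async_f(dispatch_get_main_queue(), NULL, all_done);
}

int main(void)
{
    /* An explicitly created serial queue; the label is just a name. */
    dispatch_queue_t serial = dispatch_queue_create("com.example.serial", NULL);

    /* These run one at a time, in FIFO order; completion is FIFO too. */
    dispatch_async_f(serial, (void *)"first",  step);
    dispatch_async_f(serial, (void *)"second", step);
    dispatch_async_f(serial, (void *)"third",  last_step);

    dispatch_release(serial);  /* freed once its pending tasks complete */
    dispatch_main();           /* the main queue's tasks run here */
}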

The animation above shows threads appearing as work needs to be done, and disappearing as they're no longer needed. Where do these threads come from and where do they go when they're done? GCD maintains a global pool of threads which it hands out to queues as they're needed. When a queue has no more pending tasks to run on a thread, the thread goes back into the pool.

This is an extremely important aspect of GCD's design. Perhaps surprisingly, one of the most difficult parts of extracting maximum performance using traditional, manually managed threads is figuring out exactly how many threads to create. Too few, and you risk leaving hardware idle. Too many, and you start to spend a significant amount of time simply shuffling threads in and out of the available processor cores.

Let's say a program has a problem that can be split into eight separate, independent units of work. If this program then creates four threads on an eight-core machine, is this an example of creating too many or too few threads? Trick question! The answer is that it depends on what else is happening on the system.

If six of the eight cores are totally saturated doing some other work, then creating four threads will just require the OS to waste time rotating those four threads through the two available cores. But wait, what if the process that was saturating those six cores finishes? Now there are eight available cores but only four threads, leaving half the cores idle.

With the exception of programs that can reasonably expect to have the entire machine to themselves when they run, there's no way for a programmer to know ahead of time exactly how many threads he should create. Of the available cores on a particular machine, how many are in use? If more become available, how will my program know?

The bottom line is that the optimal number of threads to put in flight at any given time is best determined by a single, globally aware entity. In Snow Leopard, that entity is GCD. It will keep zero threads in its pool if there are no queues that have tasks to run. As tasks are dequeued, GCD will create and dole out threads in a way that optimizes the use of the available hardware. GCD knows how many cores the system has, and it knows how many threads are currently executing tasks. When a queue no longer needs a thread, it's returned to the pool where GCD can hand it out to another queue that has a task ready to be dequeued.

There are further optimizations inherent in this scheme. In Mac OS X, threads are relatively heavyweight. Each thread maintains its own set of register values, stack pointer, and program counter, plus kernel data structures tracking its security credentials, scheduling priority, set of pending signals and signal masks, etc. It all adds up to over 512 KB of overhead per thread. Create a thousand threads and you've just burned about half a gigabyte of memory and kernel resources on overhead alone, before even considering the actual data within each thread.

Compare a thread's 512 KB of baggage with GCD queues, which have a mere 256 bytes of overhead. Queues are very lightweight, and developers are encouraged to create as many of them as they need—thousands, even. In the earlier animation, when the queue was given two threads to process its three tasks, it executed two tasks on one of the threads. Not only are threads heavyweight in terms of memory overhead, they're also relatively costly to create. Creating a new thread for each task would be the worst possible scenario. Every time GCD can use a thread to execute more than one task, it's a win for overall system efficiency.

Remember the problem of the programmer trying to figure out how many threads to create? Using GCD, he doesn't have to worry about that at all. Instead, he can concentrate entirely on the optimal concurrency of his algorithm in the abstract. If the best-case scenario for his problem would use 500 concurrent tasks, then he can go ahead and create 500 GCD queues and distribute his work among them. GCD will figure out how many actual threads to create to do the work. Furthermore, it will adjust the number of threads dynamically as the conditions on the system change.
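As a rough sketch of that idea (the count of 500 and the work function are placeholders), a program can create one queue per independent unit of work, submit each unit, and let GCD decide how many threads actually run at once. A dispatch group, one of the facilities mentioned a bit further on, is used here to wait for everything to finish.

#include <dispatch/dispatch.h>
#include <stdint.h>
#include <stdio.h>

#define UNITS 500  /* hypothetical degree of concurrency chosen by the programmer */

static void do_unit(void *ctx)
{
    /* Stand-in for one independent unit of work. */
    printf("finished unit %ld\n", (long)(intptr_t)ctx);
}

int main(void)
{
    dispatch_queue_t queues[UNITS];
    dispatch_group_t group = dispatch_group_create();

    for (intptr_t i = 0; i < UNITS; i++) {
        queues[i] = dispatch_queue_create("com.example.unit", NULL);
        /* One task per queue; GCD decides how many threads to hand out. */
        dispatch_group_async_f(group, queues[i], (void *)i, do_unit);
    }

    /* Block until every unit has run, then clean up. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    for (intptr_t i = 0; i < UNITS; i++)
        dispatch_release(queues[i]);
    dispatch_release(group);
    return 0;
}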

But perhaps most importantly, as new hardware is released with more and more CPU cores, the programmer does not need to change his application at all. Thanks to GCD, it will transparently take advantage of any and all available computing resources, up to—but not past!—the optimal amount of concurrency as originally defined by the programmer when he chose how many queues to create.

But wait, there's more! GCD queues can actually be arranged in arbitrarily complex directed acyclic graphs. (Actually, they can be cyclic too, but then the behavior is undefined. Don't do that.) Queue hierarchies can be used to funnel tasks from disparate subsystems into a narrower set of centrally controlled queues, or to force a set of normal queues to delegate to a serial queue, effectively serializing them all indirectly.
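Here's a sketch of that last arrangement, with invented queue labels and tasks: dispatch_set_target_queue() points two subsystem queues at a single serial "funnel" queue, so every task submitted to either of them ends up executing one at a time.

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <unistd.h>

static void log_task(void *ctx) { printf("ran: %s\n", (const char *)ctx); }

int main(void)
{
    /* The serial queue that will ultimately execute the work... */
    dispatch_queue_t funnel  = dispatch_queue_create("com.example.funnel", NULL);

    /* ...and two queues used by separate subsystems. */
    dispatch_queue_t network = dispatch_queue_create("com.example.network", NULL);
    dispatch_queue_t disk    = dispatch_queue_create("com.example.disk", NULL);

    /* Point both subsystem queues at the funnel. Their tasks are forwarded
       to the funnel queue and execute there, one at a time. */
    dispatch_set_target_queue(network, funnel);
    dispatch_set_target_queue(disk, funnel);

    dispatch_async_f(network, (void *)"network request", log_task);
    dispatch_async_f(disk,    (void *)"disk read",       log_task);

    sleep(1);  /* crude wait so the output appears before the program exits */

    dispatch_release(network);
    dispatch_release(disk);
    dispatch_release(funnel);
    return 0;
}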

There are also several levels of priority for queues, dictating how often and with what urgency threads are distributed to them from the pool. Queues can be suspended, resumed, and cancelled. Queues can also be grouped, allowing all tasks distributed to the group to be tracked and accounted for as a unit.
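For example (the queue choice and task names are illustrative), a dispatch group lets a batch of tasks submitted to any queues be treated as a unit, so the program can wait until the whole batch has finished:

#include <dispatch/dispatch.h>
#include <stdio.h>

static void load(void *ctx) { printf("loaded %s\n", (const char *)ctx); }

int main(void)
{
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    /* Submit a batch of tasks through the group so they can be
       tracked and accounted for as a unit. */
    dispatch_group_async_f(group, q, (void *)"images", load);
    dispatch_group_async_f(group, q, (void *)"sounds", load);
    dispatch_group_async_f(group, q, (void *)"fonts",  load);

    /* Block until every task in the group has completed. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);

    printf("all resources loaded\n");
    return 0;
}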

Overall, GCD's use of queues and threads forms a simple, elegant, but also extremely pragmatic architecture.
