.@ Tony Finch – blog


I’m writing a simulation, or rather, I’m procrastinating, and this blog post is the result of me going off on a side-track from the main quest.

The simulation involves a bunch of tasks that go through a series of steps with delays in between, and each step can affect some shared state. I want it to run in fake virtual time so that the delays are just administrative updates to variables without any real sleep()ing, and I want to ensure that the mutations happen in the right order.

I thought about doing this by representing each task as an enum State with a big match state to handle each step. But then I thought, isn’t async supposed to be able to write the enum State and match state for me? And then I wondered how much the simulation would be overwhelmed by boilerplate if I wrote it using async.

Rather than digging around for a crate that solves my problem, I thought I would use this as an opportunity to learn a little about lower-level async Rust.

Turns out, if I strip away as much as possible, the boilerplate can fit on one side of a sheet of paper if it is printed at a normal font size. Not too bad!

But I have questions…

async fn-damentals

My starting point was to write:

    async fn deep_thought() -> u32 {
        42
    }

    fn main() {
        deep_thought();
    }

playground

When I call deep_thought() I immediately get a Future<Output = u32>. As the compiler warns, none of the code in deep_thought() runs, it just constructs a value of an ineffable type which contains the initial state of deep_thought()’s state machine.

To actually run it, I need to poll() it. The Future::poll() method has a signature that immediately presents a number of obstacles:

    fn poll(
        self: Pin<&mut Self>,
        ctx: &mut Context<'_>,
    ) -> Poll<Self::Output>

pin a task

Unlike normal Rust data structures, a Future can contain references to itself. (In a Rust function, variables can refer to other variables, and a Future contains (roughly speaking) function activation frames, hence it can be self-referential.) So, whereas normal Rust data types can be moved, a Future must stay at the same address even when it is not borrowed. The Pin type is used to immobilize a Future.

For my purposes it’s easiest to Pin the Future in a Box on the heap. I’ll define a struct Task to wrap the Pinned Future so that I can define a couple of methods on it. (More elaborate async frameworks usually have more layers between their version of Task and its Future.)

This wrapper is generic over the ineffable Fut type and its ultimate return type Out (which for deep_thought() is u32).

    struct Task<Fut> {
        future: Pin<Box<Fut>>,
    }

    impl<Fut, Out> Task<Fut>
    where
        Fut: Future<Output = Out>,
    {
        fn spawn(future: Fut) -> Self {
            let future = Box::pin(future);
            return Task { future };
        }
    }

Constructing a Task looks like,

        let mut task = Task::spawn(deep_thought());

noop context

The second argument to poll() is a Context, which is a wrapper around a Waker.

The simplest way to make a Context is by using Waker::noop(), which is enough for us to get deep_thought() to actually run. As in the fake-time simulation that I am procrastinating, 7.5 million years pass in the blink of an eye.

        let mut ctx = Context::from_waker(Waker::noop());

        match task.future.as_mut().poll(&mut ctx) {
            Poll::Pending => {
                todo!();
            }
            Poll::Ready(answer) => {
                println!("the answer is {answer}");
            }
        }

playground

primops, generally

An async function can call another async function, and (in async code just like in normal code) the called async function does nothing but return a Future. To make it do something, the caller needs to .await the Future. Under the covers .await compiles down to poll()ing the Future.

A chain of async .await calls bottoms out in a primitive operation that interacts with the outside world. A primitive async operation is an impl Future state machine data structure, written manually instead of relying on compiler trickery.

Typically, a primitive Future will be poll()ed twice:

A primitive Future can implement a more complicated state machine that needs to be poll()ed more, but twice is the minimum necessary to actually suspend a Task.

primops, minimally

Continuing my approach of doing the least possible thing to illustrate a point, here’s a stub Future that pretends to sleep. Its trivial state machine is encoded in the delay value: if it’s zero, the Future continues without suspending; if it’s non-zero, the Future suspends, but first resets the delay so that next time it will continue.

    struct Sleep(u32);

    impl Future for Sleep {
        type Output = ();
        fn poll(
            mut self: Pin<&mut Self>,
            _: &mut Context<'_>
        ) -> Poll<()> {
            if self.0 > 0 {
                self.0 = 0;
                return Poll::Pending;
            } else {
                return Poll::Ready(());
            }
        }
    }

As an example of using it, deep_thought() can pretend to spend a long time by constructing a Sleep() object (which is our minimal state machine) then .await it to invoke poll().

    async fn deep_thought() -> u32 {
        Sleep(7_500_000).await;
        42
    }

And the main loop now needs to poll() the Task twice to run it to completion.

        loop {
            match task.future.as_mut().poll(&mut ctx) {
                Poll::Pending => {
                    println!("sleeping for 7.5 million years...");
                }
                Poll::Ready(answer) => {
                    println!("the answer is {answer}");
                    return;
                }
            }
        }

playground

contexts and wakers

In that minimal proof-of-concept, the fake Sleep primitive does not actually do anything other than suspend the Task, and the top-level async executor loop blithely assumes it knows why the Task was suspended.

The purpose of the Context and its inner Waker is to allow a primitive Future to communicate with the async executor loop: to arrange for the operation to happen, and suspend the Task while the operation proceeds.

So for my fake Sleep to account for the passing of fake time, I need to construct my own Waker that does something more useful than Waker::noop().

I believe the design intent is that a Waker is roughly speaking a wrapper round a smart pointer that refers to the current Task. When a primitive Future suspends a task, it stashes the Waker with the operation in progress. When the operation completes, the Waker is told to wake() its Task, which puts it back on the async executor’s loop to be poll()ed.

To make a Waker, I need to make a RawWaker:

    pub const unsafe fn Waker::from_raw(
        waker: RawWaker
    ) -> Waker;

    pub const fn RawWaker::new(
        data: *const (),
        vtable: &'static RawWakerVTable
    ) -> RawWaker;

This is dismaying, it’s like hand-rolled object-oriented C. Instead of a type-safe dyn Trait, I have to cruft something together from a raw pointer, a list of functions in a struct, and unsafe code.

At this point I got stuck, despondently trying to work out how my Tasks and executor loop should refer to each other, and what kind of smart pointer I can smuggle through a raw *const() pointer. Eventually I realised there’s a simpler way.

primops, commandingly

There are a couple of ways that a primitive Future can arrange for an operation to happen:

In imaginary safe Rust, a Task can return a Command roughly as follows:

        let mut cmd = Command::Run;
        let p = task.future.as_mut().poll(&mut cmd);
        fn poll(
            mut self: Pin<&mut Self>,
            cmd: &mut Command
        ) -> Poll<()> {
            *cmd = Command::Example;
            return Poll::Pending;
        }

In real Rust I need to smuggle the borrowed &mut cmd through the RawWaker’s raw *const() pointer.

Since I’m not using the Waker to revive the Task when its operation completes, I can reuse the RawWakerVTable from Waker::noop().

primops, yieldingly

I’ll define a Yield type that combines the primitive commands and Poll::Ready() in one enum, and I’ll fix the top-level task’s return type to Future<Output = ()>. (Too much boilerplate is needed to keep the Output type generic.)

Yield” has a dual meaning: the result returned (yielded) from an activity; and the task relinquishing (yielding) the CPU.

    #[derive(Copy, Clone, Debug)]
    enum Yield {
        Run,
        Sleep(u32),
        // maybe other commands here
        Done(),
    }

The async executor loop calls poll() on a Task, which creates a place-holder Yield and stashes a pointer to it in a fresh Context.

    impl<Fut> Task<Fut>
    where
        Fut: Future<Output = ()>,
    {
        fn poll(&mut self) -> Yield {
            let mut yld = Yield::Run;
            let data = &mut yld as *mut Yield as *const ();
            let vtable = Waker::noop().vtable();
            let waker = unsafe { Waker::new(data, vtable) };
            let mut ctx = Context::from_waker(&waker);
            match self.future.as_mut().poll(&mut ctx) {
                Poll::Pending => yld,
                Poll::Ready(()) => Yield::Done(),
            }
        }
    }

The Yield type is also used as the direct representation of a primitive Future. An async function constructs a Yeild and .awaits it, which causes the Yield to be returned via the Context to the async executor’s loop. Before suspending, the Future Yield is reset to Yield::Run so that execution continues the next time the Task is poll()ed. (Analogous to resetting the Sleep delay to zero in the previous example.)

    impl Future for Yield {
        type Output = ();
        fn poll(
            mut self: Pin<&mut Self>,
            ctx: &mut Context<'_>,
        ) -> Poll<Self::Output> {
            if let Yield::Run = *self {
                return Poll::Ready(());
            } else {
                let yld = ctx.waker().data() as *mut Yield;
                let yld = unsafe { yld.as_mut().unwrap() };
                *yld = *self;
                *self = Yield::Run;
                return Poll::Pending;
            }
        }
    }

There’s more discussion of the unsafe code below.

fake sleep in action

The async executor loop needs to carry out the commands Yielded by its tasks. The classic data structure for timers is a min-heap keyed on the wake-up time; fake time is just a normal timer queue without any actual sleeping or delays between wake-up times.

After augmenting my Task type with a wake-up time, I can write my main program roughly like this sketch:

        let mut tasks = BinaryHeap::new();
        for i in 1..=TASKS {
            tasks.push(Task::spawn(activity(i, LIMIT)));
        }

        while let Some(mut task) = tasks.pop() {
            match task.poll() {
                Yield::Sleep(delay) => {
                    task.wake_up += delay;
                    tasks.push(task);
                }
                Yield::Done() => {
                    // drop completed task
                }
                yld => panic!("unexpected {yld:?}"),
            }
        }

The main program spawns some activity()s that sleep in a loop for differing amounts of time. They report their progress to stdout. For this demo I want the tasks to print synchronously (no async IO!) to illustrate the progress of their state machines.

This demo is greatly simplified but roughly the same shape as the fake-time simulation that I’m procrastinating.

    async fn activity(delay: u32, stop: u32) {
        let mut now = 0;
        println!("{now} {delay} start");
        loop {
            Yield::Sleep(delay).await;
            now += delay;
            if now < stop {
                println!("{now} {delay} continue");
                continue;
            } else {
                println!("{now} {delay} return");
                return;
            }
        }
    }

You can see the complete demo in action at the Rust playground. Task 1 wakes up every tick, task 2 every other tick, etc.

questions

I don’t know why a Waker isn’t just an abstract generic type parameter with some trait bounds, so that I could define it using safe code. As far as I can tell the language and the standard library don’t depend on its exact shape, so I would expect the details to be punted to async runtime libraries. I guess there’s something I’m missing that requires the standard library to partially restrict the shape of a Waker.

There are some weaknesses in my unsafe code. Miri says the code is OK, which agrees with my handwavy correctness argument by analogy with a mutable borrow. However I’m not certain that the compiler is guaranteed to know that yld can be mutated by poll().

An alternative might be to return the mutated Yield from Task::poll() by reconstructing the &mut Yield reference from the Context in the same manner as Yield::poll(). But then I’m not certain the compiler will know that the borrowed yld needs to live all the way to the end of the function.

For now I’ve chosen the shorter code.

Having learned how to do it myself, I’m curious to hear of crates that already solve this problem.