sunfishcode's blog
A blog by sunfishcode


Introducing cap-std, a capability-based version of the Rust standard library


Introducing cap-std

cap-std is a project to create capability-based versions of Rust standard library and related APIs.

Capability-based here means that the APIs don't access files, directories, network addresses, clocks, or other external resources implicitly, but instead operate on handles that are explicitly passed in. This helps programs that work with potentially malicious content avoid accidentally accessing resources other than the ones they intend, and does so without the need for a traditional process-wide sandbox, so it can be easily embedded in larger applications.

Background

Some of the most devious software bugs are those where the code looks like it does one thing, and usually does that thing in practice, but sometimes, under special circumstances, does something else. Here's a simple example using Rust's filesystem APIs:

    fn hello(name: &Path) -> Result<()> {
        let tmp = tempdir()?;
        fs::write(tmp.path().join(name), "hello world")?;
        Ok(())
    }

The expected behavior of this function is to write "hello world" to a file within a temporary directory. The code looks like it will do this. And indeed, it will usually do this. But if the path passed in is ../../home/me/.ssh/id_dsa.pub, then the behavior of this function could be to corrupt the user's ssh public key 😲. That's... not remotely within what we said the expected behavior is. It usually doesn't do that, but under the right circumstances, it could.

And since name is just a string, if the string is computed in a way that could be influenced by an attacker, the right circumstances could easily be made to occur in practice.
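To make the failure mode concrete, here is a small sketch using only std::path. The base directory /tmp/scratch and the helper name `joined` are made up for illustration. It shows that `Path::join` performs no containment check: a `..` component is kept for the OS to resolve later, and on Unix-like systems an absolute argument replaces the base entirely.

```rust
use std::path::{Path, PathBuf};

/// Joins `name` onto `base` the same way the `hello` example does.
fn joined(base: &str, name: &str) -> PathBuf {
    Path::new(base).join(name)
}

fn main() {
    // `/tmp/scratch` is a hypothetical base directory.
    let benign = joined("/tmp/scratch", "notes.txt");
    let dotdot = joined("/tmp/scratch", "../../home/me/.ssh/id_dsa.pub");
    let absolute = joined("/tmp/scratch", "/etc/passwd");

    println!("{}", benign.display());
    println!("{}", dotdot.display());
    // On Unix-like systems the base is discarded here: prints /etc/passwd
    println!("{}", absolute.display());

    // A lexical prefix check is fooled too: the `..` components are still
    // present, so the path "starts with" the base but resolves outside it.
    println!("{}", dotdot.starts_with("/tmp/scratch"));
}
```

Note that the last check passes even though the path leads outside the base, which is why string-level validation alone tends to be fragile.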

The cap-std project provides Rust crates with lightweight ways to avoid such problems. In particular, the cap-std crate's Dir type represents a directory, with methods corresponding to Rust's std::fs functions, for opening and working with files within the directory, that ensure that all paths stay within that directory. For networking, the Pool type represents a set of network addresses, ensuring that all network accesses made through the API are to addresses in the pool.
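The idea behind a pool of addresses can be modelled in a few lines of plain std code. This is only a conceptual sketch: `AddrPool`, `insert`, and `check` are hypothetical names invented here, not cap-std's actual Pool API, and no connection is made.

```rust
use std::collections::HashSet;
use std::io;
use std::net::SocketAddr;

/// Hypothetical stand-in for a set of permitted network addresses.
#[derive(Default)]
struct AddrPool {
    allowed: HashSet<SocketAddr>,
}

impl AddrPool {
    /// Grants access to one address.
    fn insert(&mut self, addr: SocketAddr) {
        self.allowed.insert(addr);
    }

    /// Returns `addr` if it is in the pool, and an error otherwise.
    /// A real API would perform this check before connecting or binding.
    fn check(&self, addr: SocketAddr) -> io::Result<SocketAddr> {
        if self.allowed.contains(&addr) {
            Ok(addr)
        } else {
            Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                "address is not in the pool",
            ))
        }
    }
}

fn main() {
    let mut pool = AddrPool::default();
    let granted: SocketAddr = "127.0.0.1:8080".parse().unwrap();
    let other: SocketAddr = "127.0.0.1:22".parse().unwrap();
    pool.insert(granted);

    println!("{:?}", pool.check(granted).is_ok());
    println!("{:?}", pool.check(other).is_ok());
}
```

As with Dir, the point is that the set of reachable resources is a value the caller hands in, rather than something the callee can widen on its own.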

In contrast to conventional sandboxing, cap-std doesn't have any global state, so using it in one part of an application doesn't require using it in the rest of the application. Library crates can use cap-std internally without imposing any sandboxing constraints on their users.

What can Dir do?

cap_std is just a library, so by itself, it isn't a sandbox for arbitrary Rust code—it can't prevent arbitrary Rust code from using std::fs's path-oriented APIs. Instead, it protects against malicious content, when filesystem paths can be influenced by untrusted inputs, and malicious concurrent modifications, when another program running at the same time has the ability to remove, rename, or create files, directories, symlinks, or hard links in ways that could cause a program to inadvertently access unintended resources.

Revisiting our example above, with cap-std we might write:

    fn hello(name: &Path, tmp: &Dir) -> Result<()> {
        tmp.write(name, "hello world")?;
        Ok(())
    }

In this code, if the passed-in path uses .. to access directories outside of the one passed in, tmp.write returns an error.

One key difference from before is that instead of creating the temporary directory itself, this function requests a directory be passed in. The Dir type here serves as a "vocabulary" type, allowing the function to declare that it wants a directory to be passed in, and that it intends to access resources within that directory, rather than accessing arbitrary locations in the filesystem. These kinds of declarations can help reduce the reasoning footprint of a function call.

The Dir type also makes it much easier to write this code robustly. There's no need to think about .. or absolute paths at the application level, and no need to handle symlinks specially, which with the Rust standard library today isn't even possible to do robustly without platform-specific code.

It also gives callers increased control. The caller gets to choose how and where to create the directory, and when to remove it. Callers could choose to use something like cap-tempfile's tempdir function to easily create a temporary directory in a conventional location and automatically remove it afterwards; however, they could also opt to create the directory somewhere else and manage it manually.

Note that Dir is passed by immutable reference, even though it's being used to mutate external filesystem state. This follows Rust's conventions, for example in std::fs::File::set_len, and it reflects an underlying truth about filesystems. &mut in Rust is sometimes called an "exclusive" reference, because when someone has a &mut, they're the only one who can access the underlying object. However, this is generally not a safe assumption when working with filesystem objects, because other programs could concurrently access or even mutate files or directories without Rust's type system having any say in the matter. Consequently, it makes sense to think of filesystem state as being external to the program, with File and Dir objects being just handles that are themselves typically immutable.
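The same convention can be seen with std::fs::File alone. The sketch below writes to and truncates a file through a shared &File reference. The helper names and the scratch file name are invented for this example.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents`, then truncates the file to `len` bytes,
/// all through a shared `&File` reference.
fn write_and_truncate(mut file: &File, contents: &[u8], len: u64) -> io::Result<()> {
    // `Write` is implemented for `&File`, so no `&mut File` is needed.
    file.write_all(contents)?;
    // `File::set_len` takes `&self` even though it changes the file on disk.
    file.set_len(len)
}

/// Runs the demo against `path`, cleans up, and returns the file's final contents.
fn demo(path: &Path) -> io::Result<String> {
    let file = File::create(path)?;
    write_and_truncate(&file, b"hello world", 5)?;
    drop(file);
    let text = fs::read_to_string(path)?;
    fs::remove_file(path)?;
    Ok(text)
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file in the system temporary directory.
    let path = std::env::temp_dir().join("shared-ref-demo.txt");
    println!("{}", demo(&path)?); // prints "hello"
    Ok(())
}
```

The handle itself never changes; only the external state it refers to does, which is why a shared reference is enough.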

Dir can also be combined with other security techniques. In a project which is written to carefully avoid using untrusted paths, it can add an extra layer of defense in depth.

And in the Wasmtime project, the next step we described earlier is now finished, and we're now using cap-std in combination with our WebAssembly sandbox to implement WASI, providing sandboxed access to system resources.

A simple example

The main pattern for filesystem operations using the cap-std crate is to obtain a Dir and use methods on it, which closely resemble the functions in std::fs.

One of the ways to obtain a Dir is to use the cap-directories crate to request a Dir for a standard directory (similar to the directories-next crate, but returning a Dir instead of a Path). For example, to obtain the data directory for an example program:

    let project_dirs = cap_directories::ProjectDirs::from(
        "com.example",
        "Example Organization",
        "`cap-std` Key-Value CLI Example",
        cap_directories::ambient_authority(),
    )
    .unwrap(); // `from` yields an `Option`, as in `directories-next`

    let data_dir = project_dirs.data_dir().unwrap();

Then in place of fs::read and fs::write to read and write files, one can use data_dir here to do data_dir.read(key) and data_dir.write(file_name, value).

Note the use of the ambient_authority() function here, which is a no-op that returns an instance of the opaque AmbientAuthority type and serves to mark a place in the code where ambient authority is being invoked. cap-directories and related crates have an overall invariant that functions don't create their own absolute filesystem paths and always rely on resources being passed in as handles. Functions which don't uphold this invariant, such as cap_directories::ProjectDirs::from, take an AmbientAuthority argument to advertise their ability to open resources given only a string.
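The marker-argument pattern can be sketched without any external crates. This is a simplified model, not cap-std's actual definition: the `authority` module and the `is_dir_ambient` function are hypothetical, and only the names AmbientAuthority and ambient_authority are borrowed from the text above.

```rust
use std::fs;
use std::io;
use std::path::Path;

mod authority {
    /// Opaque marker. The private field means code outside this module
    /// cannot construct one except by calling `ambient_authority()`.
    #[derive(Debug, Clone, Copy)]
    pub struct AmbientAuthority(());

    /// A no-op whose only job is to be visible, and searchable, at the call site.
    pub fn ambient_authority() -> AmbientAuthority {
        AmbientAuthority(())
    }
}

use authority::{ambient_authority, AmbientAuthority};

/// Hypothetical function that consults a raw path, so it demands the marker
/// to advertise that it reaches resources from nothing but a string.
fn is_dir_ambient(path: &Path, _: AmbientAuthority) -> io::Result<bool> {
    Ok(fs::metadata(path)?.is_dir())
}

fn main() -> io::Result<()> {
    // The explicit `ambient_authority()` call flags this line for reviewers.
    let tmp = std::env::temp_dir();
    println!("{}", is_dir_ambient(&tmp, ambient_authority())?);
    Ok(())
}
```

Because the marker carries no data, the argument costs nothing at run time. Its value is that every use of ambient authority shows up as a literal `ambient_authority()` in the source.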

This makes it easy to search a codebase to find all the places where a non-sandboxed cap-std API is being used. It can also be scanned for with Clippy using a clippy configuration file.

To see all this put together in a complete example, see the kv-cli example in the cap-std repository. This program implements a simple key-value store, using filesystem paths as keys, and using cap-std ensures that it only accesses paths within its own data directory. Attempts to escape the directory with .. fail gracefully. This is true even if a concurrently running program renames directories on the path or changes symlinks—something that's very hard to get right using std::fs APIs.

    $ cargo run --quiet --example kv-cli color green
    $ cargo run --quiet --example kv-cli color
    green
    $ cargo run --quiet --example kv-cli temperature cold
    $ cargo run --quiet --example kv-cli temperature
    cold
    $ cargo run --quiet --example kv-cli color
    green
    $ cargo run --quiet --example kv-cli /etc/passwd
    Error: a path led outside of the filesystem
    $ cargo run --quiet --example kv-cli ../../../secret_cookie_recipe.txt
    Error: a path led outside of the filesystem

Another useful crate is cap-tempfile, which creates temporary directories and provides a Dir to access them.

It's also possible to create a Dir by opening a raw path, using Dir::open_ambient_dir. Note that this function takes an AmbientAuthority since it does not uphold the sandboxing invariant that the rest of the API does.

A real-world example

Web servers often need to serve files from a given directory, and it'd be nice to have a guarantee that they don't accidentally stray outside that directory.

tide-native-static-files is a fork of a real-world Web server project built on the Tide framework, ported to use cap-std instead of directory paths.

The port is very straightforward, mostly consisting of passing around a Dir instead of a string holding a base directory name. And in many cases, working with a Dir is actually simpler than working with a string. The complete set of changes needed for this port can be seen here.

Implementation Landscape

One of the reasons that Rust doesn't already have a Dir type, when it does have a File type, is that popular OS filesystem APIs don't make this as efficient or idiomatic as just using paths to name directories. However, this is changing.

One of the inspirations for cap-std is the CloudABI project, which among other things developed a technique of using a sequence of openat calls to emulate path lookup in userspace in a way that's robust in the face of concurrent renames. cap-std uses a variant of this technique, optimized to use fewer intermediate system calls, to implement a portable sandboxed path lookup algorithm.
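As a rough illustration of the containment rule such a lookup enforces, the following sketch walks a path one component at a time and tracks how far below the starting directory it is. It is a purely lexical model under simplifying assumptions: it makes no system calls and ignores symlinks, both of which an openat-based lookup has to deal with. The function name `stays_within` is made up for this example.

```rust
use std::path::{Component, Path};

/// Returns true if `path`, interpreted relative to some base directory,
/// never steps above that base. Symlinks are not considered.
fn stays_within(path: &Path) -> bool {
    // Number of directory levels we are currently below the base.
    let mut depth: usize = 0;
    for component in path.components() {
        match component {
            // Absolute paths (and Windows prefixes) would restart from the root.
            Component::RootDir | Component::Prefix(_) => return false,
            // `.` stays where it is.
            Component::CurDir => {}
            // `..` is fine only if there is a level to climb back out of.
            Component::ParentDir => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            // An ordinary name descends one level.
            Component::Normal(_) => depth += 1,
        }
    }
    true
}

fn main() {
    for candidate in ["color", "a/../b", "../../../secret_cookie_recipe.txt", "/etc/passwd"] {
        println!("{candidate}: {}", stays_within(Path::new(candidate)));
    }
}
```

A real implementation cannot stop here, because any component may be a symlink and the directory tree may change between steps. That is what opening each component relative to the previous handle is for.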

Linux has also recently added a system call, openat2, which has the ability to restrict path lookup so that it stays within a given directory, which is exactly the behavior we want here. It doesn't require a process-wide mode, and it avoids the overhead of doing multiple system calls. cap-std uses this in place of its portable algorithm whenever it can. On systems which support openat2, most functions in the API perform only one or two system calls.

Linux and other operating systems are also exploring adding more such features, and as these features become available, it will become increasingly practical to not just implement a Dir type, but to implement it with WASI-style sandboxing protections built in.

Philosophy

cap-std came about because we were looking to generalize the filesystem sandboxing techniques we were using in Wasmtime's WASI implementation to make them more broadly applicable, and we were particularly inspired by async-std's philosophy:

the best API is the one you already know.

Rust already has a standard library API. It's very good overall, and a lot of care has gone into ensuring that it's implementable on many platforms. It's used by a lot of code, and well known to a lot of developers.

cap-std is an approach that takes advantage of this. Developers who know std can easily learn cap-std. Applications using std can be ported to cap-std, with the main concern being about how to ensure that directory handles are available to all the places that need them, rather than with dealing with differences in the API or in filesystem behavior.

The close alignment between cap-std and std, combined with the close alignment between async-std and std, also makes it straightforward to do both at the same time, producing cap-async-std.

If you're familiar with using std::fs, you should be familiar with cap-std's APIs without any surprises. Similarly, if you're familiar with async-std, cap-async-std's APIs should work as expected.

Current status

cap-std works on Linux, macOS, FreeBSD, Windows, and more, with stable Rust. On Linux, cap_std::fs is optimized to use new system calls including openat2, when available, which significantly reduces the sandboxing overhead.

Support for compiling to WASI is under active development.

Speaking of WASI...

The sandboxing performed by cap-std is the same as what's provided by WASI APIs.

While cap-std is designed so that it can be used as a library within otherwise unsandboxed native applications, WASI applies the same kind of sandboxing to all filesystem accesses, so that it serves as an extension to the core WebAssembly sandbox.

This means that when the cap-std library is compiled for the WASI platform, it will be able to bypass its own sandboxing techniques and simply call into the WASI system calls directly, achieving smaller code size and tighter integration with the underlying WASI platform.

The Future!

We're continuing to add more testing, fuzzing, and optimization. A port to WASI is underway.

We're also starting to think about extending the capability-based model to more parts of Rust's API. The most obvious next step is std::net, which is in a very early state right now, but this is a space we're thinking about for the future! Other areas that may be interesting include std::env for information passed in by the host environment, std::process for launching sandboxed processes, and anything else that allows programs to interact with the outside world.

And as we're doing with cap-directories and cap-tempfile, we're also interested in ways that we can do more than just translate the standard library API into a capability-based model, but also make the capability-based model easy to use.