ActuallyUsingCrev

Actually Using Crev, Or, The Problem Of Trusting Software Dependencies

Up to date as of August 2019.

The Problem

As anyone who ever tried to build a program on Windows using SDL and gcc in 2004 can tell you… building software with dependencies sucks. C and C++ make this especially hard because between the compilation+linking model, header files, macros, and conditional compilation, it’s basically impossible to start with a pile of bare source code and know the correct magic compiler invocations to get a functional program out the other end. Sometimes you can figure it out, sometimes you can’t, and sometimes you’re trying to build X11 and you really can’t. So, we have build systems like make and cmake and meson and so on, and they’re all basically giant garbage fires but they’re still absolutely necessary. And all they REALLY do is try to provide the information that the compiler and linker need to actually compile and link a program.

The traditional solution to this problem is to be using Unix, which has some assumptions and conventions for where to put C libraries and how things fit together. All libraries are installed system-wide, you put them in certain places that the compiler knows to look for, give them particular names, and so on. This also means that you have ONE copy of each library, EVERYTHING is built with the same compiler and the same versions of libraries, and if you want to build a program that departs from these assumptions you’re on your own to make it work. Putting all these pieces together and making them work is basically what Linux distros and BSD ports maintainers do, and it’s a big task. Each program and library gets treated as its own special case as necessary, and people write patches and build scripts to massage things into working. Thousands of them, all written and maintained by hand.

This usually works fine for end-users who just want to use existing software, but it sucks for developers because the pipeline between “someone writes code” and “you use the code” is pretty long, and is designed for stability rather than speed. If the distro doesn’t provide a library, or doesn’t provide the version of it you want, then you package it yourself and wait for it to be accepted for all the platforms you care about – thank you, your groundbreaking new software system will be included in the next Debian Stable release in about 2.5 years, give or take. Or you “vendor” the dependency and include it into your source distribution and build process and put the work in to upgrading it as necessary and just Deal With It. This is a horrible solution that defeats a lot of the purpose of using libraries to begin with, but dealing with libraries in C/C++ in general is so dysfunctional that there is an entire genre of libraries designed to be vendored into their users’ projects.

So, if you want to write C/C++, your choices are

  • Write only for very specific platforms (this program runs best on Internet Explorer 6.0, er, Ubuntu 16.04), or
  • Deal with a VERY SLOW, labor-intensive, and pretty complicated software distribution process, or
  • Put tons of work into supporting someone else’s code as part of your own code, or
  • Use the most minimal dependencies possible, or
  • Just avoid using other people’s code altogether.

This is one of the biggest and most pervasive hidden costs of writing C/C++. Forget memory safety, the compilation and distribution model is outright criminal.

The Solution

More or less starting with Java, as far as I’m aware, people got sick of this shit and started explicitly designing their (mainstream) programming languages to include all the information needed to build a source file in the source file itself. If your file says import java.util.HashMap; then the compiler knows where to look to find the code for java.util.HashMap, and knows that it needs to link it into your program. If that has any other dependencies it mentions them in the same way, and the compiler can search for them too, until it has the full dependency tree for your program. There’s some naming conventions and language-specific variations and such, and it’s still not a trivial problem to design and implement, but it works. Large or complex programs sometimes still need some sort of customized configuration, and there’s various systems like ant that provide that, but mostly you can type javac Foo.java and Foo.java has enough information in it for javac to do everything it needs to do. make can go die in a fire.
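To make the “compiler can search for them too, until it has the full dependency tree” part concrete, here’s a minimal Python sketch of that walk. The IMPORTS table is made up for illustration (li.alopex.code.Bar is a hypothetical module), and a real compiler reads the import lines out of source files rather than a dict, but the traversal is the same idea.

```python
from collections import deque

# Hypothetical module graph: each module lists the modules it imports.
IMPORTS = {
    "Foo": ["java.util.HashMap", "li.alopex.code.Bar"],
    "li.alopex.code.Bar": ["java.util.HashMap", "java.util.List"],
    "java.util.HashMap": [],
    "java.util.List": [],
}

def dependency_closure(root, imports):
    """Return every module reachable from root, in breadth-first order."""
    seen = [root]
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in imports[module]:
            if dep not in seen:  # each module is visited only once
                seen.append(dep)
                queue.append(dep)
    return seen

print(dependency_closure("Foo", IMPORTS))
```

Everything the build needs falls out of the source files themselves, which is the whole point: no makefile has to repeat it.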

So what if some of those modules aren’t present? Well, go online and search for them! Java again was weirdly prescient in its URL-based naming convention for modules, but also as usual basically missed the target compared to the path technology actually took because there was no built-in way to go from import li.alopex.code.Foo; to “download http://code.alopex.li/Foo.java”. Other languages invented this first: you had CTAN and CPAN for TeX and Perl, which were basically just FTP sites for source packages with a few added conventions and tools. Then you got things like PyPI and RubyGems for Python and Ruby, which were special-purpose versions of a Linux distribution’s repository that moved faster and had fewer controls, which mostly worked fine because the software didn’t need as much massaging to work together. These managed system-wide packages for you, and this sometimes caused clashes with the versions of libraries already installed as part of Linux distros, and people eventually made tools like virtualenv to try to work around these problems, but generally it was still a big improvement for those people who wanted or needed the cutting edge. Then at some point someone said “why don’t we make it so there’s NO system-wide packages, you don’t need to make two different programs depend on the same version of a library, you make each program build with its own totally independent dependency tree and you just have to specify what dependencies you want”, and we made npm for JS and the go toolchain for Go… and the world heckin’ EXPLODED.

Suddenly using a dependency is basically as easy as writing it down, each program can use whatever it wants with no fear at all of other programs stomping on it or making life difficult. Even better, all the infrastructure is run for free by third parties (and honestly pretty cheap, I’m told), so publishing a library is as easy as throwing a license and version number on it. (Well, and solving the hardest problem in programming: coming up with a good name.) This is awesome. Sure you spend a bit more time downloading and building dependencies than with the old Unix model of system-wide libraries, but our software and hardware can handle it, everything’s connected to the internet anyway, source code is small, and the extra five seconds it adds to a clean system build is WAY cheaper than the hours of programmer time wasted on dicking around with cmake. It can’t be beat, and we’re basically at the point where no new programming language can be taken seriously if it doesn’t have a system like this.

The Problem, Again

So suddenly distributing new code can move a lot faster, and reach a lot further with a lot less effort. This makes developing software way easier. Of course this includes developing malicious software as well. These giant package repositories are a single point of failure that everyone uses, and in fact it’s very difficult to not use them. This is, naturally, an awesome attack vector, and such attacks are now not uncommon.

The reasons for this are many, and all go back into “it’s easy to use libraries, so people use them”.

  • Because these libraries are so easy to use, they get used by lots of people, and people build tools and workflows on the assumption that they work. Because of this, all it takes is one compromise of a fairly popular library to infect thousands of machines.
  • Because libraries are easy to use, you also get deeper dependency trees that have more hidden churn going on under the hood.
  • Because it’s so easy to update a dependency (especially if the dependency follows Semver) we have suddenly gone from “upgrade your OS every two years when a new LTS release is made” to “I guess it’s Tuesday, might as well do cargo update to see if anything’s changed”.
  • Because libraries have become so easy to create, you get more of them out there that are maintained by one or two people part time, or not at all, instead of having something like The GNOME Foundation backing giant chunks of the ecosystem with dedicated and experienced (if still often volunteer) developers.
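For a feel of what “follows Semver” buys you when cargo update runs, here’s a sketch of cargo’s default caret rule for plain x.y.z versions. It deliberately ignores pre-release tags and build metadata, and the function names are illustrative, not anything out of cargo itself.

```python
def parse(version):
    """Turn "1.0.99" into (1, 0, 99) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def caret_compatible(required, candidate):
    """True if candidate may replace required under a default (caret) requirement."""
    req, cand = parse(required), parse(candidate)
    if cand < req:
        return False  # never downgrade below what was asked for
    if req[0] > 0:
        return cand[0] == req[0]    # 1.2.3 allows anything below 2.0.0
    if req[1] > 0:
        return cand[:2] == req[:2]  # 0.6.5 allows anything below 0.7.0
    return cand == req              # 0.0.7 allows only 0.0.7

print(caret_compatible("1.0.3", "1.0.99"))  # a Tuesday update picks this up
print(caret_compatible("0.6.5", "0.7.0"))   # this one needs a manual bump
```

So every compatible release of every crate in the tree is one command away, which is great right up until one of those releases is malicious.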

So there’s an increasing number of attacks, since it’s easy and profitable and pretty low-risk. Because there’s lots of eyes on the software, these attacks often get discovered pretty quickly in absolute terms – for instance, in the most recent one there was about a week between the exploit and discovery. Try getting that turn-around time out of a commercial support contract; when a CVE is reported the embargo time between “attack is discovered” and “attack is announced publicly” is usually months. But because these systems are widely used, largely automated (and thus obscured), and trusted by default, a week translates to over a thousand downloads of a compromised library, and if software built with the compromised library is distributed to other users it could add up to much more than that.

The Solution, Again?

Nobody actually knows how to fix this. This ecosystem is a new thing in the world, and is still evolving fast. But people are looking for ways to fix it. Personally I think that a lot of it can be tackled by better, more transparent analytics to make it easier to understand what is actually going on in your dependency tree. More information may lead to better judgements made by developers, and hopefully automatic detection of problems or potential problems such as unmaintained or poor-quality packages.

Another approach is to, instead of removing the “obscured” part of the equation, remove the “trusted by default” part. The main effort I’m aware of in this field is crev, which is appealingly simple in concept: People do code reviews of specific packages, summarize their results, and sign them with a crypto key that proves the review comes from a particular source. It’s simple, very broad, and so is hopefully easy to get going and actually use.

crev is a very human-centric system, and a very minimal one – there’s no accounts or real-world identity system or such attached to it, all you can prove is that all reviews made by the same ID were made by people with access to the private key behind it. It could be a person, an institution, one of a person’s dozen alts, or whatever. But the reviewer is staking a reputation of some kind or another on it, and a key without a good reputation isn’t going to matter much, so while someone could shotgun millions of false reviews out there, they’re about as likely to be taken seriously as spam emails or bot-generated Twitter posts. (These are not solved problems either, but can at least be kept down to a dull roar.) The actual value of this model comes from being an incarnation of the human social webs that already exist – I personally know svenstaro, I know how good their work is, and so if I know that an ID is theirs and they review some stuff then it’s a reasonably trustworthy opinion. I choose who I actually trust, or don’t trust, which isn’t a perfect model but that’s how human trust always works. “Perfect” guarantees of trust such as those provided by cryptocurrency are inflexible. Systems that people actually use need some wiggle room to operate well.

Will this web of trust model work? I don’t know. It didn’t really for PGP, but PGP has plenty of other problems as well. Could it work for crev? I don’t see why not. In human systems, all security comes down to trusting individual people. That’s why getting cheated by a friend hurts so much. We’re just building tools around the social dynamics that already exist.

So, let’s try it out and see how it goes.

Actually Using Crev

Okay, all this nonsense has basically been trying to lead up to an actual tutorial for using crev. Currently the only implementation of crev is cargo-crev, which ties into the Rust language package manager, cargo. However, none of this is Rust-specific apart from the implementation; the basic concept and code review format should work for any language or package system. Code reviews (“proofs”) are just YAML files, and they can be shared around however you feel like – the method currently seems to be by putting proofs in git repositories, and cargo-crev has support for this. crev already has a pretty good getting started guide that covers much of the same ground as this, but I wanted to write something similar that comes from a random user, not the system’s creator.

A small use case

First, let’s install the thing. We’re using Linux, obviously, but this shouldn’t be too different on any platform ’cause Rust has a lot of work put into its tooling to make it Just Work. If you haven’t installed Rust by now, do so. Yes, that page tells you to curl a shell script into sh – don’t you trust it? At least it doesn’t need sudo though! Then all you need to do is set up your $PATH to include ~/.cargo/bin/ so that you can find cargo and the programs it builds, and do cargo install cargo-crev to build the latest version of cargo-crev from source. On Debian, you will need to do apt install clang llvm-dev libclang-dev first to get it to build – the hashing library uses some inconvenient C dependencies and getting them to build is a PITA. How topical.

Alternatively, you can just download a pre-built binary from crev’s github release page.

Checking reviews

So there’s two main things that we want to do: Check the code reviews for a crate, and make reviews for some dependencies. As a small test case I’m going to use goatherd, which is a toy pubsub messenger thing I was making. It had the dubious distinction of turning up a nasty race condition in one of its (quite young) dependencies, which got me thinking about this sort of thing in the first place. So once we install cargo-crev we can just cd to its dir and run cargo crev verify:

~/my.src/goatherd $ cargo crev verify
status reviews     downloads    own. issues lines  geiger flgs crate                version         latest_t       
none    0  0   685715   7926719 0/1    0/0    346      60      num_cpus             1.10.1          
none    0  0   837285  15868655 3/3    0/0    875       0 CB   bitflags             1.1.0           
none    0  0    34142  12593931 1/1    0/0  34340      35 CB   syn                  1.0.3           
none    0  0  3829463   7436762 1/2    0/0   1399       0      semver               0.9.0           
none    0  0    48229  12761052 1/3    0/0   9775       0 CB   serde                1.0.99          
none    0  0   120906   4702189 0/1    0/0   2530     291 CB   parking_lot_core     0.6.2           
none    0  0     6030    156927 0/1    0/0    748      68      once_cell            0.2.6           
none    0  0   298180   1712663 2/3    0/0   1786       2 CB   bincode              1.1.4           
none    0  0  2285432   3257460 1/2    0/0    306       0 CB   rand_chacha          0.1.1           
none    0  0  2738292   2819988 1/2    0/0    698      12      rand_isaac           0.1.1           
none    0  0   142889   2292737 0/1    0/0   2159     561      redox_syscall        0.1.56          
none    0  0    33854  11301406 1/1    0/0   1329       0      quote                1.0.2           
none    0  0     2396    220849 0/1    0/0   4364     393      sized-chunks         0.3.1           
none    0  0   963594   4731677 3/6    0/0   2478      16      uuid                 0.7.4           
none    0  0   742319    747528 0/1    0/0    358       2      rdrand               0.4.0           
none    0  0      794    147029 0/1    0/0  11481      75 CB   im                   13.0.0          
none    0  0    87187  17062523 3/4    0/0  58231      37 CB   libc                 0.2.62          
none    0  0   144063   3388430 0/1    0/0   1615     216      lock_api             0.3.1           
none    0  0  1192174   1192526 0/1    0/0   1196      49      cloudabi             0.0.3           
none    0  0  1869540   1903212 0/1    0/0     13       0 CB   winapi-x86_64-pc-windows-gnu 0.4.0           
none    0  0   153833   8206135 2/3    0/0    594      31      rand_core            0.4.2           
none    0  0  1116907   2245501 1/2    0/0    398      10      rand_jitter          0.1.4           
none    0  0    43526   8282886 1/2    0/0   6664       0      serde_derive         1.0.99          
none    0  0   773001   9236283 1/1    0/0   2436     226 CB   byteorder            1.3.2           
none    0  0   700118    739883 0/1    0/0     40       3      fuchsia-cprng        0.1.1           
none    0  0  2767914   2903867 1/2    0/0    326      58      rand_hc              0.1.0           
none    0  0  1261816   9417985 1/1    0/0    104       0      cfg-if               0.1.9           
none    0  0    14491   3298524 0/1    0/0    398       0      autocfg              0.1.6           
none    0  0  2284234   2912350 1/2    0/0    155       6      rand_xorshift        0.1.1           
none    0  0    62430  10353037 2/6    0/0    538       0      unicode-xid          0.2.0           
none    0  0       34       124 0/1    0/0   1129      26      secc                 0.0.9           
none    0  0  1822881   1855264 0/1    0/0     13       0 CB   winapi-i686-pc-windows-gnu 0.4.0           
none    0  0  2087595  18385573 3/4    0/0   6388      19 CB   rand                 0.6.5           
none    0  0  1815094   2945762 1/2    0/0    258      12 CB   rand_pcg             0.1.2           
none    0  0  4475042   4774520 0/1    0/0   1040       0      semver-parser        0.7.0           
none    0  0   739818   5968911 0/3    0/0   1926     342      smallvec             0.6.10          
none    0  0  3153691   5602538 0/1    0/0    265       0      rustc_version        0.2.3           
none    0  0        8       162 0/1    0/0   1289       4      axiom                0.0.7           
none    0  0   276112  13978426 5/5    0/0   2414      32 CB   log                  0.4.8           
none    0  0   599802   5409729 0/2    0/0    264      19      scopeguard           1.0.0           
none    0  0  1806787   2534826 1/2    0/0   1033      43      rand_os              0.1.3           
none    0  0    47917   7598948 2/2    0/0   3478       0 CB   proc-macro2          1.0.1           
none    0  0  2157068   2465807 0/1    0/0   3909      51 CB   typenum              1.10.0          
none    0  0   144187   4728114 0/1    0/0   2742     349 CB   parking_lot          0.9.0           
none    0  0  2266560   8206135 2/3    0/0    516      41      rand_core            0.3.1           
none    0  0  1000256  10062103 0/1    0/0 160451     197 CB   winapi               0.3.7           

Okay, that’s a lot more junk than I expected. Some of it, like crate and version, is pretty obvious. downloads looks like the download counts from crates.io for this exact version and for all versions of the crate (the two rand_core rows share the same second number), and geiger I assume is the result of cargo-geiger, a tool which crawls through Rust code looking for unsafe blocks. The status column is the amount of trust we’ve decided the crate deserves, and reviews, flgs and own. are a bit obscure. Turns out own. is the number of owners of the crate on crates.io (known / total; many low-level Rust crates are made by well-known developers like bluss or alexcrichton), and flgs is various flags – CB means “custom build script”, so building the package may run random code. The two columns in reviews are the number of known proofs for the listed version of the crate, and for all versions, respectively. Despite crev’s generally-good docs, finding specifics on these columns is currently unfortunately tricky.
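If you’d rather slice this table up with a script than squint at it, here’s a rough Python sketch that splits one row into named fields. It only assumes the layout shown above, which is not a stable interface: the flags column may be blank, latest_t is empty in every row here, and the field names are made up for illustration. The two downloads names follow the guess that the first number is per-version and the second is per-crate, since both rand_core rows share the same second number.

```python
def parse_row(line):
    """Split one row of the `cargo crev verify` table above into named fields."""
    fields = line.split()
    known, total = fields[5].split("/")
    # The flags column may be blank, so the tail holds either 2 or 3 entries.
    tail = fields[9:]
    flags = tail[0] if len(tail) == 3 else ""
    return {
        "status": fields[0],
        "reviews_this_version": int(fields[1]),
        "reviews_all_versions": int(fields[2]),
        "downloads_this_version": int(fields[3]),
        "downloads_all_versions": int(fields[4]),
        "owners_known": int(known),
        "owners_total": int(total),
        "issues": fields[6],
        "lines": int(fields[7]),
        "geiger": int(fields[8]),
        "flags": flags,
        "crate": tail[-2],
        "version": tail[-1],
    }

row = parse_row("none    0  0   837285  15868655 3/3    0/0    875       0 CB   bitflags             1.1.0")
print(row["crate"], row["flags"], row["owners_known"], row["owners_total"])
```

Sorting the parsed rows by geiger or by lines is a quick way to pick which crates deserve a look first.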

Right now we have absolutely zero proofs/code reviews, because we haven’t imported any. Well, dpc made cargo-crev, so I might as well start with their proofs ’cause I’m already trusting them by using their software. Proofs are just files kept in a git repo, which we can import via the following: cargo crev fetch url https://github.com/dpc/crev-proofs

Run it again, and we get some changes to the reviews column, shown here:

status reviews     downloads    own. issues lines  geiger flgs crate                version         latest_t       
none    1  1   685715   7926719 0/1    0/0    346      60      num_cpus             1.10.1          
none    1  6   739818   5968911 0/3    0/0   1926     342      smallvec             0.6.10          
none    1  1   700118    739883 0/1    0/0     40       3      fuchsia-cprng        0.1.1           
none    0  2   773001   9236283 1/1    0/0   2436     226 CB   byteorder            1.3.2           
none    0  1   276112  13978426 5/5    0/0   2414      32 CB   log                  0.4.8           
none    0  1  2087595  18385573 3/4    0/0   6388      19 CB   rand                 0.6.5           

Yay, now we have some non-zero numbers in the reviews column! The first number is the number of reviews for that particular version of the code, the second is for all versions of the code. Note that even with dpc’s reviews we still don’t trust those crates (their status is none), because we haven’t marked dpc’s ID as trustworthy. We need to create our own ID to mark dpc’s ID as trusted; we’ll get to that in a moment. For now, we can see there’s not a WHOLE lot of crev reviews out there, so let’s focus on changing that.

Sharing reviews

First, we need a git repo called crev-proofs. Well that’s pretty easy, we’ll just make one in our preferred public location, make an empty initial commit in it, and push it. Now we can create a crev ID:

cargo crev new id --url https://git.sr.ht/~icefox/crev-proofs

This asks us for a password, which is the usual “this is forever, you can never recover this” type password you have to use with crypto systems, so make sure you save it in your password safe. You should probably make a copy of the key file it gives you too; maybe print it out and stick it under the cat’s bed or something. You can find a copy of it in ~/.config/crev/ids/<id public key>.yaml; the private key in this file is encrypted with your passphrase, so this file doesn’t need to be kept under lock and key (though it probably wouldn’t hurt).

Now that we have an ID of our own, we can mark dpc’s ID as trusted:

cargo crev trust FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE

It asks us to put in some notes about how trusted this person actually is, and we’re done. Now when we run cargo crev verify, the crates that dpc has reviewed positively are marked as “pass”.

How to find someone else’s ID? In this case I just took it from the docs in the getting started guide, but if you do cargo crev query id all it will show you the ID(s) connected to any git repository you’ve added with cargo crev fetch url. (Note that a repository can contain proofs from more than one ID!)

So by trusting someone’s proofs, by default we are also trusting the people they trust! This is huge. When you mark someone as trusted it is stored in your crev-proofs repo so other people can see it! You can see this by running cargo crev query id trusted, which lists a lot more people than just the one I added:

YWfa4SGgcW87fIT88uCkkrsRgIbWiGOOYmBbA1AtnKA low    https://github.com/oherrala/crev-proofs
Qf4cHJBEoho61fd5zoeweyrFCIZ7Pb5X5ggc5iw4B50 medium https://github.com/kornelski/crev-proofs
aD4K0g6AcSKUDp3VPF7u4hM94zEkqjWeRQwmabLBcV0 medium https://github.com/Mark-Simulacrum/crev-proofs
FBkykBV6YaqAaGoUXyvd-XkEqDYxQNM7EUnZ2nuy-XQ low    https://github.com/Canop/crev-proofs
lr2ldir9XdBsKQkW3YGpRIO2pxhtSucdzf3M5ivfv4A high   https://git.sr.ht/~icefox/crev-proofs
FYlr8YoYGVvDwHQxqEIs89reKKDy-oWisoO0qXXEfHE medium https://github.com/dpc/crev-proofs
X98FCpyv5I7z-xv4u-xMWLsFgb_Y0cG7p5xNFHSjbLA low    https://github.com/kornelski/crev-proofs
ZOm7om6WZyEf3SBmDC69BXs8sc1VPniYx7Nfz2Du6hM low    https://gitlab.com/KonradBorowski/crev-proofs

However, note the only high trust level is the one for my own ID: if dpc trusts kornelski, then I transitively trust kornelski as well, but I never trust dpc’s judgements more than I trust dpc. There’s tons of options in crev for tweaking how exactly you measure “trust”, so if you don’t like the defaults you can tinker to your liking. It all feeds into whether that first column in cargo crev verify is “pass” or not; that is something you can set to your own standards. In the end, you are the one who makes the definition of “trust”! I’m fine with the defaults for now though, so I can pull all the proofs for dpc’s little social circle by running cargo crev fetch trusted, and suddenly there’s more pass’s in my cargo crev verify output.
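One way to picture the “never more than I trust dpc” rule is as a cap applied at every hop of the trust graph. The sketch below is an illustration in Python, not crev’s actual algorithm, and the trust edges in the example are invented (nobody checked what dpc really rated anyone).

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}
NAMES = {value: name for name, value in LEVELS.items()}

def effective_trust(root, edges):
    """edges maps truster -> {trustee: level}. Returns {id: effective level}."""
    best = {root: LEVELS["high"]}  # you trust yourself completely
    stack = [root]
    while stack:
        current = stack.pop()
        for trustee, level in edges.get(current, {}).items():
            # Trust passed along a hop can never exceed trust in the one passing it.
            capped = min(best[current], LEVELS[level])
            if capped > best.get(trustee, 0):
                best[trustee] = capped
                stack.append(trustee)
    return {ident: NAMES[value] for ident, value in best.items()}

# Hypothetical edges, loosely shaped like the listing above.
edges = {
    "icefox": {"dpc": "medium"},
    "dpc": {"kornelski": "high", "oherrala": "low"},
}
print(effective_trust("icefox", edges))
```

Here kornelski comes out as medium even though dpc rated them high in this made-up example, because the icefox-to-dpc hop is only medium.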

Creating reviews

Okay, let’s actually review some code! In the process of making goatherd I’m playing with a new library called axiom, and due to the aforementioned race condition issue I’ve already rummaged around inside it, added a couple minor PR’s, and talked with its author a bunch. It’s not a big piece of code, so this was what I wanted to review. We can do that with just cargo crev goto axiom – since cargo downloads a crate’s source code anyway, that command just takes us to a new shell in cargo’s source repo.

This is actually clever, not to say essential, as it makes sure that what we are reviewing is what is actually on crates.io, not what is on github or such.

We are just in a subshell, so we can do whatever we want there, browse the code or run tools like cargo geiger or whatever we want. When we’ve seen what we need to, we run cargo crev review in that shell to create a new review for the crate.

However, when I did that it kept saying Error: The digest of the reviewed and freshly downloaded crate were different; dbhwvUPPXHFO7Nn2u29HaOuZtg9CKMExl-5ayu0-itg != RRB0JmuenFACKjFN0CTdO1_MOse-YDhCvjf4Vec1GLs; /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/axiom-0.0.7 != /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/axiom-0.0.7.crev.reviewed. Deleting the axiom source and rebuilding goatherd to fetch a fresh copy doesn’t make this message go away. More on that in a bit!

…for now let’s try a different crate. I went to another crate I make, ggez, and did cargo crev verify, but it failed with the following message:

    Updating crates.io index     
Error: the lock file /home/icefox/my.src/ggez/ggez/Cargo.lock needs to be updated but --locked was passed to prevent this         

Okay, this is why this is called Actually Using Crev. Deleting my Cargo.lock and rebuilding it didn’t help, but running cargo update in ggez’s dir cleared this message. Thing is, this error message isn’t coming from crev, it’s coming from cargo – it wants to update its package list to the most recent before going through ggez’s dependency tree, but crev calls cargo with the --locked option, which tells it to not touch anything and operate in read-only mode. A sensible precaution in principle, reviewing a crate doesn’t change the dep list you’re trying to review, but here it’s blocking us with a weird error message. Doing the cargo update by hand solves the problem and we can proceed.

Doing cargo crev verify again I got the expected display, and also the following warnings:

Unclean crate approx 0.3.2
Unclean crate crc32fast 1.2.0
Unclean crate either 1.5.2
Unclean crate lazy_static 1.3.0
Unclean crate nodrop 0.1.13
Unclean crate void 1.0.2
Error: Unclean packages detected. Use `cargo crev clean <crate>` to wipe the local source.

Unclean, UNCLEAN! Not sure what’s going on here; it SOUNDS like it is saying that the local source code in the local downloaded version of the crates doesn’t match what cargo thinks it should, which seems slightly impossible ’cause I certainly haven’t been faffing about in ~/.cargo/registry/src/. Still, the suggested command makes this warning go away, so huzzah? Could still be better.

Anyway, this void crate looks weird, and crev says it’s only 70-odd lines of code, so that seems like a good target to review. I do cargo crev goto void and it opens a shell in its cargo source directory again. I poke around and decide it looks fine, then do cargo crev review, and this time it works. I put in my key’s passphrase and it opens my editor with the following template:

# Package Review of void 1.0.2
review:
  thoroughness: low
  understanding: medium
  rating: positive
comment: ""

Plus another 60 lines of comments giving absurdly thorough explanations of what is going on with each of these fields. It’s very helpful, actually. I fill it out, save, exit, and poof it’s done. The proof is saved in my local proof repo and I can view it with cargo crev query review void, at which point I realize that dpc has already reviewed void 1.0.2. Oh well, adding another review to it can’t hurt!
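The fields in that template are the sort of thing a verifier can filter on when deciding whether a crate counts as “pass”. Here’s a hedged Python sketch of one possible policy. The thresholds, the Review class, and the “fail” status name are all assumptions made for illustration; only “pass” and “none” appear in the output above, and cargo-crev’s real rules may well differ.

```python
from dataclasses import dataclass

RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}

@dataclass
class Review:
    reviewer: str
    rating: str         # e.g. "negative" or "positive"
    thoroughness: str   # "low", "medium" or "high"
    understanding: str  # "low", "medium" or "high"

def crate_status(reviews, trust, min_trust="low", min_thoroughness="low",
                 min_understanding="low", required=1):
    """Decide a status from reviews, counting only sufficiently trusted reviewers."""
    good = 0
    for review in reviews:
        level = trust.get(review.reviewer, "none")
        if RANK[level] < RANK[min_trust]:
            continue  # reviewer is not trusted enough to count at all
        if review.rating == "negative":
            return "fail"  # hypothetical status name for a trusted thumbs-down
        if (review.rating == "positive"
                and RANK[review.thoroughness] >= RANK[min_thoroughness]
                and RANK[review.understanding] >= RANK[min_understanding]):
            good += 1
    return "pass" if good >= required else "none"

reviews = [Review("dpc", "positive", "low", "medium"),
           Review("stranger", "positive", "high", "high")]
print(crate_status(reviews, {"dpc": "medium"}))
```

Note that the glowing review from the untrusted stranger never enters the count, however thorough it claims to be.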

This proof is only saved locally, but is already committed into my git repo of proofs, which is apparently in ~/.config/crev/proofs/<key>/. I can manipulate the git repo with cargo crev git, which just goes to that repo and passes its args through to git, so cargo crev git log and such work as you’d expect. A shortcut for cargo crev git push that automatically goes to the repo associated to your ID is cargo crev publish.

So I run that and, huzzah, it works! Check the actual review out here

I made a few more reviews, ’cause it’s kinda fun. I was looking forward to giving a friend some shit about his code when I hit the error again: The digest of the reviewed and freshly downloaded crate were different; i_bS1vb-271a-02WkBOf7D-yGi-t3fsYJ3kco8FKrNY != v_iKtjR6uB7QFaZ0_sOyYIAcDHidjucAjRGvjXOXqq0; /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/randomize-3.0.0-rc.3 != /home/icefox/.cargo/registry/src/github.com-1ecc6299db9ec823/randomize-3.0.0-rc.3.crev.reviewed. Okay, this is annoying now. It’s saying the hash of the checked-out Cargo source directory is not the same as its copy of it, but I can’t figure out why. So, time to ask for help. Turns out that was a bug in the lists of what files got excluded from the hash, which is now fixed on git master and should be in the next release.
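To see how an exclusion list can cause exactly this kind of mismatch, here’s a small Python sketch of hashing a source tree while skipping some files. It is not crev’s digest algorithm (SHA-256 and the IGNORED names are stand-ins picked for illustration), but it shows why two copies of the “same” crate can disagree if the list is wrong.

```python
import hashlib
import os
import tempfile

# Hypothetical ignore list; whatever cargo-crev really excludes may differ.
IGNORED = {".cargo-ok", "target"}

def tree_digest(root, ignored=IGNORED):
    """Hash relative paths and file contents under root, skipping ignored names."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place and fix the traversal order.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            if name in ignored:
                continue
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            digest.update(b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()

def make_tree(files):
    """Write {relative path: content} into a fresh temporary directory."""
    root = tempfile.mkdtemp()
    for relative, content in files.items():
        path = os.path.join(root, relative)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)
    return root

fresh = make_tree({"Cargo.toml": "x", "src/lib.rs": "y"})
reviewed = make_tree({"Cargo.toml": "x", "src/lib.rs": "y", ".cargo-ok": ""})
print(tree_digest(fresh) == tree_digest(reviewed))                # True: marker ignored
print(tree_digest(fresh, set()) == tree_digest(reviewed, set()))  # False: marker hashed
```

With the marker file left out of the ignore list, the two digests differ even though no source file changed, which is the same shape as the error above.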

So I think my main conclusion is that crev is great in concept, but the implementation is still young. It mostly works, and you should use it, but you should expect to hit some roadbumps. Response on the issue tracker is pretty quick though!

Discussion

So I see a few possible failure modes for crev as an ecosystem:

  • Nobody uses it – and it dies in obscurity
  • Everyone uses it, poorly – and its data is worthless because it’s so hard to find meaningful stuff
  • Lots of people use it maliciously – ibid

Hopefully none of these things happen! I think that as long as we have people who care about writing good software, we can have people creating good proofs as well. And in the end, as in many things, it all comes down to who you know, which is a very powerful tool. A little bit of work can hopefully lead to the average person having access to a lot of high-quality code reviews, and the “6 Degrees Of Kevin Bacon, er, Paul Erdős” phenomenon will hopefully result in a lot of these social circles being connected together.

I actually kinda want to stress-test this, just to see how resilient the system is in practice. Maybe make a bot that will automatically create reviews in a semi-human-ish method, and see if I can convince people to trust it. I think the underlying idea would be quite similar to the Twitter bots that just spew nonsense and like each other’s stuff to boost follower ratings, and would have a similar goal of making certain statements look far more authoritative and widely accepted than they actually are. You could even make some front websites for security consultants or such and associate the git repos with them for verisimilitude. The crev community is currently small so even a few of these could poison a lot of the well, and even if the community grows bigger it’s easy to spin off more bots. If there’s money in it then people will do it, and I can easily see there being money in saying “Yes, of course, this exploit-ridden code is totally safe. Trust us.”

However, I also see a few possible defenses against this abuse built in to crev:

  • The social network is transitive, but trust is graded. If A trusts B and B trusts C, then A’s trust for C is equal to or less than A’s trust for B.
  • Open source tends to be a fairly personal and reputation-driven endeavor. So, people who participate in crev are motivated to be careful about who they trust.
  • Open source coders tend to, well, write a lot of code. There’s not a lot of point in trusting someone’s reviews if they don’t also have projects that use the code they review, or at least someone’s code! That’s going to be hard to fake. People try it, for example creating github accounts full of resume-padding, but it’s pretty damn easy to spot.
  • Coming up with a convincing fake and getting someone important to trust it is a non-trivial amount of work, since you’re trying to fool a pretty savvy audience. I’m sure it’s possible, and will happen, but it’s probably a job that would require spear-phishing rather than brute-force spam. This is the nature of security: you can never remove the possibility of malicious action, but by making it harder and more expensive you reduce the chances of it.
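
To make the "graded" part of that first point concrete, here's a minimal sketch in Python. It is NOT crev's actual algorithm, just one simple way to get the property described above: trust along a chain is capped by its weakest link, and if there are several chains the best one wins. The `Trust` levels, the `effective_trust` function and the single-letter IDs are all made up for illustration.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Illustrative trust levels; higher means more trusted."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def effective_trust(edges, root):
    """Return {id: Trust} for every ID that `root` trusts, directly or not.

    `edges` maps a truster to {trustee: Trust}. Trust along a path is
    capped by the weakest link on it, and the best path to an ID wins.
    """
    best = {root: Trust.HIGH}  # you trust yourself completely
    pending = [root]
    while pending:
        current = pending.pop()
        for trustee, level in edges.get(current, {}).items():
            capped = min(best[current], level)
            # Only revisit an ID when we found a strictly better path to it,
            # so cycles in the graph cannot loop forever.
            if capped > best.get(trustee, Trust.NONE):
                best[trustee] = capped
                pending.append(trustee)
    return best


# A trusts B at MEDIUM, B trusts C at HIGH, C trusts D at LOW.
edges = {
    "A": {"B": Trust.MEDIUM},
    "B": {"C": Trust.HIGH},
    "C": {"D": Trust.LOW},
}
result = effective_trust(edges, "A")
print({who: level.name for who, level in result.items()})
# prints {'A': 'HIGH', 'B': 'MEDIUM', 'C': 'MEDIUM', 'D': 'LOW'}
```

Even though B trusts C at the highest level, A never ends up trusting C more than it trusts B. A bot that talks one person into a LOW trust rating can't launder that into HIGH further down the chain.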

I’m now getting more hypothetical, but afaik reputation-boosting bot networks are often structured to have bottlenecks in them, so that a few “leader” accounts end up having the authority of lots of other “follower” accounts backing them. So, blocking a few accounts may also render large chunks of a bot network useless at once. Human social networks also tend to follow certain patterns of connection density that result in similar bottlenecks, so if part of the web of trust departs from this structure, that’s also possibly evidence that it is artificial. You can have your own bots that look for such things. In crev, distrust is as transitive as trust is, so if someone you trust marks an ID as a bad actor you inherit that opinion until you form your own. You can help ensure the integrity of your network by blacklisting people as well as whitelisting them. Security researchers who actually try to find botnets or vulnerable crates can publicize them as untrustworthy, for instance through the already-existing RustSec effort, and then you can follow those to get an up-to-date safety net.
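
Here's a rough Python sketch of the inherited-distrust idea and the bottleneck effect from the paragraph above. Again, this is a toy model under my own assumptions, not crev's implementation: trust and distrust are plain yes/no sets, a distrust statement from anyone you could reach counts, and your own direct trust overrides inherited distrust. The `trusted_ids` and `reachable_from` functions and all the account names are hypothetical.

```python
from collections import deque


def reachable_from(trusts, start, blocked):
    """All IDs reachable from `start` via trust edges, skipping blocked IDs."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for friend in trusts.get(current, ()):
            if friend not in seen and friend not in blocked:
                seen.add(friend)
                queue.append(friend)
    return seen


def trusted_ids(trusts, distrusts, me):
    """Return the set of IDs `me` ends up trusting.

    `trusts` and `distrusts` map an ID to the set of IDs it rates.
    Distrust from anyone reachable is inherited, unless `me` has
    directly trusted the target (your own opinion wins).
    """
    own_trust = trusts.get(me, set())
    blocked = set(distrusts.get(me, ()))
    while True:
        reachable = reachable_from(trusts, me, blocked)
        inherited = {
            bad
            for rater in reachable
            for bad in distrusts.get(rater, ())
            if bad != me and bad not in own_trust
        }
        if inherited <= blocked:
            return reachable  # nothing new to block, we are done
        blocked |= inherited  # block more IDs and walk the graph again


# "leader" is the bottleneck that lends authority to the bots.
trusts = {
    "me": {"alice", "bob"},
    "bob": {"leader"},
    "leader": {"bot1", "bot2"},
}
distrusts = {"alice": {"leader"}}
print(sorted(trusted_ids(trusts, distrusts, "me")))
# prints ['alice', 'bob', 'me']
```

One distrust proof from alice against the leader account cuts off both bots at once, because they were only reachable through it. If I had trusted the leader directly, alice's opinion would be ignored and the whole chain would come back.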

Or, if you just take the whitelist of the most social person you know, and the blacklist of the most paranoid person you know, then you’re probably in good shape! Or maybe the other way around, depending on what you want. But what do you do if two of your trusted sources disagree about a third party? I dunno, it’s an interesting question! What do you do when that happens in real life? :-D

Conclusion

Anyway, back on topic! There’s a few GREAT things that crev has going for it over systems like PGP. First is that it is targeted at programmers, and it uses programmers’ tools. Putting your proofs as plain text in git is brilliant, because everyone already uses plain text in git. Sharing your crev proofs online adds exactly zero infrastructure over what you’re already using, and git does exactly what you need for this role because it’s resilient, easy to manipulate, easy to move between machines, system-independent, and so on. Also unlike PGP (or SSL), it’s trivial to distrust an ID, revise a review, and so on. To err is human, and any human social system that doesn’t allow people to screw up and then fix their mistakes is doomed. But since proofs are plain files in version control, you just update the file and commit it, and that change propagates to everyone who is interested in your opinion the next time they do cargo crev fetch trusted.

Also, it seems like crev should be resilient to compromised keys, and doesn’t have a single point of failure. If I lose my crev key, I can make a new ID, trust the old one, and life is good; people trusting me need to be informed to update, but that’s always gonna happen, and I can put the new ID in the same git repo. If my crev key is compromised and someone uses it to create spam… I still control my git repo, so I can make sure they don’t get access to it, and create a new key. If someone gets control of my git repo, they don’t have my crev key, so they can’t add or alter proofs or trust new people, only remove things. Bad, but not fatal, and I’m going to have a local backup somewhere since it’s part of crev anyway. They could delete ALL my proofs and replace them with their own, but since other people trust my key and not my repo, they hit the same problem: nothing they add is signed by a key anyone trusts. So, it seems pretty hard for an attacker to fuck up your life by compromising your crev setup; the crev key and git repo combine to make two-factor auth an architectural built-in, so the attacker has to get control of both to really do damage.

I’ve noticed a few ergonomic things to improve though, odd edge-case bugs aside. Most of what I do with crev doesn’t actually read or affect Rust code at all; all the ID management and such could be a separate tool, not a cargo plugin. It would be nice to have a general-purpose tool that only manages IDs and proofs, and a special-purpose tool that’s specifically for Rust-related stuff. That would make it easier to make crev-based tools for other languages. I also don’t like that you have to be in the directory of a project using a particular crate before you can do cargo crev goto dependency_name, though it does need the data in the project dir to find the right version of the dependency to go to. The cargo-crev tool itself is really trying to do three separate things: manage IDs and proofs, create new proofs, and assess a Rust package based on the proofs I know. No wonder it feels a bit clutter-y.

Well, there’s space for the tool to evolve, and it looks like it’s going in that direction anyway. The tool can change however it wants as long as the proof file format is the same, and the proof file format has a version number so it can change too.

All in all though… this is a heckin great system. And reading code is usually a lot easier than writing it, especially Rust, and I tend to end up reading a lot of bits and pieces of people’s code anyway. So, having a way to formalize the process and share my findings is real useful. (After all, who doesn’t like to brag about their opinions?) The tooling definitely is rough, but generally usable, and I am in love with the power of the concept.

So what would we have to do to actually make crev useful in the future? There’s a lot of small steps:

  • First, we need an option for cargo update or such that will make it attempt to ONLY use packages that have passing code reviews, or at least prefer the ones that do and warn if it’s forced to use one that doesn’t.
  • Second, obviously, we need more people to use crev. This is the hard bit, which is why I’m writing this. Frankly if everyone just checks cargo crev verify semi-consistently, that would be a real good start.
  • Next, if we get people into the habit of publishing code reviews for the crates that they create, that would be awesome. First, the presence of a review by an author can serve as an indicator that the crate is intended to be taken seriously instead of just being some random experiment, and second, it would mark a particular crypto key as “this is the author of this crate” which is a useful connection to have anyway. This would make it easier to automatically catch cases like the rubygems rest-client attack that made me kick off this whole investigation.
  • We need to treat a library being un-reviewed as a bug, and file issues for crates that aren’t reviewed at least by their author. This will help with the previous points, again just by making people aware of the problem; responsible authors will want their code to be reviewed, and you don’t want to use the code of irresponsible authors.
  • We need crev tools for non-Rust codebases. The infrastructure in terms of proofs and such is all language-agnostic, we just need to start pushing it into other communities. Everyone will benefit.
  • Frankly, if people want to make their own proofs, just doing one crev code review a week will add up fast. There’s a whole lot of small but key crates out there.
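
The first item on that list, an update that prefers reviewed versions and warns when it can't, boils down to a pretty small policy. Here's a sketch of it in Python. No such option exists in cargo as far as this article knows; `pick_version`, its `strict` flag and the version numbers are all hypothetical, and a real tool would get the set of reviewed versions from your web of trust rather than a hardcoded set.

```python
def pick_version(candidates, reviewed, strict=False):
    """Choose a version to use and return (version, warning).

    `candidates` lists acceptable versions, newest first.
    `reviewed` is the set of versions with a passing review you trust.
    With strict=True, unreviewed versions are refused outright;
    otherwise the newest candidate is used with a warning.
    """
    for version in candidates:
        if version in reviewed:
            return version, None  # newest reviewed version wins
    if strict or not candidates:
        return None, "no candidate version has a passing review"
    newest = candidates[0]
    return newest, f"{newest} has no passing review; using it anyway"


candidates = ["1.2.1", "1.2.0", "1.1.0"]

print(pick_version(candidates, {"1.2.0", "1.1.0"}))
# prints ('1.2.0', None)
print(pick_version(candidates, set()))
# prints ('1.2.1', '1.2.1 has no passing review; using it anyway')
print(pick_version(candidates, set(), strict=True))
# prints (None, 'no candidate version has a passing review')
```

The non-strict mode is the "prefer and warn" behavior, the strict mode is the "ONLY use reviewed packages" one. Note that preferring reviewed versions can mean you don't get the newest release, which is sort of the point.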

So yeah, you should use crev! My public ID is lr2ldir9XdBsKQkW3YGpRIO2pxhtSucdzf3M5ivfv4A and proofs repo is https://git.sr.ht/~icefox/crev-proofs. ’Cause, you totally trust my opinion, right?

Cheatsheet

Setup:

# Create an ID
cargo crev new id --url https://git.sr.ht/<whoever>/crev-proofs
# Trust someone you trust
cargo crev trust <some key>

# Or start with someone's git repo...
cargo crev fetch url https://github.com/dpc/crev-proofs
# ...find their ID...
cargo crev query id all
# and then mark it trusted
cargo crev trust <whatever>

Basic workflow:

cd your_project
# Update your trusted repos
cargo crev fetch trusted
cargo crev verify
# Find a dependency that looks easy, important, or both
cargo crev goto some_dependency
# Look at stuff, decide what you think
cargo crev review
# Write your review, save, quit
# Leave the shell that goto dropped you into
exit
# Push your changes
cargo crev publish
# And, you're done!