My Least Favorite Rust Type
September 20th, 2020

My least favorite Rust type is std::ops::Range.

Range is a foundational type, with magic syntax but a simple definition:

struct Range<Idx> {
    /// The lower bound of the range (inclusive).
    pub start: Idx,
    /// The upper bound of the range (exclusive).
    pub end: Idx,
}

Idx is typically an integral type, but you can make a Range of anything, which will become important later. Here’s a range of Unit:

()..()

(In case you were wondering, the Unit range does not contain Unit.)

What’s wrong with Range? This will be a critique on Rust’s own terms.

Range needs random-feeling borrows

Any Rust tutorial will cover borrowing, lifetimes, ownership, etc. You think you understand, then this:

(3..5).contains(&4) // why the &

You need the borrow because you can’t like, OWN a number, man!

Heh, no, it’s because, what if you wanted to make a Range<Vec> instead:

let r = vec![1, 2, 3]..=vec![10, 11, 12];
let v = r.contains(&vec![9, 9, 9, 9]);

(try to guess what value v has above)

Range<Vec> requires the borrow, so the vastly more common Range<usize> etc. forces it as well.

Range needs random-feeling clones

We’re not done picking on the borrow checker. Range makes you clone even when there’s no ownership transfer:

let v = vec![1, 2, 3, 4];
let r = 1..2;
&v[r.clone()]; // required to avoid "use of moved value" in next line
&v[r]

Range could be Copy, but some Ranges are also iterators and you might copy one by mistake:

let mut iter = 0..n;
for i in iter { if i > 2 { break; } } // silent copy of iter
iter.collect()

so Ranges which are not used as iterators (many cannot be, like Range<f64> or Range<char>) pay the price, and so does any type which embeds a Range.

This is abusing the borrow checker as a bad linter. Range undermines the story of lifetimes and ownership, making the borrow checker feel arbitrary.

Range is unsure when it is valid

Rust will happily construct a backwards Range, which ends before it starts:

let x = 5..0;
x.contains(&4); // false

Is this backwards range valid? Its len() is 0, it contains() nothing, it yields nothing as an iterator. But if you try to use it to index into a slice, you get a panic! So it looks valid but is primed to explode!

This is because - again - you can make a Range of anything. If you try to enforce that start <= end, you lose the ability to make a Range of non-comparable things. A Range of dyn Error can’t be invalid, so a Range of int never gets to be.

A practical problem is writing correct bounds checks. For example, consider the get_unchecked function on slice - it says “an out-of-bounds index is undefined behavior” but never defines what out of bounds means. So how does one even call this function safely?

Restated: a Range with start 0 and length 0 may be out of bounds. That’s wild.

Range hides a footgun

A regex engine (here’s one) often deals in ranges of char, for example /[a-z]/. Should it use Range<char>?

No! The footgun is /[\u10FFFF]/, which is the largest char. Range<char> cannot represent this value, even though it has the bits to do so.

Is RangeInclusive a better choice? This uses an additional bool field to mean…stuff, and it needs to be separately checked in several places. This is a silly expensive representation, pushing RangeInclusive<char> to 12 bytes even though it would fit in 8, with bits to spare. Not a good choice for a perf-sensitive regex algorithm.

A Recipe for Rearranging Range

The problem is Range is overloaded: it’s too flexible, it wants to be everything.

  • It’s sometimes a set
  • It’s sometimes an iterator
  • It’s sometimes a slice index
  • You can make silly Ranges out of anything

These goals are in tension, and it meets none of them well.

Perhaps it’s too late and we must live with this wart, but a recipe for a different approach:

  1. Limit Range to Copy + PartialOrd types. There may be occasional uses for Range<String> or Range<BigNum>, but it’s not worth forcing borrowing for the common case.

  2. Now that Range always knows how to compare its ends, enforce that start <= end at construction time. For Ranges constructed from constants, this will be free. This avoids the “looks valid, actually explodes” problem and will unlock further optimizations:

    • Range::is_empty() could be written naturally instead of a a bizarre way just for NaNs
    • [T]::get() would need to branch once, not twice
    • Range::len() could just be a sub and not require a branch
  3. Give Range an iter() method, like other collections have. Now Range is not an Iterator, just like Vec is not an Iterator, and Range can be easily made Copy. This does mean writing for n in (1..10).iter(), but Rust already requires that for collections, so it’s more consistent.

  4. Now that Range is not an Iterator, RangeInclusive can drop its extra bool. It would simply be a pair of indexes, and could not be empty (that’s what Swift does).

Thanks for reading!