The Surprising Limitations of C++ Ranges Beyond Trivial Cases

Published September 13, 2019

Today we have a guest post from Alex Astashyn. Alex is a tech lead for the RefSeq resource at the National Center for Biotechnology Information.

Note: The opinions expressed in this article are those of the author. Also, I can’t count myself as a “range expert”, so some of the information pertaining to ranges might be factually incorrect (leave a comment if you spot anything egregiously wrong).

In this article I discuss the problems and limitations I’ve encountered with c++ ranges.

I also introduce my own library, rangeless that distills all functionality I expected to have been fulfilled by ranges. It allowed me to tackle much more expanded scope of interesting applicable real-life use-cases.

Prologue

Like any fan of functional-oriented declarative stateless programming, I thought ranges looked very promising. However, trying to use them in practice proved to be a very frustrating experience.

I kept trying to write what seemed to me like perfectly reasonable code, yet the compiler kept barfing pages of error messages I could not make sense of. Eventually I realized the error of my ways. I thought of ranges like of UNIX pipelines cat file | grep ... | sed ... | sort | uniq -c | sort -nr | head -n10 , but that is not so…

Examples

Example 1: Intersperse

Let’s try writing a view that intersperses a delimiter between input elements.

(This functionality is provided by range-v3, so we can compare and contrast the approaches)

        // inputs:    [x1, x2, ... xn] 
        // transform: [[x1, d], [x2, d], ... [xn, d]]
        // flatten:   [ x1, d, x2, d, ... xn, d ]
        // drop last: [ x1, d, x2, d, ... xn ]
        auto intersperse_view = 
        view::transform([delim](auto inp)
        {
            return std::array<decltype(inp), 2>{{ std::move(inp), delim }};
        })
      | view::join // also called concat or flatten in functional languages
      | view::drop_last(1); // drop trailing delim

The transform | join composition above is a common operation on streams that transforms each input into a sequence of outputs, and flattens the resulting sequence-of-sequences.

[x] -> (x -> [y]) -> [y]

Some languages have a separate abstraction for this, e.g. flat_map in Elixir or SelectMany in LINQ.

Adhering to Principle of Least Astonishment, it seems like the above ought to work. (if you have not seen this talk, I cannot recommend it enough).

However, this will not compile with range-v3. What gives? It turns out the problem is that view::join doesn’t like the fact that the subrange (returned collection) is a container returned as rvalue. I came up with the following hack: views (sometimes) compose with rvalues of views, so let’s wrap the container return-value as a view!

       view::transform([delim](auto inp)
        {
            return view::generate_n([delim, inp, i = 0]() mutable
            {
                return (i++ == 0) ? inp : delim;
            }, 2);
        })

Or, generalizing, if we want to return a container, e.g. a vector, as a view in some other use-case:

        view::transform([](int x)
        {
            auto vec = ... ;
            return view::generate_n([i = 0, vec = std::move(vec)]() mutable
            {
                return std::move(vec[i++]);
            }, vec.size());
        })
      | view::join // now join composes with transform

Isn’t this clever? Maybe, but having to come up with clever hacks to be able to do something as basic as this is not a good sign.

It turns out I was not the first person to hit this problem. The library implementers presented their own workarounds. As noted by Eric Niebler here, my solution is “illegal” because by capturing the vector in the view no longer satisfies O(1) copy complexity requirement.

That said, if we peek under the hood of view::generate or view::generate_n we’ll see that they cache the last generated value, so having view::generate yield a std::string, or std::vector, or a type containing those, you’re not satisfying the library requirements already.

Are we done with the example? Almost.

We have:

       ...
      | view::join
      | view::drop_last(1);

You’d think that drop_last would internally keep a queue of n elements in a circular buffer and would simply discard it upon reaching last input. range-v3 views, however may not buffer elements, so view::drop_last has to impose SizedRange or ForwardRange requirement on the input, whereas view::join returns an InputRange (even if it receives a ForwardRange as input). This kills not only the composition, or any hope of lazy evaluation (you have to eagerly dump your entire InputRange (hopefully finite) to a std::vector first to convert it to a ForwardRange).

So how would we implement this? We’ll get to it later…

Example 2:

Below is an example implemented with rangeless library (a slightly modified version of Knuth-vs-McIlroy challenge to make it a little more interesting).

    namespace fn = rangeless::fn;
    using fn::operators::operator%;
    //
    // Top-5 most frequent words from stream chosen among the words of the same length.
    //
    auto my_isalnum = [](const int ch)
    {
        return std::isalnum(ch) || ch == '_';
    };
    fn::from( // (1)
        std::istreambuf_iterator<char>(std::cin.rdbuf()),
        std::istreambuf_iterator<char>{ /* end */ })
      % fn::transform([](const char ch) // (2)
        {
            return std::tolower(uint8_t(ch));
        })
      % fn::group_adjacent_by(my_isalnum) // (3)
        // (4) build word->count map
      % fn::foldl_d([&](std::map<std::string, size_t> out, const std::string& w)
        {
            if(my_isalnum(w.front())) {
                ++out[ w ];
            }
            return out; // NB: no copies of the map are made
                                   // because it is passed back by move.
        })
      % fn::group_all_by([](const auto& kv) // (5) kv is (word, count)
        {
            return kv.first.size(); // by word-size
        })
      % fn::transform( // (6)
            fn::take_top_n_by(5UL, fn::by::second{})) // by count
      % fn::concat() // (7) Note: concat is called _join_ in range-v3
      % fn::for_each([](const auto& kv)
        {
            std::cerr << kv.first << "\t" << kv.second << "\n";
        })
      ;

As you can see, the code is very similar to ranges in style, but the way it works under the hood is entirely different (will be discussed later).

Trying to rewrite this with range-v3 we would encounter the following problems:

(3) This will not work because view::group_by requires a ForwardRange or stronger.
(4) How does one do a composeable left-fold (one of the three pillars of filter/map/reduce idiom) with ranges? ranges::accumulate is a possible candidate, but it is not “pipeable” and does not respect move-semantics (numerics-oriented).
(5) foldl_d returns a std::map, which satisfies ForwardRange, but it will not compose with the downstream group-by because it’s an rvalue. There’s no group_all_by in ranges, so we’d have to dump the intermediate result into an lvalue first to apply a sort-action.
(6,7) transform, concat: This is the same problem that we have already seen with “intersperse” example, where range-v3 can’t flatten a sequence of rvalue-containers.

Example 3: Transform-in-parallel

The function below is taken from aln_filter.cpp example. (which, by the way, showcases the usefulness of lazy data-stream manipulation in applicable use-cases).

The purpose of lazy_transform_in_parallel is to do the same job as plain transform, except each invocation of the transform-function is executed in parallel with up-to specified number of simultaneous async-tasks. (Unlike with c++17’s parallelized std::transform we want this to work lazily with an InputRange.)

static auto lazy_transform_in_parallel = [](auto fn,
                                           size_t max_queue_size = std::thread::hardware_concurrency())
{
    namespace fn = rangeless::fn;
    using fn::operators::operator%;
    assert(max_queue_size >= 1);
    return [max_queue_size, fn](auto inputs) // inputs can be an lazy InputRange
    {
        return std::move(inputs)
        //-------------------------------------------------------------------
        // Lazily yield std::async invocations of fn.
      % fn::transform([fn](auto inp)
        {
            return std::async(std::launch::async,
                [inp = std::move(inp), fn]() mutable // mutable because inp will be moved-from
                {
                    return fn(std::move(inp));
                });
        })
        //-------------------------------------------------------------------
        // Cap the incoming sequence of tasks with a seq of _max_queue_size_-1
        // dummy future<...>'s, such that all real tasks make it
        // from the other end of the sliding-window in the next stage.
      % fn::append(fn::seq([i = 1UL, max_queue_size]() mutable
        {
            using fn_out_t = decltype(fn(std::move(*inputs.begin())));
            return i++ < max_queue_size ? std::future<fn_out_t>() : fn::end_seq();
        }))
        //-------------------------------------------------------------------
        // Buffer executing async-tasks in a fixed-sized sliding window;
        // yield the result from the oldest (front) std::future.
      % fn::sliding_window(max_queue_size)
      % fn::transform([](auto view) // sliding_window yields a view into its queue
        {
            return view.begin()->get();
        });
    };
};

One would think that this has all the pieces to be implementable with ranges, but that is not the case. The obvious problem is that view::sliding requires a ForwardRange. Even if we decided to implement an “illegal” buffering version of sliding, there are more problems that are not visible in the code, but will manifest at runtime:

In range-v3 the correct usage of view::transform is contingent on the following assumptions:

It is cheap to recompute (This doesn’t work for the first transform in the above example that takes and passes the input by-move and launches an async-task).
It’s OK to invoke it multiple times on the same input (This doesn’t work for the second transform, where the call to std::future::get leaves it in invalid state, and so can only be called once).

If the transform-function is something like “add one” or “square an int” these assumptions are probably fine, but if the transform-function needs to query a database or spawn a process to run a heavy task, such assumptions are a little presumptuous.

This problem is what Jonathan described in the Terrible Problem Of Incrementing A Smart Iterator.

This behavior is not a bug, and is, apparently, by design – yet another reason why we can’t have nice things with range-v3.

In rangeless, fn::transform neither calls the transform-function on the same input more than once, nor does it cache the result.

Note: transform_in_parallel is provided in the rangeless library. Compare implementation of a parallelized gzip compressor with rangeless (Ctrl+F pigz) vs. RaftLib.

What’s the conclusion from all this?

Complexity of ranges.

We need a reasonably coherent language that can be used by “ordinary programmers” whose main concern is to ship great applications on time.
[Bjarne Stroustrup]

Ranges simplify the code for basic use cases, for example, you can write action::sort(vec) instead of std::sort(vec.begin(), vec.end()). However, beyond the most basic usages the complexity of the code increases exponentially.

For example, how would one implement the above-mentioned intersperse-adapter?

Let’s look at the Haskell example first, just to have a reference point of what “simple” ought to look like.

intersperse ::  a -> [ a ] -> [ a ]
intersperse     _ [ ] = [   ]
intersperse     _ [ x ] = [ x ]
intersperse delim    (x:xs) = x : delim : intersperse delim xs

Even if you’ve never seen any Haskell in your life, you can probably figure out how that works.

Below are three different ways of doing it with rangeless. Just like the Haskell’s signature my_intersperse takes a delim and returns a unary callable that can take some Iterable and return a sequence yielding the elements, interspersing delim.

A) As a generator-function:

auto my_intersperse = [](auto delim)
{
    return [delim = std::move(delim)](auto inputs)
    {
        return fn::seq([  delim,
                         inputs = std::move(inputs),
                             it = inputs.end(),
                        started = false,
                           flag = false]() mutable
        {
            if(!started) {
                started = true;
                it = inputs.begin();
            }
            return it == inputs.end() ? fn::end_seq()
                 :     (flag = !flag) ? std::move(*it++)
                 :                      delim;
        });
    };
};

B) By using fn::adapt, a facility in rangeless for implementing custom adapters

auto my_intersperse = [](auto delim)
{
    return fn::adapt([delim, flag = false](auto gen) mutable
    {
        return           !gen ? fn::end_seq()
             : (flag = !flag) ? gen()
             :                  delim;
    });
};

C) As composition of existing functions (what we attempted and failed to implement with range-views)

auto my_intersperse = [](auto delim)
{
    return [delim = std::move(delim)](auto inputs)
    {
        return std::move(inputs)
      % fn::transform([delim](auto inp)
        {
            return std::array<decltype(inp), 2>{{ std::move(inp), delim }};
        })
      % fn::concat()
      % fn::drop_last(); // drop trailing delim
    };
};

D) We can also implement intersperse as a coroutine, without any help from rangeless::fn.

template<typename Xs, typename Delim>
static unique_generator<Delim> intersperse_gen(Xs xs, Delim delim)
{
    bool started = false;
    for (auto&& x : xs) {
        if(!started) {
            started = true;
        } else {
            co_yield delim;
        }
        co_yield std::move(x);
    }
};

auto my_intersperse = [](auto delim)
{
    return [delim](auto inps)
    {
        return intersperse_gen(std::move(inps), delim);
    };
};

All of the implementations are about the same in terms of code complexity. Now let’s look at what the range-v3 implementation looks like: intersperse.hpp. To me, personally, this looks hypercomplex. If you’re not sufficiently impressed, consider an implementation of a cartesian-product as a coroutine:

template<typename Xs, typename Ys>
auto cartesian_product_gen(Xs xs, Ys ys) 
  -> unique_generator<std::pair<typename Xs::value_type,
                                typename Ys::value_type>>
{
    for(const auto& x : xs)
        for(const auto& y : ys)
            co_yield std::make_pair(x, y);
}

Compare the above with range-v3 implementation.

Writing views with range-v3 is supposed to be easy, but, as the examples show, the bar of what’s considered “easy” in post-modern c++ has been raised to the heights not reachable by mere mortals.

The situation in the application code involving ranges is not any simpler.

Compare Haskell vs. Rust vs. rangeless vs. range-v3 implementations of a calendar-formatting app. Don’t know about you, but the last implementation does not inspire me to ever have to understand or write code like this.

Note that in the range-v3 example the authors break their own view copy-complexity requirements in interleave_view by having a std::vector field.

Range views leak abstraction

One of the big promises of ranges is abstracting-away the iterators. In our rangeless + coroutine implementations above we’ve successfully managed not having to deal with iterators directly in all cases except for (A) – manually capturing the input-range in the closure and then yielding its elements with std::move(*it++)

If you go back to the range-v3 intersperse and calendar-app above and study in it more detail, you’ll see that in implementation of views we end up dealing with iterators directly, quite a lot in fact. Ranges don’t save you from dealing with iterators directly beyond calling sort on a range or some such. On the contrary, it’s “dealing with iterators, with extra steps”.

Compile-time overhead

The range-v3 library is infamous for its compile times. “On my machine” the compilation time for the above calendar example is over 20s, whereas the corresponding rangeless implementation compiles in 2.4s, 1.8s of which is just the #include <gregorian.hpp> – nearly an order of magnitude difference!

Compilation times are already an issue in every-day c++ development, and ranges don’t just make it slightly worse! In my case this fact alone precludes any possibility of using ranges in production code.

The `rangeless` library

With rangeless I did not try to reinvent the wheel, and followed the design of streaming libraries in functional languages (Haskell’s Data.List, Elixir’s Stream, F#’s Seq, and LINQ).

Unlike in range-v3, there are no ranges, views, or actions – just passing of values from one function to the next through a chain of unary invokables, where a value is either a container or a sequence (input-range, bounded or unbounded).

There’s a little bit of syntactic sugar:

operator % (Arg arg, Fn fn) -> decltype(fn(std::forward<Arg>(arg)))
auto x1 = std::move(arg) % f % g % h; // same as auto x1 = h(g(f(std::move(arg))));

This is the equivalent of infix operator & in Haskell or operator |> in F#. This allows us to structure the code in a fashion congruent with the direction of the dataflow. It doesn’t matter for a single-liner, but helps when the functions are multiline lambdas defined in-place.

Why operator% specifically, rather than >> or |, you wonder? The shopping-list of overloadable binary operators is not very long in C++, and the former tends to be heavily overloaded due to streams, and the pipe operator as well, usually for “smart”-flags, or “chaining” a.k.a point-free composition, as in ranges. I considered overloadable operator->* , but ultimately settled with operator% because given the context it’s unlikely to be confused with integer-modulo, and also has %= counterpart that is useful to apply a state-change to LHS, e.g

vec %= fn::where(.../*satisfies-condition-lambda*/);

An input is either seq or a Container, and so is the output. E.g. fn::sort needs all elements to do its job, so it will dump entire input seq into a std::vector, sort it, and return as std::vector. A fn::transform, on the other hand, will wrap the input, taken by value, as seq that will lazily yield transformed input-elements. Conceptually this is similar to UNIX pipelines, with eager sort and lazy sed.

Unlike in range-v3, input-ranges (sequences) are first-class citizens. The issues of the concept-mismatches between arguments and parameters that we’ve seen in range-v3 are non-existent (e.g. expecting ForwardRange, but received InputRange). Everything is composeable, as long as the value-types are compatible.

Epilog

I tried to use ranges to write expressive code. Am I the only one who ended up constantly “holding it wrong”?

I was quite surprised to learn that the committee accepted ranges into the c++20 standard and most c++ experts are excited about it. It’s as if the issues of limited usability, code complexity, leaky abstractions and completely unreasonable compile-times are of no consequence whatsoever to the committee members?

I feel like there’s a disconnect between the c++ experts that spearhead the development of the language and the common programmers that want simpler ways of doing complex things. It seems to me that Bjarne Stroustrup’s plea from Remember the Vasa! fell on deaf ears (again, my subjective opinion).