The Rectangle Rule

Eddie Aftandilian edited this page Apr 19, 2019 · 1 revision

Introduction

Imagine we’re writing a mechanical formatter for Java, like google-java-format. We’d like the formatted output to follow a few strict rules (e.g., no lines longer than some limit, with specified exceptions) but otherwise we’d just like it to be as “readable” as possible. A poorly formatted statement, like:

    currentEstimate = (currentEstimate + x
        / currentEstimate) / 2.0f;

(imagining it won’t all fit on one line) might not be as "readable" as, say, the better formatted:

    currentEstimate =
        (currentEstimate + x / currentEstimate)
            / 2.0f;

Formatting for this sort of abstract readability is not easily mechanized; it’s loose, not strict. Without more precise "big rules" for readability, we can end up with lots of "little rules"—don’t ever break a line here; don’t break there unless you absolutely have to—and all these little rules can easily interact in complex and confusing ways.

Alternatively, if we can find a few big rules that produce readability, or that at very least promote readability, we can simplify writing a formatter, by using the big rules to reduce the number of special little rules that we must add.

The Rectangle Rule

Here’s the Rectangle Rule, one such big rule for promoting readability:

When a source file is formatted, each subtree gets its own bounding rectangle, containing all of that subtree’s text and none of any other subtree’s.

What does this mean? Take the well formatted example above, and draw a rectangle around just the subexpression x / currentEstimate:

    currentEstimate =
        (currentEstimate + x / currentEstimate)
            / 2.0f;

This is possible—good! But in the badly formatted example, there is no rectangle containing just that subexpression and nothing more—bad!

    currentEstimate = (currentEstimate + x
        / currentEstimate) / 2.0f;

In the well formatted example, every subtree has its own rectangle. For instance, the right-hand side ("RHS") of the assignment has its own rectangle there, but not in the badly formatted example. This promotes readability by exposing program structure in the physical layout: the RHS is in just one place, not partly in one place and partly in another.

There are some complexities and exceptions. We must ignore some “random” punctuation like ;. Perhaps we should just reduce overlap, not eliminate it altogether. Even so, the Rectangle Rule is a simple big rule for promoting readability by exposing code structure. It forces a number of formatting choices, simplifying the formatter’s job in deciding the remainder.
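The rule can be made concrete with a small check. The following is a minimal sketch, not a description of how google-java-format works internally. The names RectangleRuleCheck and hasOwnRectangle are hypothetical, a subtree is assumed to be given as a start and end position in the formatted text, and the punctuation to ignore is passed in as a string.

```java
import java.util.Arrays;
import java.util.List;

final class RectangleRuleCheck {

  /**
   * Returns true if the text running from (startLine, startCol), inclusive, to
   * (endLine, endCol), exclusive, fits in a rectangle that holds no other text.
   * Whitespace and any character in {@code ignored} never count as other text.
   */
  public static boolean hasOwnRectangle(
      List<String> lines, int startLine, int startCol, int endLine, int endCol, String ignored) {
    // Pass 1: find the left and right edges of the subtree's bounding rectangle.
    int left = Integer.MAX_VALUE;
    int right = -1;
    for (int row = startLine; row <= endLine; row++) {
      String line = lines.get(row);
      int from = (row == startLine) ? startCol : 0;
      int to = (row == endLine) ? endCol : line.length();
      for (int col = from; col < to; col++) {
        if (!Character.isWhitespace(line.charAt(col))) {
          left = Math.min(left, col);
          right = Math.max(right, col);
        }
      }
    }
    // Pass 2: look for text inside the rectangle that is not part of the subtree.
    for (int row = startLine; row <= endLine; row++) {
      String line = lines.get(row);
      int from = (row == startLine) ? startCol : 0;
      int to = (row == endLine) ? endCol : line.length();
      for (int col = left; col <= right && col < line.length(); col++) {
        boolean insideSubtree = col >= from && col < to;
        char c = line.charAt(col);
        if (!insideSubtree && !Character.isWhitespace(c) && ignored.indexOf(c) < 0) {
          return false; // another subtree's text intrudes into the rectangle
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> good = Arrays.asList(
        "currentEstimate =",
        "    (currentEstimate + x / currentEstimate)",
        "        / 2.0f;");
    List<String> bad = Arrays.asList(
        "currentEstimate = (currentEstimate + x",
        "    / currentEstimate) / 2.0f;");
    // Check the right-hand side of the assignment, up to but excluding ';'.
    System.out.println(hasOwnRectangle(good, 1, 4, 2, good.get(2).indexOf(';'), ";"));
    System.out.println(
        hasOwnRectangle(bad, 0, bad.get(0).indexOf('('), 1, bad.get(1).indexOf(';'), ";"));
  }
}
```

Run on the two layouts from the introduction, the check accepts the RHS of the well formatted statement and rejects the RHS of the badly formatted one, because the text currentEstimate = falls inside the rectangle that the RHS would need.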

Method invocations

Here’s a more complicated sequence of statements:

    PCollection<List<Integer>> recomputedMean =
        p.apply(Create.of(Arrays.asList(assigned)).withCoder(KvCoder.of(
        ListCoder.of(BigEndianIntegerCoder.of()), ListCoder.of(BigEndianIntegerCoder.of()))))
        .apply(
            Combine.<List<Integer>, List<Integer>, List<Integer>>perKey(
                new RecomputeMeanCombineFn()))
            .apply(Values.<List<Integer>>create());
    DirectPipelineRunner.EvaluationResults results = p.run();
    Assert.assertThat(results.getPCollection(recomputedMean),
        containsInAnyOrder(Arrays.asList(20, 2), Arrays.asList(15, 55)));

Here are the same statements, reformatted to follow the Rectangle Rule:

    PCollection<List<Integer>> recomputedMean =
        p.apply(
                Create.of(Arrays.asList(assigned))
                    .withCoder(
                        KvCoder.of(
                            ListCoder.of(BigEndianIntegerCoder.of()),
                            ListCoder.of(BigEndianIntegerCoder.of()))))
            .apply(
                Combine.<List<Integer>, List<Integer>, List<Integer>>perKey(
                    new RecomputeMeanCombineFn()))
            .apply(Values.<List<Integer>>create());
    DirectPipelineRunner.EvaluationResults results = p.run();
    Assert.assertThat(
        results.getPCollection(recomputedMean),
        containsInAnyOrder(Arrays.asList(20, 2), Arrays.asList(15, 55)));

Each method call has its own rectangle. (As before, we ignore random punctuation like ) and {.) The two arguments to Assert.assertThat even have a rectangle together, as well as separately, further exposing program structure.

This is not the only way to format according to the Rectangle Rule, but it is the layout that google-java-format produces. Here, the reformatted statements are radically unfolded from the original.

Here’s another bit of code:

    mCropView.postDelayed(new Runnable() {
        @Override
        public void run() {
            if(!visible) {
                changeWallpaperFlags(visible);
            } else {
                mCropView.setVisibility(View.INVISIBLE);
            }
        }
    }, FLAG_POST_DELAY_MILLIS);

Here’s the same code, reformatted to follow the Rectangle Rule:

    mCropView.postDelayed(
        new Runnable() {
          @Override
          public void run() {
            if (!visible) {
              changeWallpaperFlags(visible);
            } else {
              mCropView.setVisibility(View.INVISIBLE);
            }
          }
        },
        FLAG_POST_DELAY_MILLIS);

The anonymous inner class is in one place and nothing overlaps it visually, again helping to expose the program structure.
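One simple way to obtain layouts like those above is to print a call on one line when it fits and otherwise to put every argument on its own line, indented by four more columns. The sketch below illustrates that policy only. It is not google-java-format's implementation, the CallLayout and Node names are hypothetical, and the width check deliberately ignores trailing punctuation such as ) and ; to keep the code short.

```java
import java.util.Arrays;
import java.util.List;

final class CallLayout {

  /** A node is either a leaf (args == null) or a method call with arguments. */
  public static final class Node {
    final String text;      // leaf text, or the method name for a call
    final List<Node> args;  // null for a leaf

    private Node(String text, List<Node> args) {
      this.text = text;
      this.args = args;
    }

    public static Node leaf(String text) {
      return new Node(text, null);
    }

    public static Node call(String name, Node... args) {
      return new Node(name, Arrays.asList(args));
    }
  }

  /** Renders the node on a single line. */
  public static String flat(Node node) {
    if (node.args == null) {
      return node.text;
    }
    StringBuilder sb = new StringBuilder(node.text).append('(');
    for (int i = 0; i < node.args.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(flat(node.args.get(i)));
    }
    return sb.append(')').toString();
  }

  /** Renders the node starting at column {@code indent}, breaking only when it does not fit. */
  public static String format(Node node, int indent, int maxWidth) {
    String oneLine = flat(node);
    if (node.args == null || indent + oneLine.length() <= maxWidth) {
      return oneLine;
    }
    // Break after '(' and give each argument its own line, so each argument's
    // text starts at the left edge of its own rectangle.
    StringBuilder sb = new StringBuilder(node.text).append('(');
    for (int i = 0; i < node.args.size(); i++) {
      sb.append(i > 0 ? ",\n" : "\n");
      for (int s = 0; s < indent + 4; s++) {
        sb.append(' ');
      }
      sb.append(format(node.args.get(i), indent + 4, maxWidth));
    }
    return sb.append(')').toString(); // the right paren stays attached, unbroken
  }

  public static void main(String[] args) {
    Node tree = Node.call("outer", Node.call("inner", Node.leaf("firstArgument")),
        Node.leaf("secondArgument"));
    System.out.println(format(tree, 0, 30));
  }
}
```

Because an argument is never split across the line that holds its sibling, every argument ends up in a rectangle of its own, at the cost of more lines.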

Declarations

So far we’ve shown the Rectangle Rule at work with expressions, where the idea of a subtree is straightforward. It can also be useful in more complex situations, such as declarations. Consider this code:

      public static <W extends BoundedWindow> StateTag<Object, WatermarkHoldState<W>>
          watermarkStateInternal(String id, OutputTimeFn<? super W> outputTimeFn) {
        return new WatermarkStateTagInternal<W>(new StructuredId(id), outputTimeFn);
      }

Perhaps we find it distracting that the method’s return type (StateTag<Object, WatermarkHoldState<W>>) and name (watermarkStateInternal) are so far apart. If we arbitrarily define them together to be treated as a subtree of the declaration, we force a different formatting, perhaps google-java-format's:

      public static <W extends BoundedWindow>
          StateTag<Object, WatermarkHoldState<W>> watermarkStateInternal(
              String id, OutputTimeFn<? super W> outputTimeFn) {
        return new WatermarkStateTagInternal<W>(new StructuredId(id), outputTimeFn);
      }

Exceptions

The real world is complicated and rules sometimes need exceptions. (The highest-level rule is probably Don’t needlessly confuse or annoy users.)

Right Parens

The Rectangle Rule is not strictly applied to right parens ). Trailing right parens are always rendered with no preceding whitespace and this may cause the right edge of inner bounding rectangles to be poorly defined:

    outerMethod(
        methodWithExcessivelyLongName(
            deeplyNestedArgument));

In the above example, there is no proper rectangle which exactly encloses the argument of outerMethod(...). The right edge cannot both include Name( and exclude );. This departure from a pure interpretation of the Rectangle Rule is similar to the treatment of semicolons and follows the more typical convention of never breaking before ).

Return statements

Consider the expression in this statement:

    return annotationStrategy.equals(other.annotationStrategy)
        && typeLiteral.equals(other.typeLiteral);

Oops! To make it strictly follow the Rectangle Rule, we’d have to reformat it to break before the expression being returned:

    return
        annotationStrategy.equals(other.annotationStrategy)
            && typeLiteral.equals(other.typeLiteral);

We've seen almost no existing Java code that breaks after the return, suggesting we make an exception here. We can rationalize it (and others like it) by saying that not much of the enclosing subtree overlaps. If we change the formatter’s indentation rules to follow the Rectangle Rule more closely, we risk surprising or annoying a lot of people.

Left Associative Operators

Consider this statement:

    int fifteen =
        0 + 1 + 2 + 3
            + 4 + 5;

Since addition in Java is left-associative, 0 + 1 + 2 + 3 + 4 is a subtree, and yet it doesn’t have its own rectangle here. We must redefine the shape of the tree to avoid surprising users with unexpected layouts like:

    int fifteen =
        0 + 1 + 2 + 3
                + 4
            + 5;
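Redefining the shape of the tree can be pictured as flattening a left-nested chain of the same operator into a single node with many operands, so that the layout only needs a rectangle for the whole chain and for each operand. The sketch below shows that flattening step under simplifying assumptions. The Expr, Num, and Add classes are hypothetical stand-ins for a real syntax tree, and nothing here is taken from google-java-format's source.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenChain {

  public interface Expr {}

  /** An integer literal. */
  public static final class Num implements Expr {
    public final int value;

    public Num(int value) {
      this.value = value;
    }
  }

  /** A binary addition; a + b + c parses as Add(Add(a, b), c). */
  public static final class Add implements Expr {
    public final Expr left;
    public final Expr right;

    public Add(Expr left, Expr right) {
      this.left = left;
      this.right = right;
    }
  }

  /** Collects the operands of a left-nested chain of additions, in source order. */
  public static List<Expr> flattenOperands(Expr expr) {
    List<Expr> operands = new ArrayList<>();
    collect(expr, operands);
    return operands;
  }

  private static void collect(Expr expr, List<Expr> out) {
    if (expr instanceof Add) {
      Add add = (Add) expr;
      collect(add.left, out); // the left operand may itself be part of the chain
      out.add(add.right);     // the right operand is kept as a single unit
    } else {
      out.add(expr);
    }
  }

  public static void main(String[] args) {
    // Build ((((0 + 1) + 2) + 3) + 4) + 5, as a left-associative parse would.
    Expr chain = new Num(0);
    for (int i = 1; i <= 5; i++) {
      chain = new Add(chain, new Num(i));
    }
    System.out.println(flattenOperands(chain).size() + " operands in one chain");
  }
}
```

With the chain treated as one node of six operands, the prefix 0 + 1 + 2 + 3 + 4 is no longer a subtree that demands its own rectangle, which is what permits the first layout of fifteen shown above.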

google-java-format implements a number of exceptions to the Rectangle Rule, but it seems likely that even more would be worthwhile. For example, it currently generates the somewhat annoying formatting:

    method1(
        method2(
            method3(
                method4(
                    method5(
                        "Long, long expression"
                            + "that won't fit on one line.")))));

which might (or might not) be improved by violating the Rectangle Rule:

    method1(method2(method3(method4(method5(
        "Long, long expression"
            + "that won't fit on one line.")))));

Creating new exceptions, and doing so precisely, is an ongoing challenge.

Finally, google-java-format is limited because it is not a compiler. It can make formatting choices based only on syntax (and initial layout), not on semantics. For example, it might make sense to lay out fluent chains of method calls differently from other chains, but google-java-format cannot (for example) look at the methods’ type signatures to determine which are which.

So what does the Rectangle Rule buy us?

The Rectangle Rule is a big rule that helps to promote readability. Many other possible rules promote readability too, but the Rectangle Rule is simple and broad in its implications.

Because the Rectangle Rule limits how code can be folded together, it forces more white space into the formatted output, increasing the number of lines required for some code.

The Rectangle Rule is compatible with existing Java Style Guide rules, such as the indentation rules. It is largely compatible with existing practice, although there are exceptions like return statements, and although much existing code is heavily folded to reduce the number of lines required.

While the Rectangle Rule is shown here in use with Java, it should also be usable with other programming languages.

Readings

There is a rich history of rules and algorithms for the formatting of programs or other structured text, also called “pretty-printing” or "grinding" (Goldstein, Moon). google-java-format implements a variant of the linear-time Oppen algorithm invented by Derek Oppen, Greg Nelson, and Eric Roberts at Stanford University in the 1970s; this algorithm has inspired a series of interesting variants (Wadler, Swierstra & Chitil). The Oppen algorithm makes it easy to implement the Rectangle Rule (and its exceptions), although it does not mandate it.