About half a year ago I encountered a paper bombastically
titled “the ultimate conditional syntax”. It has the
attractive goal of unifying pattern match
with boolean if
tests,
and its solution is in some ways very nice. But it seems
over-complicated to me, especially for something that’s a basic
work-horse of programming.
I couldn’t immediately see how to cut it down to manageable proportions, but after reading syntactic musings on match expressions by Yoshua Wuyts a couple of weeks ago, I had an idea. I’ll outline it under the “penultimate conditionals” heading below, after reviewing the UCS and explaining my motivation.
- what the UCS?
- whence UCS
- out of scope
- penultimate conditionals
- dangling syntax
- examples
- antepenultimate breath
what the UCS?
The ultimate conditional syntax does several things which are somewhat intertwined and support each other.
-
An “expression
is
pattern” operator allows you to do pattern matching inside boolean expressions.Like “
match
” but unlike most other expressions, “is
” binds variables whose scope is the rest of the boolean expression that might be evaluated when the “is
” is true, and the consequent “then
” clause. -
You can “split” tests to avoid repeating parts that are the same in successive branches. For example,
if num < 0 then -1 else if num > 0 then +1 else 0
can be written
if num < 0 then -1 > 0 then +1 else 0
The example shows a split before an operator, where the left hand operand is the same and the rest of the expression varies. You can split after the operator when the operator is the same, which is common for “
is
” pattern match clauses. -
Indentation-based syntax (an offside rule) reduces the amount of punctuation that splits would otherwise need. An explicit version of the example above is
if { x { { < { 0 then −1 } }; { > { 0 then +1 } }; else 0 } }
(This example is written in the paper on one line. I’ve split it for narrow screens, which exposes what I think is a mistake in the nesting.)
-
You can also intersperse
let
bindings between splits. I doubt the value of this feature, since “is
” can also bind values, but interspersedlet
does have its uses. The paper has an example usinglet
to avoid rightward drift:if let tp1_n = normalize(tp1) tp1_n is Bot then Bot let tp2_n = normalize(tp2) tp2_n is Bot then Bot let m = merge(tp1_n, tp2_n) m is Some(tp) then tp m is None then glb(tp1_n, tp2_n)
It’s probably better to use early return to avoid rightward drift.
The desugaring uses
let
bindings when lowering the UCS to simpler constructions.
whence UCS
Pattern matching in the tradition of functional programming languages
supports nested patterns that are compiled in a way that eliminates
redundant tests. For example, this example checks that e1
is
Some(_)
once, not twice as written.
if e1 is Some(Left(lv)) then e2
Some(Right(rv)) then e3
None then e4
Being cheeky, I’d say UCS introduces more causes of redundant checks, then goes to great effort to to eliminate redundant checks again.
Splits reduce redundant code at the source level; the bulk of the paper is about eliminating redundant checks in the lowering from source to core language.
I think the primary cause of this extra complexity is treating the
is
operator as a two-way test rather than a multi-way match. Splits
are introduced as a more general (more complicated) way to build
multi-way conditions out of two-way tests.
There’s a secondary cause: the tradition of expression-oriented
functional languages doesn’t like early returns. A nice pattern in
imperative code is to write a function as a series of preliminary
calculations and guards with early returns that set things up for the
main work of the function. Rust’s ?
operator and let
-else
statement support this pattern directly. UCS addresses the same
pattern by wedging calculate-check sequences into if
statements, as
in the normalize
example above.
out of scope
I suspect UCS’s indentation-based syntax will make programmers more likely to make mistakes, and make compilers have more trouble producing nice error messages. (YAML has put me off syntax that doesn’t have enough redundancy to support good error recovery.)
So I wondered if there’s a way to have something like an “is
pattern” operator in a Rust-like language, without an offside rule,
and without the excess of punctuation in the UCS desugaring.
But I couldn’t work out how to make the scope of variable bindings in
patterns cover all the code that might need to use them. The scope
needs to extend into the consequent then
clause, but also into any
follow-up tests – and those tests can branch so the scope might need
to reach into multiple then
clauses.
The problem was the way I was still thinking of the then
and else
clauses as part of the outer if
. That implied the expression has to
be closed off before the then
, which troublesomely closes off the
scope of any is
-bound variables. The solution – part of it, at
least – is actually in the paper, where then
and else
are nested
inside the conditional expression.
penultimate conditionals
There are two ingredients:
-
The
then
andelse
clauses become operators that cause early return from a conditional expression.They can be lowered to a vaguely Rust syntax with the following desugaring rules. The
'if
label denotes the closest-enclosingif
; you can’t usethen
orelse
inside theexpr
of athen
orelse
unless there’s another interveningif
.
then
expr
⟼&& break 'if
expr
else
expr
⟼|| break 'if
expr
else
expr
⟼|| _ && break 'if
expr
There are two desugarings for
else
depending on whether it appears in an expression or a pattern.If you prefer a less wordy syntax, you might spell
then
as=>
(likematch
in Rust) andelse
as|| =>
. (For symmetry we might allow&& =>
forthen
as well.) -
An
is
operator for multi-way pattern-matching that binds variables whose scope covers the consequent part of the expression.The basic form is like the UCS,
scrutinee is pattern
which matches the
scrutinee
against thepattern
returning a boolean result. For example,foo is None
Guarded patterns are like,
scrutinee is pattern && consequent
where the scope of the variables bound by the
pattern
covers theconsequent
. Theconsequent
might be a simple boolean guard, for example,foo is Some(n) && n < 0
or inside an
if
expression it might end with athen
clause,if foo is Some(n) && n < 0 => -1 // ...
Simple multi-way patterns are like,
scrutinee is { pattern || pattern || … }
If there is a
consequent
then the patterns must all bind the same set of variables (if any) with the same types.More typically, a multi-way match will have
consequent
clauses, likescrutinee is {
pattern && consequent ||
pattern && consequent ||
=> otherwise }When a
consequent
is false, we go on to try other alternatives of the match, like we would when the first operand of boolean||
is false.To help with layout, you can include a redundant
||
before the first alternative.For example,
if foo is { || Some(n) && n < 0 => -1 || Some(n) && n > 0 => +1 || Some(n) => 0 || None => 0 }
Alternatively,
if foo is { Some(n) && ( n < 0 => -1 || n > 0 => +1 || => 0 ) || None => 0 }
(They should compile the same way.)
The evaluation model is like familiar shortcutting &&
and ||
and
the syntax is supposed to reinforce that intuition. The UCS paper
spends a lot of time discussing backtracking and how to eliminate it,
but penultimate conditionals evaluate straightforwardly from left to
right.
The paper briefly mentions as
patterns, like
Some(Pair(x, y) as p)
which in Rust would be written
Some(p @ Pair(x, y))
The is
operator doesn’t need a separate syntax for this feature:
Some(p is Pair(x, y))
For large examples, the penultimate conditional syntax is about as
noisy as Rust’s match
, but it scales down nicely to smaller matches.
However, there are differences in how consequences and alternatives
are punctuated which need a bit more discussion.
dangling syntax
The precedence and associativity of the is
operator is tricky: it
has two kinds of dangling-else problem.
The first kind occurs with a surrounding boolean expression. For
example, when b = false
, what is the value of this?
b is true || false
It could bracket to the left, yielding false
:
(b is true) || false
Or to the right, yielding true
:
b is { true || false }
This could be disambiguated by using different spellings for boolean or and pattern alternatives. But that doesn’t help for the second kind which occurs with an inner match.
foo is Some(_) && bar is Some(_) || None
Does that check foo is Some(_)
with an always-true look at bar
( foo is Some(_) ) && bar is { Some(_) || None }
Or does it check bar is Some(_)
and waste time with foo
?
foo is { Some(_) && ( bar is Some(_) ) || None }
I have chosen to resolve the ambiguity by requiring curly braces {}
around groups of alternative patterns. This allows me to use the same
spelling ||
for all kinds of alternation. (Compare Rust, which uses
||
for boolean expressions, |
in a pattern, and ,
between the
arms of a match
.)
Curlies around multi-way matches can be nested, so the example in the previous section can also be written,
if foo is {
|| Some(n) && n < 0 => -1
|| Some(n) && n > 0 => +1
|| { Some(0) || None } => 0
}
The is
operator binds tigher than &&
on its left, but looser than
&&
on its right (so that a chain of &&
is gathered into a
consequent
) and tigher than ||
on its right so that outer ||
alternatives don’t need extra brackets.
examples
I’m going to finish these notes by going through the ultimate conditional syntax paper to translate most of its examples into the penultimate syntax, to give it some exercise.
Here we use is
to name a value n
, as a replacement for the |> abs
pipe operator, and we use range patterns instead of split
relational operators:
if foo(args) is {
|| 0 => "null"
|| n && abs(n) is {
|| 101.. => "large"
|| ..10 => "small"
|| => "medium"
)
}
In both the previous example and the next one, we have some extra brackets where UCS relies purely on an offside rule.
if x is {
|| Right(None) => defaultValue
|| Right(Some(cached)) => f(cached)
|| Left(input) && compute(input) is {
|| None => defaultValue
|| Some(result) => f(result)
}
}
This one is almost identical to UCS apart from the spellings of and
,
then
, else
.
if name.startsWith("_")
&& name.tailOption is Some(namePostfix)
&& namePostfix.toIntOption is Some(index)
&& 0 <= index && index < arity
&& => Right([index, name])
|| => Left("invalid identifier: " + name)
Here are some nested multi-way matches with overlapping patterns and bound values:
if e is {
// ...
|| Lit(value)
&& Map.find_opt(value) is Some(result)
=> Some(result)
// ...
|| { Lit(value) ||
Add(Lit(0), value) ||
Add(value, Lit(0)) }
=> {
print_int(value);
Some(value)
}
// ...
}
The next few examples show UCS splits without the is
operator. In my
syntax I need to press a few more buttons but I think that’s OK.
if x == 0 => "zero"
|| x == 1 => "unit"
|| => "?"
if x == 0 => "null"
|| x > 0 => "positive"
|| => "negative"
if predicate(0, 1) => "A"
|| predicate(2, 3) => "B"
|| => "C"
The first two can be written with is
instead, but it’s not briefer:
if x is {
|| 0 => "zero"
|| 1 => "unit"
|| => "?"
}
if x is {
|| 0 => "null"
|| 1.. => "positive"
|| => "negative"
}
There’s little need for a split-anything feature when we have multi-way matches.
if foo(u, v, w) is {
|| Some(x) && x is {
|| Left(_) => "left-defined"
|| Right(_) => "right-defined"
}
|| None => "undefined"
}
A more complete function:
fn zip_with(f, xs, ys) {
if [xs, ys] is {
|| [x :: xs, y :: ys]
&& zip_with(f, xs, ys) is Some(tail)
=> Some(f(x, y) :: tail)
|| [Nil, Nil] => Some(Nil)
|| => None
}
}
Another fragment of the expression evaluator:
if e is {
// ...
|| Var(name) && Map.find_opt(env, name) is {
|| Some(Right(value)) => Some(value)
|| Some(Left(thunk)) => Some(thunk())
}
|| App(lhs, rhs) => // ...
// ...
}
This expression is used in the paper to show how a UCS split is desugared:
if Pair(x, y) is {
|| Pair(Some(xv), Some(yv)) => xv + yv
|| Pair(Some(xv), None) => xv
|| Pair(None, Some(yv)) => yv
|| Pair(None, None) => 0
}
The desugaring in the paper introduces a lot of redundant tests. I would desugar straightforwardly, then rely on later optimizations to eliminate other redundancies such as the construction and immediate destruction of the pair:
if Pair(x, y) is Pair(xx, yy) && xx is {
|| Some(xv) && yy is {
|| Some(yv) => xv + yv
|| None => xv
}
|| None && yy is {
|| Some(yv) => yv
|| None => 0
}
}
Skipping ahead to the “non-trivial example” in the paper’s fig. 11:
if e is {
|| Var(x) && context.get(x) is {
|| Some(IntVal(v)) => Left(v)
|| Some(BoolVal(v)) => Right(v)
}
|| Lit(IntVal(v)) => Left(v)
|| Lit(BoolVal(v)) => Right(v)
// ...
}
The next example in the paper compares C# relational patterns. Rust’s range patterns do a similar job, with the caveat that Rust’s ranges don’t have a syntax for exclusive lower bounds.
fn classify(value) {
if value is {
|| .. -4.0 => "too low"
|| 10.0 .. => "too high"
|| NaN => "unknown"
|| => "acceptable"
}
}
I tend to think relational patterns are the better syntax than ranges. With relational patterns I can rewrite an earlier example like,
if foo is {
|| Some(< 0) => -1
|| Some(> 0) => +1
|| { Some(0) || None } => 0
}
I think with the UCS I would have to name the Some(_)
value to be
able to compare it, which suggests that relational patterns can be
better than UCS split relational operators.
Prefix-unary relational operators are also a nice way to write
single-ended ranges in expressions. We could simply write both ends to
get a complete range, like >= lo < hi
or like
if value is > -4.0 < 10.0 => "acceptable"
|| => "far out"
Near the start I quoted a normalize
example that illustrates
left-aligned UCS expression. The penultimate version drifts right like
the Scala version:
if normalize(tp1) is {
|| Bot => Bot
|| tp1_n && normalize(tp2) is {
|| Bot => Bot
|| tp2_n && merge(tp1_n, tp2_n) is {
|| Some(tp) => tp
|| None => glb(tp1_n, tp2_n)
}
}
}
But a more Rusty style shows the benefits of early returns (especially
the terse ?
operator) and monadic combinators.
let tp1 = normalize(tp1)?;
let tp2 = normalize(tp2)?;
merge(tp1, tp2)
.unwrap_or_else(|| glb(tp1, tp2))
antepenultimate breath
When I started writing these notes, my penultimate conditional syntax was little more than a sketch of an idea. Having gone through the previous section’s exercise, I think it has turned out better than I thought it might.
The extra nesting from multi-way match braces doesn’t seem to be
unbearably heavyweight. However, none of the examples have bulky
then
or else
blocks which are where the extra nesting is more
likely to be annoying. But then, as I said before it’s comparable to a
Rust match
:
match scrutinee {
pattern => {
consequent
}
}
if scrutinee is {
|| pattern => {
consequent
}
}
The ||
lines down the left margin are noisy, but hard to get rid of
in the context of a curly-brace language. I can’t reduce them to |
like OCaml because what would I use for bitwise OR? I don’t want
presence or absence of flow control to depend on types or context. I
kind of like Prolog / Erlang ,
for &&
and ;
for ||
, but that’s
well outside what’s legible to mainstream programmers. So, dunno.
Anyway, I think I’ve successfully found a syntax that does most of what UCS does, but much in a much simpler fashion.