.@ Tony Finch – blog


PEGs are an interesting new formalism for grammars. Unlike most older formalisms, which are based on Chomsky’s generative grammars, their starting point is recognizing strings that are in a language, not generating them. As such they are a closer match to what we usually want a grammar for. The practical effect of this is that they naturally avoid common ambiguities without external rules, such as C’s if/else ambiguity or the various rules about greediness imposed on regexes (e.g. perl’s matching rules versus POSIX’s longest-leftmost rule, discussed in Friedl’s book). Even though PEGs can recognize some non-context-free languages (e.g. anbncn) they can be matched in linear time using a packrat parser (which can be implemented very beautifully in Haskell).

Bryan Ford’s 2004 POPL paper establishes the formal foundations of PEGs and defines a concrete syntax for them, fairly similar to ABNF. The key differences are: the choice operator is ordered (prefers to match its left-hand sub-expression); repetition operators are maximally greedy and don’t backtrack (so the second a in a*a can never match); and it includes positive and negative lookahead assertions of the form &a and !a (like (?=a) and (?!a) in perl).

It occurs to me that there is a useful analogy hidden in here, that would be made more obvious with a little change in syntax. Instead of a / b write a || b, and instead of &a b write a && b. With || and && I am referring to C’s short-cutting “orelse” and “andalso” boolean operators - or rather the more liberal versions that can return more than just a boolean, since a PEG returns the amount of input matched when it succeeds. This immediately suggests some new identities on grammars based on De Morgan’s laws, e.g. !a || !b === !(a && b). Note that !!a =/= a because the former never consumes any input, so not all of De Morgan’s laws work with PEGs. This also suggests how to choose the operators to overload when writing a PEG parser combinator library for C++ (which has a much wider range of possibilities than Lua).