PEGs are an interesting new formalism for grammars. Unlike most older formalisms, which are based on Chomsky’s generative grammars, their starting point is *recognizing* strings that are in a language, not *generating* them. As such they are a closer match to what we usually want a grammar for. The practical effect of this is that they naturally avoid common ambiguities without external rules, such as C’s if/else ambiguity or the various rules about greediness imposed on regexes (e.g. perl’s matching rules versus POSIX’s longest-leftmost rule, discussed in Friedl’s book). Even though PEGs can recognize some non-context-free languages (e.g. a^{n}b^{n}c^{n}) they can be matched in linear time using a packrat parser (which can be implemented very beautifully in Haskell).

Bryan Ford’s 2004 POPL paper establishes the formal foundations of PEGs and defines a concrete syntax for them, fairly similar to ABNF. The key differences are: the choice operator is *ordered* (prefers to match its left-hand sub-expression); repetition operators are maximally greedy and don’t backtrack (so the second `a` in `a*a` can never match); and it includes positive and negative lookahead assertions of the form `&a` and `!a` (like `(?=a)` and `(?!a)` in perl).

It occurs to me that there is a useful analogy hidden in here, that would be made more obvious with a little change in syntax. Instead of `a / b` write `a || b`, and instead of `&a b` write `a && b`. With || and && I am referring to C’s short-cutting “orelse” and “andalso” boolean operators - or rather the more liberal versions that can return more than just a boolean, since a PEG returns the amount of input matched when it succeeds. This immediately suggests some new identities on grammars based on De Morgan’s laws, e.g. `!a || !b === !(a && b)`. Note that `!!a =/= a` because the former never consumes any input, so not all of De Morgan’s laws work with PEGs. This also suggests how to choose the operators to overload when writing a PEG parser combinator library for C++ (which has a much wider range of possibilities than Lua).