.@ Tony Finch – blog


For a while I have had a fantasy programming language syntax idea that I have been failing to write up. This week I found out that it has already been implemented twice, which is cheering news :-)

The main inspiration is the special support for regular expression literals in languages such as Perl and JavaScript. It's annoying that regexes are privileged with special syntax while library writers can't define domain-specific syntax for their own purposes.

In fact Perl has several special literal syntaxes: single-quotes and q{} for verbatim strings; double quotes and qq{} for backslash-escaped interpolated strings; backticks and qx{} for shell commands; qw{} for word lists; slashes, m{}, s{}{}, and qr{} for regular expressions; tr{}{} for character substitution; and << for "here" documents.

The D programming language also has some extra flavourful string literals: r"" or `` for verbatim strings; "" for backslash-escaped strings; x"" for hex-encoded data; and bare backslash escape sequences.

What I'd like to be able to do is define a library for handling my special syntax. It would work as a plugin for the compiler that would parse and check my special literals at compile time (no run-time syntax errors!) and emit code that implements their special semantics. This framework means that, instead of being built into the language, libraries can provide features like converting backslash escape sequences into control characters, turning interpolated strings into a series of concatenations, arranging for regular expressions to be compiled once, and so forth.
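To make the shape of such a framework a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the registry LITERAL_COMPILERS, the decorator literal_compiler, and the prefixes "re" and "esc" are invented for illustration. Python has no compiler plugin hook of this kind, so the "compile" step here simply runs when compile_literal is called, standing in for the compile-time check.

```python
import re

# Hypothetical registry mapping a literal prefix to its compiler function.
LITERAL_COMPILERS = {}

def literal_compiler(name):
    """Register the decorated function as the compiler for prefix `name`."""
    def register(fn):
        LITERAL_COMPILERS[name] = fn
        return fn
    return register

@literal_compiler("re")
def compile_regex(body):
    # re.compile raises re.error on a malformed pattern, so a bad literal
    # is rejected once, up front, and the pattern is compiled only once.
    return re.compile(body)

@literal_compiler("esc")
def compile_escapes(body):
    # Convert a small, assumed set of backslash escapes to control characters.
    table = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in table:
                raise SyntaxError(f"bad escape at offset {i}")
            out.append(table[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def compile_literal(name, body):
    """Look up the compiler for `name` and hand it the raw literal body."""
    try:
        compiler = LITERAL_COMPILERS[name]
    except KeyError:
        raise SyntaxError(f"no literal compiler named {name!r}") from None
    return compiler(body)
```

For example, compile_literal("re", "^ *#") returns a compiled pattern object, while compile_literal("re", "(") fails immediately with re.error rather than at the point of first use.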

You could then provide support for (say) XML literals, XPath expressions, better pattern matching syntax, or whatever else you might fancy.

One possibility that is very enticing is to make string interpolation context-aware, so that interpolated strings can be automatically escaped properly. The mixture of languages and syntaxes in a web page makes this fiendishly complicated. Different SQL engines have different escaping requirements. If this kind of hard-won knowledge could be implemented once in a library, security vulnerabilities such as cross-site scripting and SQL injection would be easier to avoid. The Caja project includes a proposal for secure string interpolation in JavaScript which explains these issues very well.
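A very rough sketch of the idea in Python, under some loud assumptions: the {name} hole syntax and the functions html_literal and sql_literal are made up for this example, and the HTML version only handles ordinary text and quoted attribute values. Real context-awareness would also have to track URLs, script blocks, and so on, which is exactly the fiendish part. The point is only that each kind of literal chooses its own treatment for interpolated values.

```python
import html
import re

# Assumed hole syntax for this sketch: {identifier}
HOLE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")

def html_literal(template):
    """Split the template once; return a renderer that escapes every value."""
    # With one capturing group, split yields [text, name, text, name, ..., text].
    parts = HOLE.split(template)

    def render(**values):
        out = []
        for i, part in enumerate(parts):
            if i % 2 == 0:
                out.append(part)  # literal text from the template, trusted
            else:
                # interpolated value, untrusted: escape &, <, > and quotes
                out.append(html.escape(str(values[part]), quote=True))
        return "".join(out)

    return render

def sql_literal(template):
    """Replace holes with ? placeholders; values travel separately as parameters."""
    parts = HOLE.split(template)
    query = "?".join(parts[0::2])
    names = parts[1::2]

    def bind(**values):
        return query, tuple(values[name] for name in names)

    return bind
```

The SQL version sidesteps engine-specific escaping by never splicing values into the query text at all: it emits qmark-style placeholders (the style used by Python's sqlite3 module, for instance) and returns the values as a separate tuple.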

The syntax I had in mind is inspired by Perl (and D's letter prefixes are along similar lines). For example, re$/^ *#/ is a regular expression for matching comment lines. A literal starts off with an identifier that specifies the literal's compiler. These identifiers have their own namespace so they can be very terse without clashing with variable names. The identifier is followed by a $ which indicates this is a string. The contents of the literal are delimited in the same way as Perl's generic literals, either with nesting {} () <> [] brackets or with matching punctuation. There's no way to escape delimiters within the literal, so that the literal can be passed unmodified to its compiler. For longer literals, or if single character delimiters are awkward, you can use $$ instead of $ and the rest of the line forms the delimiter, a bit like a here document. A literal compiler may or may not support interpolation, and defines its own syntax for doing so.
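The delimiter rules above are simple enough to sketch as a small scanner. This is only an illustration in Python, not a real implementation: the function name scan_literal and its return shape are invented here, the identifier pattern is an assumption, and the $$ here-document form is deliberately left out.

```python
import re

# Opening brackets and their closers; any other punctuation closes with itself.
BRACKETS = {"{": "}", "(": ")", "<": ">", "[": "]"}

# Assumed shape of the literal-compiler identifier, followed by the $ marker.
PREFIX = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\$")

def scan_literal(src, pos=0):
    """Scan one name$<delim>body<delim> literal starting at src[pos].

    Returns (name, body, end), where end is the index just past the literal.
    The body is returned unmodified, because the syntax has no escapes.
    """
    m = PREFIX.match(src, pos)
    if not m:
        raise SyntaxError("expected identifier followed by $")
    name = m.group(1)
    i = m.end()
    if i >= len(src):
        raise SyntaxError("missing delimiter")
    opener = src[i]
    if opener == "$":
        raise SyntaxError("the $$ form is not handled in this sketch")
    if opener.isalnum() or opener.isspace():
        raise SyntaxError("delimiter must be punctuation")
    start = i + 1
    if opener in BRACKETS:
        # Bracket delimiters nest: count depth until the matching closer.
        closer = BRACKETS[opener]
        depth = 1
        for j in range(start, len(src)):
            if src[j] == opener:
                depth += 1
            elif src[j] == closer:
                depth -= 1
                if depth == 0:
                    return name, src[start:j], j + 1
        raise SyntaxError("unterminated literal")
    # Non-bracket punctuation: the literal ends at the next occurrence.
    j = src.find(opener, start)
    if j < 0:
        raise SyntaxError("unterminated literal")
    return name, src[start:j], j + 1
```

Scanning re$/^ *#/ gives the name "re" and the body "^ *#", and scanning q${a{b}c} gives the body "a{b}c" because the inner braces nest. The lack of escapes keeps the scanner trivial: it never needs to understand what is inside the literal.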

This week I was pleased to find out that the Glasgow Haskell Compiler supports my generalized literal idea. They call it "quasiquoting" after Lisp. The syntax it supports looks like [$re|^ *#|] though this is likely to change to [re|^ *#|]. The brackets are based on Template Haskell's [| |] quotation brackets. The quasiquoting paper has lots of good rationale for the feature and great examples of how easy Haskell makes it to write certain kinds of literal compilers. It also allows literal compilers to define their own interpolation syntax.

The E programming language has a feature called "quasi-literals", which look like rx`^ *#` with back-ticks for delimiters. The secure string interpolation document criticizes them for being executed at run-time, not compile time, so perhaps they aren't quite what I have in mind; also I hope the document is wrong to say that the feature can only compile quasi-literals to parse trees. The E documentation is sparse so it's hard to tell.

I should also mention Lisp here, since it's the king of metaprogramming. I expect you could implement a feature like this using Common Lisp reader macros...

Do any other languages have a feature like this?