.@ Tony Finch – blog


As is typical for static site generators, each page on this web site is generated from a file containing markdown with YAML frontmatter.

Neither markdown nor YAML is good. Markdown is very much the worse-is-better of markup languages; YAML, on the other hand, is more like better-is-worse. YAML has too many ways of expressing the same things, and the lack of redundancy in its syntax makes it difficult to detect mistakes before it is too late. YAML’s specification is incomprehensible.

But they are both very convenient and popular, so I went with the flow.

multiple documents

A YAML stream may contain several independent YAML documents delimited by --- start and ... end markers, for example:

    ---
    document: 1
    ...
    ---
    document: 2
    ...

string documents

The top-level value in a YAML document does not have to be an array or object: you can use its wild zoo of string syntax too, so for example,

    --- |
    here is a preformatted
    multiline string

frontmatter and markdown

Putting these two features together, the right way to do YAML frontmatter for markdown files is clearly,

    ---
    frontmatter: goes here
    ...
    --- |
    markdown goes here

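The page processor can simply read the file as a YAML stream of two documents, taking the frontmatter from the first and the markdown string from the second.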

No need for any ad-hoc hacks to separate the two parts of the file: the YAML acts as a lightweight wrapper for the markdown.

markdown inside YAML

The crucial thing that makes this work is that the markdown after the --- | delimiter does not need to be indented.

Markdown is very sensitive to indentation, so all the tooling (most importantly my editor) gets righteously confused if markdown is placed in a container that introduces extra indentation.

YAML in Perl

The static site generator for www.dns.cam.ac.uk uses --- | to mark the start of the markdown in its source files. This worked really nicely.

The web site was written in Perl, because most of the existing DNS infrastructure was Perl and I didn’t want to change programming languages. YAML was designed by Perl hackers, and the Perl YAML modules are where it all started (or, rather, where it all went wrong).

YAML in other languages

The static site generator for https://dotat.at is written in Rust, using serde-yaml.

I soon discovered that, unlike the original YAML implementations, serde-yaml requires top-level strings following --- | to be indented. This bug seems to be common in YAML implementations for languages other than Perl.

start and end delimiters

So I changed the syntax for my frontmatter so it looks like,

    ---
    frontmatter: goes here
    ...
    markdown goes here

That is, the file starts with a complete YAML document delimited by --- start and ... end markers, and the rest of the file is the markdown.

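The idea is that a page processor should be able to read the frontmatter with an ordinary YAML parser, stop at the ... end marker, and treat the remainder of the file as markdown.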

However, I could not work out how to get serde-yaml to read just the prefix of a file successfully and return the remainder for further processing.

I know, I'll use regexps

(Might as well, I’m already way past two problems…)

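As a result I had to add a bodge to the page processor: a regexp splits the file at the ... end marker, so that only the frontmatter is passed to serde-yaml and the remainder is treated as markdown.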
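
A regexp split of this kind might look roughly like the following. This is not the code from the real page processor, which is Rust; it is a minimal Python sketch using only the standard re module, and the names FRONTMATTER and split_page are made up for illustration. It assumes the file starts with a --- line and that the first line consisting solely of ... ends the frontmatter. Parsing the extracted YAML text is left to whatever YAML library is in use.

```python
import re

# Illustrative pattern, not taken from the real site generator.
FRONTMATTER = re.compile(
    r"\A---[ \t]*\n"         # "---" start marker on the very first line
    r"(?P<yaml>.*?)"         # frontmatter body, as short as possible
    r"^\.\.\.[ \t]*$\n?",    # "..." end marker on a line of its own
    re.DOTALL | re.MULTILINE,
)


def split_page(source):
    """Return (yaml_text, markdown_text).

    yaml_text is None when the source does not start with frontmatter,
    in which case the whole source is returned as markdown.
    """
    match = FRONTMATTER.match(source)
    if match is None:
        return None, source
    return match.group("yaml"), source[match.end():]


page = "---\nfrontmatter: goes here\n...\nmarkdown goes here\n"
yaml_text, markdown = split_page(page)
```

Because the end marker is anchored to a whole line, a value such as "title: wait..." inside the frontmatter does not end it early.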

mainstream frontmatter

My choice to mark the end of the frontmatter with the YAML ... end delimiter is not entirely mainstream. As I understand it, the YAML + markdown convention came from Jekyll, or at least Jekyll popularized it. Jekyll uses the YAML --- start delimiter to mark the end of the YAML, or maybe to mark the start of the markdown, but either way it doesn’t make sense.

Fortunately my ... bodge is compatible with Pandoc YAML metadata, and Emacs markdown mode supports Pandoc-style YAML metadata, so the road to hell is at least reasonably well paved.
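
To make the difference between the two conventions concrete, here is a hedged Python sketch of a splitter that accepts either closing line, as Pandoc does for its YAML metadata blocks. The names CLOSER and split_metadata are hypothetical, and the sketch assumes that the first line consisting solely of ... or --- after the opening marker ends the frontmatter.

```python
import re

# Illustrative only: a closing line is "..." (Pandoc style, used on this
# site) or "---" (Jekyll style), optionally followed by trailing blanks.
CLOSER = re.compile(r"^(?:\.\.\.|---)[ \t]*$\n?", re.MULTILINE)


def split_metadata(source):
    """Return (yaml_text, markdown_text), or (None, source) if there is
    no opening "---" line or no closing line."""
    if not re.match(r"---[ \t]*\n", source):
        return None, source
    body_start = source.index("\n") + 1   # skip the opening "---" line
    closer = CLOSER.search(source, body_start)
    if closer is None:
        return None, source
    return source[body_start:closer.start()], source[closer.end():]


pandoc_style = "---\ntitle: x\n...\nbody\n"
jekyll_style = "---\ntitle: x\n---\nbody\n"
```

Both inputs split into the same frontmatter and body, so a file written with the ... closer loses nothing with tools that accept both styles.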

grump

It works, but it doesn’t make me happy. I suppose I deserve the consequences of choosing technology with known deficiencies. But it requires minimal effort, and is by and large good enough.