As is typical for static site generators, each page on this web site is generated from a file containing markdown with YAML frontmatter.
Neither markdown nor YAML is good. Markdown is very much the worse-is-better of markup languages; YAML, on the other hand, is more like better-is-worse. YAML has too many ways of expressing the same things, and the lack of redundancy in its syntax makes it difficult to detect mistakes before it is too late. YAML’s specification is incomprehensible.
But they are both very convenient and popular, so I went with the flow.
multiple documents
A YAML stream may contain several independent YAML documents
delimited by --- start and ... end markers, for example:
---
document: 1
...
---
document: 2
...
string documents
The top-level value in a YAML document does not have to be an array or object: you can use its wild zoo of string syntax too, so for example,
--- |
here is a preformatted
multiline string
frontmatter and markdown
Putting these two features together, the right way to do YAML frontmatter for markdown files is clearly,
---
frontmatter: goes here
...
--- |
markdown goes here
The page processor can simply:
- feed the contents of the file to the YAML parser
- use the first document for metadata
- feed the second document to the markdown processor
- check that’s the end of the file
No need for any ad-hoc hacks to separate the two parts of the file: the YAML acts as a lightweight wrapper for the markdown.
markdown inside YAML
The crucial thing that makes this work is that the markdown after the
--- | delimiter does not need to be indented.
Markdown is very sensitive to indentation, so all the tooling (most importantly my editor) gets righteously confused if markdown is placed in a container that introduces extra indentation.
YAML in Perl
The static site generator for www.dns.cam.ac.uk uses --- | to mark
the start of the markdown in its source files.
This worked really nicely.
The web site was written in Perl, because most of the existing DNS
infrastructure was Perl and I didn’t want to change programming
languages. YAML was designed by Perl hackers, and the Perl YAML
modules are where it all went wrong, er, started.
YAML in other languages
The static site generator for https://dotat.at is written in Rust,
using serde-yaml.
I soon discovered that, unlike the original YAML implementations,
serde-yaml requires top-level strings following --- | to be
indented. This bug seems to be common in YAML implementations for
languages other than Perl.
start and end delimiters
So I changed the syntax for my frontmatter so it looks like,
---
frontmatter: goes here
...
markdown goes here
That is, the file starts with a complete YAML document delimited by
--- start and ... end markers, and the rest of the file is the
markdown.
The idea is that a page processor should be able to:
- feed the contents of the file to the YAML parser
- read one document containing the metadata
- feed the rest of the file to the markdown processor
However, I could not work out how to get serde-yaml to read just the
prefix of a file successfully and return the remainder for further
processing.
I know, I'll use regexps
(Might as well, I’m already way past two problems…)
As a result I had to add a bodge to the page processor:
- split the file using a regex
- feed the first part to the YAML parser
- feed the second part to the markdown processor
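The split step can be sketched roughly as follows. This is a minimal illustration in Python using only the standard `re` module, not the Rust code of the actual page processor; the pattern and the `split_page` name are hypothetical, and the two halves would still need to be handed to a real YAML parser and markdown processor.

```python
import re

# Hypothetical pattern (an assumption, not the site's real regex):
# a `---` line at the very start, the YAML text, a `...` line,
# then everything else as markdown.
FRONTMATTER = re.compile(
    r"\A---[ \t]*\n"             # start marker must be the first line
    r"(?P<yaml>.*?)"             # YAML body, as short as possible
    r"^\.\.\.[ \t]*(?:\n|\Z)"    # first line consisting only of `...`
    r"(?P<markdown>.*)\Z",       # the rest of the file
    re.DOTALL | re.MULTILINE,
)


def split_page(text):
    """Return (yaml_text, markdown_text), or raise ValueError."""
    m = FRONTMATTER.match(text)
    if m is None:
        raise ValueError("page does not start with a --- / ... frontmatter block")
    return m.group("yaml"), m.group("markdown")


page = "---\nfrontmatter: goes here\n...\nmarkdown goes here\n"
yaml_text, markdown = split_page(page)
```

Because the YAML group is non-greedy, the split happens at the first `...` line, so a later `...` or `---` line in the markdown (a horizontal rule, say) is left alone.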
mainstream frontmatter
My choice to mark the end of the frontmatter with the YAML ... end
delimiter is not entirely mainstream. As I understand it, the YAML +
markdown convention came from Jekyll, or at least Jekyll
popularized it. Jekyll uses the YAML --- start delimiter to mark the
end of the YAML, or maybe to mark the start of the markdown, but
either way it doesn’t make sense.
Fortunately my ... bodge is compatible with Pandoc YAML
metadata, and Emacs markdown mode
supports Pandoc-style YAML metadata, so the road to hell is at least
reasonably well paved.
grump
It works, but it doesn’t make me happy. I suppose I deserve the consequences of choosing technology with known deficiencies. But it requires minimal effort, and is by and large good enough.