Unicode, the reading list

by David Jones

A few (good) Unicode articles have been going round recently. I've gathered them and a couple of others here, in a sort of reading list. I've tried to put them into a recommended reading order. Enjoy.

2003. Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". A classic.

2015? Zentgraf's "What every programmer absolutely, positively needs to know about encodings and character sets to work with text" offers quite a fundamental view and some reflections on PHP (which I think should be entirely comprehensible to visitors from other nearby lands).

2017. Reed's "A Programmer’s Introduction to Unicode" recently came round on Twitter and has a lovely heatmap showing which areas of the Unicode space are in use. It also introduces surrogate pairs, which 16-bit encodings use, and combining marks, which create complex grapheme clusters.
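For a taste of those two ideas before reading the article, here is a minimal sketch in Python (my choice of language, not Reed's). The helper name to_surrogate_pair is made up for illustration; the arithmetic is the standard UTF-16 rule, and unicodedata is part of Python's standard library.

```python
import unicodedata


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high and low surrogates.

    Hypothetical helper for illustration; not taken from any of the articles.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return high, low


# U+1F4A9 PILE OF POO lies outside the 16-bit range.
high, low = to_surrogate_pair(0x1F4A9)
print(hex(high), hex(low))  # 0xd83d 0xdca9

# A combining mark: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

Both strings at the end display as a single "é", which is the point: what a reader sees as one character can be one code point or several.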

2017. New's "poo.length === 2". A modern Hacker News firefly. Musings on the consequences of JavaScript using an internal 16-bit encoding. Share the descent into the rabbit hole of Emoji.
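The length of 2 in that title is a count of 16-bit code units, not of characters. A rough way to reproduce the number outside JavaScript is sketched below in Python, where len() counts code points. The function utf16_length is a hypothetical name of mine, and encoding to UTF-16 only imitates what JavaScript's length property reports for well-formed strings.

```python
def utf16_length(text):
    """Count UTF-16 code units, which is what JavaScript's .length reports.

    Hypothetical helper: encodes to UTF-16 (little-endian, no byte order
    mark) and divides the byte count by two.
    """
    return len(text.encode("utf-16-le")) // 2


poo = "\U0001F4A9"  # PILE OF POO, a single code point above U+FFFF

print(len(poo))             # 1 (Python counts code points)
print(utf16_length(poo))    # 2 (one surrogate pair)
print(utf16_length("abc"))  # 3 (each of these fits in one code unit)
```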

2016. Judis's "Emoji.prototype.length — a tale of characters in Unicode" builds a city in the Emoji rabbit hole. You could spend your whole life here.