ASCII and Unicode quotation marks

by Markus Kuhn

Summary: Please do not use the ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote'). Your text will otherwise appear rather strange with most modern fonts (e.g., on Windows and Mac systems). Only old X Window System fonts and some old video terminals show ASCII 0x60/0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead. If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote'). If you can use Unicode characters, nice directional quotation marks are available in the form of characters U+2018, U+2019, U+201C, and U+201D (as in ‘quote’ or “quote”).

Background

The Unicode and ISO 10646 standards define the following characters:

U+0022  QUOTATION MARK               "  neutral (vertical), used as opening or closing quotation mark; preferred characters in English for paired quotation marks are U+201C and U+201D
U+0027  APOSTROPHE                   '  neutral (vertical) glyph having mixed usage; preferred character for apostrophe is U+2019; preferred characters in English for paired quotation marks are U+2018 and U+2019
U+0060  GRAVE ACCENT                 `
U+00B4  ACUTE ACCENT                 ´
U+2018  LEFT SINGLE QUOTATION MARK   ‘
U+2019  RIGHT SINGLE QUOTATION MARK  ’  this is the preferred character to use for apostrophe
U+201C  LEFT DOUBLE QUOTATION MARK   “
U+201D  RIGHT DOUBLE QUOTATION MARK  ”
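
The character names listed above can be cross-checked against the Unicode Character Database. The following is only a minimal sketch, assuming Python 3 and its standard unicodedata module; the helper name describe is made up for this illustration.

```python
import unicodedata

# The eight characters discussed above, in code point order.
QUOTE_CHARS = "\u0022\u0027\u0060\u00b4\u2018\u2019\u201c\u201d"

def describe(ch):
    """Return 'U+XXXX NAME' for one character (hypothetical helper)."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

# Print one line per character, e.g. for comparison with the table.
for ch in QUOTE_CHARS:
    print(describe(ch))
```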

ASCII and ISO 8859 were only designed to support the very restricted typographic style available to typewriter users. The two ASCII characters

0x22  QUOTATION MARK  "
0x27  APOSTROPHE      '

are supposed to represent the neutral (vertical) glyphs commonly used on typewriters. They should not be used as directional quotation marks.

ISO 8859 and Unicode fonts are supposed to show the two accent characters

0x60  GRAVE ACCENT  `
0xB4  ACUTE ACCENT  ´

as mutually symmetric shapes.

The problem

Unfortunately, the X Window System fonts contained for a long time the following mutually symmetric glyphs:

0x27  APOSTROPHE    ’
0x60  GRAVE ACCENT  ‛

These shapes were even sanctioned by an early US version of the ISO 646 standard (ANSI X3.4, also known as ASCII), which defined 0x27 as “apostrophe (closing single quotation mark; acute accent)”, but they should already have been changed when the fonts were extended to cover ISO 8859-1, which added a separate acute accent at 0xB4. One obviously cannot have both 0x27/0x60 and 0x60/0xB4 as mutually symmetric glyph pairs and have at the same time a different shape for 0x27 and 0xB4. Since 0x60/0xB4 were defined to be accents by the modern standards, their symmetric shape got priority, except that this had not been fixed in the X fonts until 2004 (somewhat earlier in the versions that come with XFree86).

The old X fonts encouraged some authors of Unix software and documentation to abuse 0x60 together with 0x27 as directional quotation marks. This practice looked somewhat acceptable like

‛quotation’

if displayed with old X fonts, but it looked rather ugly like

`quotation'

in most other modern display environments (e.g., with the correctly designed Windows and Mac TrueType fonts, but also on many classic 1970s/1980s video terminals, such as those by Siemens/Nixdorf and many other manufacturers).

For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font Lucida Console (size 14) like this:

[Image: WinNT screenshot]

Unicode and ISO 10646 make a very clear distinction between the undirected typewriter-style ASCII single quotation mark and apostrophe U+0027 as in

'quotation'

and the typographic directed quotation marks U+2018 and U+2019 as in

‘quotation’

Unicode 2.1 explicitly says that U+2019 is the preferred punctuation apostrophe, as in “We’ve been here before.” The Unicode standard also notes:

“For historical reasons, U+0027 is a particularly overloaded character. In ASCII it is used to represent a punctuation mark (such as right single quotation mark, left single quotation mark, apostrophe punctuation, vertical line, or prime) or a modifier letter (such as apostrophe modifier or acute accent.) (Punctuation marks generally break words; modifier letters generally are considered part of a word.) In many systems it is always represented as a straight vertical line and can never represent a curly apostrophe or right quotation mark.”
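
Where Unicode output is available, replacing the overloaded U+0027 with U+2019 in its apostrophe role could be sketched roughly as follows. This assumes, as a deliberate simplification, that only an apostrophe standing between two ASCII letters is a punctuation apostrophe; leading or trailing apostrophes and non-ASCII letters are left alone. The function name curl_apostrophes is hypothetical.

```python
import re

# An ASCII apostrophe with an ASCII letter on both sides (assumption:
# this is the only case treated as a punctuation apostrophe here).
_APOSTROPHE_IN_WORD = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")

def curl_apostrophes(text):
    """Replace in-word U+0027 with U+2019 (hypothetical helper)."""
    return _APOSTROPHE_IN_WORD.sub("\u2019", text)
```

Quotation marks such as 'quote' are not touched by this pattern, because neither apostrophe there has a letter on both sides.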

What to do?

If you are the author of some Unix software, then please check whether you use the ASCII character 0x60 (`) as a left quotation mark, as in `quote'. If so, change it to use the character 0x27 (') on both sides instead, as in 'quote'. If you work in an environment where the UTF-8 encoding is already used everywhere (e.g., Plan 9 and most modern GNU/Linux installations), you could even decide to use proper directional quotation marks, as in ‘quote’ or “quote”.

Check your source code directories with

  grep \` *

to find out where modifications are necessary. Then use (with proper care!) something like

  perl -pi.bak -e "s/\`/'/g;" file1 file2 ...

to make the necessary substitutions automatically, or make the edits manually instead.
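
A blanket substitution also rewrites grave accents that are not quotation marks, such as shell command substitutions. A somewhat more selective approach can be sketched in Python. This is only an illustration under stated assumptions: the pattern treats a grave accent and a later apostrophe on the same line as a quotation pair only if the quoted text starts and ends with a non-space character and contains no further grave accent or apostrophe. It will still misjudge some inputs (for example quoted text that itself contains an apostrophe), so the result needs review. The names fix_quotes and use_unicode are made up for this sketch.

```python
import re

# `text' where text has no ` or ' inside and no leading/trailing space.
_BACKQUOTE_PAIR = re.compile(r"`([^\s`'](?:[^`'\n]*[^\s`'])?)'")

def fix_quotes(text, use_unicode=False):
    """Rewrite `quote' as 'quote', or as U+2018/U+2019 if use_unicode is set."""
    if use_unicode:
        left, right = "\u2018", "\u2019"
    else:
        left, right = "'", "'"
    return _BACKQUOTE_PAIR.sub(lambda m: left + m.group(1) + right, text)
```

Text such as `ls -l` is left unchanged, since the closing character there is another grave accent and not an apostrophe.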

The use of 0x60 (grave accent) as a special control character in the Unix shell (to denote command substitution as in `command` or better $(command)), in Perl, in Lisp, or in TeX/troff (to denote a proper left single quotation mark) does not have to be changed and remains unaffected. Donald Knuth’s TeXbook (chapter 2, page 3, end of second paragraph) has actually warned TeX users already since 1986 that the apostrophe and grave accent shapes can show up as required by ISO and Unicode and not as used in the rest of the TeXbook. The Unix m4 macro processor is probably the only widely used tool that uses the `quote' combination as part of its input syntax; however, even that could be modified via changequote.

Why should we fix this?

There are quite a number of reasons why the old X fonts had to be fixed, and with them the associated ASCII backquote practice:

Updated X Window System core BDF fonts have been available since 1998, in which the apostrophe and grave accent are now corrected, along with a number of other bugs. They replaced the old fonts in XFree86 since version 4.0 and in the X.Org sample implementation since X11R6.8.

Related hints

PostScript

PostScript has a somewhat complicated history of how it maps the ASCII bytes to glyphs. In PostScript fonts, each glyph is identified not by a code position, but by a glyph name such as “quotesingle”. After the publication of the Unicode Standard, Adobe released an official PostScript Glyph Name to Unicode Mapping table. When a PostScript interpreter displays text, it uses an encoding vector to map the 8-bit byte values found in text strings onto the glyph names found in fonts.

Unicode                                Glyph  PostScript     Encoding vector
position  name                         image  glyph name     Std   ISOLatin1  CE
U+0022    QUOTATION MARK               "      quotedbl       0x22  0x22       0x22
U+0027    APOSTROPHE                   '      quotesingle    0xA9  -          0x27
U+0060    GRAVE ACCENT                 `      grave          0xC1  0x91       0x60
U+00B4    ACUTE ACCENT                 ´      acute          0xC2  0x92/0xB4  0xB4
U+2018    LEFT SINGLE QUOTATION MARK   ‘      quoteleft      0x60  0x60       0x91
U+2019    RIGHT SINGLE QUOTATION MARK  ’      quoteright     0x27  0x27       0x92
U+201C    LEFT DOUBLE QUOTATION MARK   “      quotedblleft   0xAA  -          0x93
U+201D    RIGHT DOUBLE QUOTATION MARK  ”      quotedblright  0xBA  -          0x94

PostScript provides several predefined 8-bit encoding vectors. Authors of printer drivers can easily add their own. As the above table shows, the original PostScript standard encoding followed a practice similar to the old X fonts, with all its problems, namely it mapped the ASCII bytes 0x60 and 0x27 to curly opening and closing quotation marks (“quoteleft” and “quoteright” in PostScript glyph-name terminology, or U+2018 and U+2019 in Unicode).

When ISO 8859-1 emerged, Adobe added to PostScript another predefined encoding vector called ISOLatin1Encoding. This was meant to be ISO 8859-1 compatible, but it remained at 0x60 and 0x27 unchanged from the old StandardEncoding vector, and therefore it does not actually print the ISO 8859-1 characters 0x27 and 0x60 correctly, which correspond to Unicode characters U+0027 and U+0060 and should be represented by the PostScript glyphs “grave” and “quotesingle”. The authors of Adobe’s PostScript Language Reference, Third Edition (Addison-Wesley, ISBN 0-201-37922-8) acknowledge this in section E.5, footnote 3, page 783, where they note that the “ISOLatin1Encoding encoding vector deviates from the ISO 8859-1 standard” and that an application that wants to “conform exactly to the ISO standard should create a modified encoding vector”. The newer CE encoding vector (Central European, matching Windows CP1250), which is now also described in the PostScript Language Reference, correctly maps 0x27 to “quotesingle” and 0x60 to “grave”.

If you write a PostScript driver, please use the official Unicode to PostScript mapping table to map ASCII, ISO 8859 and ISO 10646 characters to PostScript glyphs, as the updated Type 1 renderer in XFree86 4.0 does. Do not use the ISOLatin1Encoding encoding vector to print ISO 8859-1 text without first changing it to map 0x27 to “quotesingle” and 0x60 to “grave”. (In addition, you may also want to map 0x2D = HYPHEN-MINUS to the PostScript glyph “hyphen” instead of the “minus” mapping used by ISOLatin1Encoding.)
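
The correction described above can be modelled as a small table operation. The following Python sketch is not driver code. It represents an encoding vector as a dictionary from byte value to glyph name, filled only with the ISOLatin1Encoding entries mentioned in this section, and applies the three recommended changes to a copy. The names ISOLATIN1_SUBSET, CORRECTIONS and corrected_vector are assumptions made for this illustration.

```python
# Subset of ISOLatin1Encoding as described in this section:
# byte value -> PostScript glyph name.
ISOLATIN1_SUBSET = {
    0x22: "quotedbl",
    0x27: "quoteright",
    0x2D: "minus",
    0x60: "quoteleft",
    0x91: "grave",
    0x92: "acute",
    0xB4: "acute",
}

# Changes recommended in the text for printing ISO 8859-1 correctly.
CORRECTIONS = {
    0x27: "quotesingle",
    0x2D: "hyphen",
    0x60: "grave",
}

def corrected_vector(vector):
    """Return a modified copy of vector; the original is left untouched."""
    fixed = dict(vector)
    fixed.update(CORRECTIONS)
    return fixed
```

Working on a copy mirrors the advice in the PostScript Language Reference to create a modified encoding vector instead of altering the predefined one.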

TeX

The font cmtt10 in TeX’s Computer Modern family follows the example of the PostScript standard encoding by providing a straight double quotation mark and directional single quotation marks on the ASCII positions 0x22, 0x60, and 0x27. It also provides a straight single quotation mark, grave accent, and acute accent on code positions 0x0d, 0x12, and 0x13, respectively, but it lacks directional double quotation marks:

U+0022  QUOTATION MARK               "          "
U+0027  APOSTROPHE                   \char"0D   '
U+0060  GRAVE ACCENT                 \char"12   `
U+00B4  ACUTE ACCENT                 \char"13   ´
U+2018  LEFT SINGLE QUOTATION MARK   `          ‘
U+2019  RIGHT SINGLE QUOTATION MARK  '          ’

Therefore, to demonstrate the result of abusing ASCII’s straight quotation mark and grave accent as directional quotation marks in a document written in LaTeX, you can write \texttt{\char"12 quote\char"0D}. The non-typewriter fonts in Computer Modern lack both single and double straight quotation marks.

Use LaTeX’s upquote package (\usepackage{upquote}) to map in the verbatim modes the ASCII characters 0x27 and 0x60 to the correct glyphs.

created 1999-12-19 – last modified 2007-12-11 – http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html