The lingua franca of LaTeX

How Donald Knuth’s 1978 typesetting program became one of the oldest still-active open-source projects and revolutionized technical publishing along the way.
Part of Issue 9 (May 2019): Open Source

In 1976, Stanford University computer science professor Donald Knuth experienced firsthand a problem that had been plaguing the scientific community in recent years: how to ensure that research papers that included mathematical equations and scientific notation were formatted correctly for printing.

Manual typesetting could produce excellent output, but it was expensive. If an author wanted to write mathematical or other technical notation, the author and a knowledgeable typesetter needed to work closely together.

For the 1968 edition of his book The Art of Computer Programming, Knuth had provided thousands of handwritten manuscript pages to his publisher, Addison-Wesley, which used traditional metal equipment to typeset it. The book contained examples of computer programs, and Knuth explained to the typesetters in detail how he wanted these texts to look. He was happy with the results of that first edition. But eight years later, for the second edition of the expanded book’s second volume, Addison-Wesley had transitioned to cheaper, electronic typesetting methods, and Knuth found the galley proofs they sent him to be subpar.

Knuth knew that high-resolution digital typesetting machines could produce precise shapes, but the expertise of professional typesetters had been lost in the transition. What if, he wondered, computer software could do the same job? In 1977, Knuth spent the summer as well as part of his sabbatical year working on just such a digital typography program. With his students, he was able to write a program capable of typesetting the entire 700-page revised volume of his book by 1978. The program, called TeX, revolutionized how scientific papers are formatted and printed. It’s also one of the oldest open-source software projects still in use.

An open book

TeX enabled authors to encode their precise intent into their manuscripts: This block of text is a computer program, while this word is a keyword in that program. The language it used, called TeX markup, formalized the slow, error-prone communication that is normally carried out with the printer over repeated galley proofs.

To use TeX, the author writes in a plain-text document with no formatting and inserts TeX’s markup language commands directly into the material. For example, this text:

<pre><code>The quadratic formula is
$$
-b \pm \sqrt{b^2 - 4ac} \over {2a}
$$</code></pre>

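produces the quadratic formula in typeset form: the numerator −b ± √(b² − 4ac) sits above a fraction bar, with the denominator 2a centered below it.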

The idea of writing markup inside text wasn’t especially novel; it has been used from the runoff family of printer-preparation utilities, which reached UNIX around 1970, to today’s HTML tags. TeX was new in that it captured the key concepts necessary for realistic typesetting and formalized them.

Knuth defined hundreds of typesetting commands. The typesetter that understands these commands is the TeX engine itself, which reads the author’s manuscript and produces a file that can be sent directly to a printer to produce typeset documents.

In the scheme of things, the TeX engine understands only a small set of primitive typesetting commands. Knuth also knew that no matter how many commands it understood, TeX could never hope to cover every possible typesetting need for authors. So he wrote TeX to be extensible by users. TeX can be given macros that define new commands in terms of existing commands, meaning people can create macro files to extend TeX without changing the TeX engine.
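As a minimal sketch of this kind of extension, assuming plain TeX, the manuscript below defines a new command in terms of an existing one and then uses it. The name \keyword is hypothetical, chosen only for illustration; \def, \bf, and \bye are standard plain TeX.

<pre><code>% Define a hypothetical \keyword command using the existing \bf font switch.
% The inner braces keep the bold font from leaking into the following text.
\def\keyword#1{{\bf #1}}

In this program, \keyword{while} starts a loop and \keyword{end} closes it.
\bye</code></pre>

The engine itself is unchanged: it simply expands \keyword into commands it already understands.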

Knuth wrote his own macro set, known as plain TeX, to be sufficient for typesetting his own writing. In 1984, he published The TeXbook, detailing the meanings of all of TeX’s primitive and plain macro commands as well as the inner workings of the engine. The book was intended to enable developers to write their own macros and to make changes to the code.

At Knuth’s insistence, the TeX engine source code has been free for over 40 years, and anyone can modify it. But to protect authors’ manuscripts from breaking, Knuth forbade incompatibly modified versions of the engine from being called TeX. He published a set of automated tests that any modified version of the engine must pass before it can use the name.

TeX became a multiyear collaborative project spreading far beyond the university, as it had to be ported to different computer systems, languages, and printers to typeset manuscripts from other authors. The American Mathematical Society (AMS) in Rhode Island was an early adopter and promoter, and a periodic meeting of TeX users was organized in 1980 to collect user feedback and direct the development of TeX. The TeX Users Group (TUG) was instrumental in packaging all of the documentation and programs needed to run TeX into a standard distribution, TeX Live. Knuth also frequently contributed to the TUG publication, TUGboat, to answer in-depth questions and share his views on the future of the program.

Knuth revised TeX in 1982 (as TeX82) and again in 1989. Then, in 1992, he retired to focus on finishing the always-expanding book that had been TeX’s impetus: The Art of Computer Programming. But the work continued.

Easy macros

Most users of word processors are happy with only a few dozen commands, like section, subsection, bold, italic, and so on. They would rather the engine made as many of the typesetting decisions as possible.

Because TeX’s plain set of commands is fairly low-level, it’s best suited for authors who, like Knuth, desire detailed control over the typesetting of their material. But the idea of a high-level, easy-to-use markup language that separates the content from its presentation spurred the creation of Scribe, a program invented in 1980 by Brian Reid at Carnegie Mellon for his doctorate. Scribe gained popularity among technical authors in the early eighties, but it was not free, which limited its circulation.

One of the users of Scribe was Leslie Lamport at SRI International. Lamport was already a famous computer scientist in the field of distributed systems, and he was writing a book. He was one of the early users of TeX and wanted to bring the ease of use of Scribe to TeX. He created an alternative macro file called lplain (“l” for Lamport), with a set of much-easier-to-use commands. He gave this macro file away for free along with TeX. Users could then run a latex program, which made the engine read Lamport’s macros first. Lamport also wrote a book, LaTeX: A Document Preparation System, to teach authors how to prepare their manuscripts using these commands. For more complex needs, authors could still use the underlying TeX commands.
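A short manuscript gives a sense of what these higher-level commands look like. This is a sketch in the style of today’s LaTeX rather than Lamport’s original macro file; the section titles and text are made up for illustration.

<pre><code>\documentclass{article}
\begin{document}

% The author names the structure; LaTeX picks fonts, spacing, and numbering.
\section{Typesetting}
\subsection{Markup}

Authors mark \textbf{bold} and \textit{italic} text by intent,
and leave the layout decisions to the macros.

\end{document}</code></pre>

Nothing in the manuscript states a font size or a margin, which is the separation of content from presentation that Scribe had popularized.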

With these higher-level commands, the free TeX engine, and the LaTeX book, the use of TeX exploded. The macro file has since evolved and changed names, but authors still typically run the program called latex or its variants. Hence, most people who write TeX manuscripts know the program as LaTeX and the commands they use as LaTeX commands.

The effect of LaTeX on scientific and technical publishing has been profound. Precise typesetting is critical, particularly for conveying concepts using chemical and mathematical formulas, algorithms, and similar constructs. The sheer volume of papers, journals, books, and other publications generated in the modern world is far beyond the throughput possible via manual typesetting. And TeX enables automation without losing precision.

Thanks to LaTeX, book authors can generate camera-ready copy on their own. Most academic and journal publishers accept article manuscripts written in LaTeX, and there’s even an open archive maintained by Cornell University where authors of papers in physics, chemistry, and other disciplines can directly submit their LaTeX manuscripts for open viewing. Over 10,000 manuscripts are submitted to this archive every month from all over the world.

LaTeX’s easy adaptability continues to win ardent followers. “I adore LaTeX because I can focus on writing instead of spending hours twiddling with getting the style just right,” says Mika McKinnon, a disaster researcher and geophysicist who started using LaTeX while writing her master’s thesis at the University of British Columbia. “If I do a good job with how I set up my LaTeX files, I can format and repurpose for any publication or style guide with minimum manual fixes.”

Norbert Preining, a TeX user and fan in Japan, has even launched a Facebook group for users. “We are not yet sure how effective it will be, but I think of it as making information more accessible in different channels than the ones we classically use, such as mailing lists and printed information,” Preining says.

Small but mighty

While modern desktop publishing has benefited greatly from innovations first introduced by TeX decades ago, it has also largely taken over the typesetting of nontechnical material. In fact, most people have never heard of TeX.

As free open-source software supported directly by its users, TeX doesn’t have any marketing push behind it. Since no one is making money off of it, there is no commercial incentive to make it popular. But that alone isn’t the whole story.

Because the TeX typesetting engine was developed before digital formats for typefaces had been standardized, Knuth had to invent his own format for font files; design a set of font families, Computer Modern; and invent his own typeface design program, Metafont. Along with TeX, Knuth also gave away his fonts, so that people could legally exchange documents without getting sued for using proprietary fonts.

The tech industry, meanwhile, was inventing, evolving, and protecting its own font formats: Adobe Type 1 and Type 3 (popularized in 1985 with the first PostScript printer), TrueType by Apple (1991), TrueType Open by Microsoft (1994), and OpenType by Microsoft and Adobe (1996). Even as TeX became successfully embedded in more and more manuscripts, the industry continued to evolve its designs in a different direction. To support the new font formats, TeX had to be modified.

The disconnect between technical or scientific and nontechnical authors is also fundamental to understanding TeX’s mainstream obscurity: In nontechnical publishing, typesetting is usually not essential for conveying the author’s intent. Typesetting is considered ornamental; authors of popular material are content to send a Word document to their publisher and let professionals do the rest. Technical authors, on the other hand, need to convey their meaning precisely through glyphs, sizes, and placement. TeX lets them do that, as well as exchange their documents in a widely understood format.

Being a command-line tool, TeX is easy to integrate into automation workflows. A user comfortable with TeX can prepare their material once and then format it multiple times for different publications with their own formatting guidelines.
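One way to picture this reuse, sketched here under assumptions, is to keep the content in its own file and wrap it in a small layout file per publication. The file name body.tex and its text are hypothetical; the filecontents environment is used only so that the example fits in a single file.

<pre><code>% Write the content, prepared once, to a hypothetical file named body.tex.
\begin{filecontents}{body.tex}
\section{Results}
The measurements agree with the model.
\end{filecontents}

% The wrapper holds the layout. Removing the twocolumn option,
% or naming a publisher's own class here, reformats the same content.
\documentclass[twocolumn]{article}
\begin{document}
\input{body}
\end{document}</code></pre>

Because the wrapper is compiled from the command line, a script can produce each publication’s version in turn without touching the content file.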

For many users, a practical difficulty with typesetting using TeX is preparing the manuscripts. When TeX was first developed, technical authors were accustomed to using plain-text editors like WordStar, vi, or Emacs with a computer keyboard. The idea of marking up their text with commands and running the manuscript through a typesetting engine felt natural to them.

Today’s typesetters, particularly desktop publishers, have a different mental model. They expect to see the output in graphical form and then to visually make edits with a mouse and keyboard, as they would in any WYSIWYG program. They might not be too picky about the quality of the output, but they appreciate design capabilities, such as the ability to flow text around curved outlines. Many print products are now produced with tools like Microsoft Word for this very reason. TeX authors cannot do the same work as easily.

Meanwhile, vendors of commercial typesetting and design products have every incentive to tout their own wares instead of a free program. Early on, their products incorporated some of TeX’s free algorithms: in particular, line-breaking and hyphenation added by Knuth’s students. Frank Liang wrote his doctoral thesis on how to hyphenate words in typesetting paragraphs; his code is in the TeX engine and is freely available.
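Authors can inspect and adjust that hyphenation from inside a manuscript. The sketch below assumes plain TeX; \showhyphens and \hyphenation are standard commands, while the words and the break point given for “metafont” are chosen only as examples.

<pre><code>% Report in the log file where TeX would allow these words to break.
\showhyphens{typesetting manuscript}

% Register an exception that overrides the built-in patterns for one word.
\hyphenation{meta-font}

\bye</code></pre>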

Dave Walden, an active member of TUG, points out that “the world of TeX has always been somewhat out of sync with the commercial desktop publishing systems that began to come out in the mid-1980s and are now ubiquitous in the worlds of publishing and printing.”

He adds, “Maybe it’s like the difference between starting a company to sell a commercial project and starting an eventually major open-source development project, although the phrase ‘open source’ didn’t exist when Knuth set things up as he did.”

What’s next?

In 1991, the LaTeX Project formally took over the maintenance and development of LaTeX 2.09 from Leslie Lamport. The team, led by Frank Mittelbach in Germany, is a core group of dedicated developers who greatly extended and effectively rewrote the LaTeX kernel, a core infrastructure now used by many packages that are continuously being added by developers around the world. LaTeX2e, a new version of LaTeX, was released in 1994 and is the version installed and used as LaTeX today.

Meanwhile, in responding to user requirements, the volunteer developers on TeX and related programs have extended the TeX engine itself to make changes easier. Multiple independent efforts have produced three modified engines: to read TrueType and OpenType fonts directly (XeTeX), to produce PostScript and PDF output (in earlier years, using a separate program called dvips, and now through an extension called pdfTeX), and to read Unicode input (LuaTeX, as well as XeTeX). LuaTeX allows developers to write powerful extensions to the TeX engine in the programming language Lua. And, with desktop computers that are orders of magnitude faster than they were in the 1990s, it’s quite possible to make substantial extensions. These three slightly modified engines are now in use. A longer-term project called LaTeX3, Mittelbach says, is underway.
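As a small illustration of the Lua extension mechanism, the manuscript below hands a calculation to Lua in the middle of a sentence. It is a sketch that assumes the file is compiled with the lualatex program; \directlua and tex.print come from LuaTeX and are not available in Knuth’s original engine.

<pre><code>% Compile with lualatex.
\documentclass{article}
\begin{document}

% Lua computes the value; tex.print feeds the result back to the typesetter.
The square root of 2 is approximately \directlua{tex.print(math.sqrt(2))}.

\end{document}</code></pre>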

What lies in store for one of the oldest open-source projects still active?

“LaTeX will stay with us for a long time, largely due to the quality of the underlying TeX engine, which even through its 40 years has not been surpassed by any other software,” Mittelbach says. “In scientific writing, it remains the lingua franca for generations of students and researchers.”

About the author

Poornima Apte is an award-winning writer and editor. She’s happiest when her stash of books resembles a Jenga pile.

@booksnfreshair

Artwork by

Joel Plosz

joelplosz.com
