What Programming Languages Should You Know?

Date: Mar 9, 2007

Article is provided courtesy of Prentice Hall Professional.

David Chisnall posits that the more programming languages you know, the better. The point is not to stuff your head with language rules. Rather, he explains how being able to read multiple languages, even if you never code in them, can help you to select the best possible tool for each coding need — and understand the limitations of the tools you're using.

Learning foreign languages helps to broaden your mind because some concepts are much better expressed in one language than another. German, for example, really has no word for "fluffy," although it does have words for "furry" and "fuzzy." Similarly, English has no elegant way of expressing the difference between libre and gratis in Spanish. Natural languages adopt concepts from each other over the years and tend toward the same expressive abilities.

Programming languages are different. A programming language may be defined by a specification, but is always limited by the abilities of the compilers and interpreters that execute it. If a particular language doesn’t support a given feature, you can’t just steal a few words from another language and use them, as you might with a natural language. While you can combine languages in a single project, you typically have to do this across well-defined interfaces, rather than by just stealing some syntax from one language and using it in another.

This kind of thing isn’t just an abstract problem. If you’re writing code that queries a database, you’re probably going to be using at least two languages. Because most general-purpose languages don’t have semantics that map clearly onto a database, you’ll need to embed something like SQL or XPath. Now, imagine that you want to express a transitive closure in your query, that is, everything reachable by following a relation any number of times. Basic SQL isn’t expressive enough to implement a transitive closure (although the recursive queries standardized in SQL:1999 and some vendor-specific extensions permit this), so where those aren’t available you might have to implement it by using a stored procedure in a third language.
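To make the term concrete, here is a minimal sketch in Python of computing a transitive closure in the host language rather than in the query. The reports_to relation and the names in it are hypothetical stand-ins for rows returned by a query, and the naive fixed-point loop is only one of several ways to do this.

```python
def transitive_closure(edges):
    """Return every (a, c) pair reachable by following one or more edges."""
    closure = set(edges)
    while True:
        # Join the closure with itself: (a, b) and (b, d) imply (a, d).
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        # Stop once a full pass adds nothing new (a fixed point).
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


# Hypothetical "reports to" relation, as it might come back from a query.
reports_to = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
closure = transitive_closure(reports_to)
```

The result contains the original pairs plus the indirect ones, such as ("alice", "dave").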

The ability to use different languages when they suit the task at hand is a sign of a good coder. The more languages you learn, the easier it is to pick up a new one. Eventually, you start thinking of every new language as just a set of modifications to a language you know already. So what languages should you learn that will help you to quickly build up the set of basic concepts and let you pick up other languages easily? The rest of this article contains my answer to this question. Note that I’m not necessarily advocating using any of these languages for a real project, but I believe that learning them will make you a better programmer in whatever language you do use.

C: Portable Assembly

Whatever language you choose for writing code, eventually that code will be turned into a set of very simple machine instructions. These instructions will be executed more or less sequentially (superscalar architectures notwithstanding). A good programmer needs to be able to think at multiple levels of abstraction simultaneously, and the best way of accomplishing this goal for the lower levels is to use a language that closely mimics processor operation.

A decade or two ago, the most sensible way of achieving this objective would have been to learn assembly language. These days, however, few processors actually execute machine language instructions directly; most translate them into simpler internal operations first, so a language like C isn’t much different in terms of abstractions. A modern x86 CPU, for example, typically has some hidden registers used for storing items on the stack, so the items that an assembly language programmer believes are stored in registers are not quite the same as those that the CPU actually stores.

Being one layer of abstraction higher than an assembly language, C has the advantage that it’s not tied to a particular CPU architecture. This portability has helped C to gain widespread use; even if you never write a line of C code, you undoubtedly will have to read C code at some point or other.

On VMS, there was a well-defined ABI for all procedural languages, so it was trivial to call one from another. Microsoft Windows had COM, which allowed all languages to deal easily with objects from other languages (as long as both languages were C++), and now we have .NET, which does the same thing (as long as both languages are C#). In the UNIX world, the standard way of exporting an interface to a library is via C, irrespective of which language you’re using. For a program in language A, calling a function in a library written in language B almost always involves going via C.
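As one illustration of going via C, the sketch below uses Python’s standard ctypes module to call strlen from the system C library. It assumes a UNIX-like system on which the C library can be located and loaded; the input string is arbitrary.

```python
import ctypes
import ctypes.util

# Assumption: a UNIX-like system where the C library can be found and loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe the C signature so arguments are converted correctly:
#     size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The Python bytes object is passed across the C interface as a char pointer.
length = libc.strlen(b"portable assembly")
```

Neither side needs to know how the other is implemented; both only agree on the C function signature.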

Smalltalk

Smalltalk is the archetypal object-oriented (OO) language. It was developed at Xerox PARC by a group including Alan Kay, who coined the term object-oriented. The concept behind object-oriented programming is that you can simplify development of software if you split your general-purpose computer into a load of simple specialized computers (objects) that communicate by message-passing. This simplicity has been lost in a lot of later languages.

Smalltalk is a pure OO language: Everything is an object. Classes are a special kind of object used to create other objects. Even the messages passed to objects (the Smalltalk replacement for function calls in procedural languages) are themselves objects.

One of the most important reasons for learning Smalltalk is that the syntax is not C-like. Most recent languages have adopted C-like syntax to make themselves more appealing to a generation of C programmers. This fact has led to a widespread belief that C syntax is the "correct" way of designing a programming language. Smalltalk was developed a few years after C, but before C gained widespread adoption, and so didn’t inherit the syntax.

A statement in Smalltalk, like an imperative statement in a natural language, contains a subject, a verb, and optionally an object. Once you’re familiar with the syntax, one thing will strike you about Smalltalk: The language is so "small" that it’s almost nonexistent.

Smalltalk-the-language doesn’t even include conditional statements. In fact, the only kind of statement in Smalltalk is a message-passing statement. Conditionals are implemented in the library; you pass a message containing a block of code to a Boolean object, and it executes the block if the Boolean object is true. The power of this approach is quickly obvious: Since all flow-control structures are part of the library rather than the language, you can define your own very easily. Do you want a for-each control structure? Simply add a method to your collection class that takes a block of code as an argument and executes it with each object in the collection.
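The idea can be imitated in other languages. The following is a rough Python sketch, not Smalltalk: the class and method names (TrueObject, FalseObject, Collection, if_true, do) are invented for illustration, and Python lambdas stand in for Smalltalk blocks.

```python
class TrueObject:
    """Stands in for Smalltalk's true: it runs the 'true' block."""

    def if_true(self, block):
        return block()

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    """Stands in for Smalltalk's false: it ignores the 'true' block."""

    def if_true(self, block):
        return None

    def if_true_if_false(self, true_block, false_block):
        return false_block()


class Collection:
    """A collection whose for-each is an ordinary method taking a block."""

    def __init__(self, items):
        self._items = list(items)

    def do(self, block):
        # The host language does the iterating; callers only send a message.
        for item in self._items:
            block(item)


seen = []
Collection([1, 2, 3]).do(lambda item: seen.append(item * 10))
answer = TrueObject().if_true_if_false(lambda: "yes", lambda: "no")
```

The conditional is chosen by which object receives the message, not by any if statement at the call site.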

Lisp

Eric S. Raymond once said, "Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot." The defining feature of Lisp is the syntax—or, rather, the lack of syntax. When you compile any programming language, one of the first things your compiler will do is generate a syntax tree. In Lisp, you write the syntax tree yourself. As with Smalltalk, this approach gives you a great deal of flexibility. Your Lisp code is a simple data structure that the language is designed to process, meaning that you can easily write a Lisp program that takes a Lisp program and performs some translations on it.

Because this is such a useful thing to do, features built into the language make it easy. Lisp macros are very similar to functions, but they’re executed at compile time, allowing you to transform the syntax tree easily.
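To give a flavor of code as data outside Lisp, here is a small Python sketch in which a program is a nested tuple, a macro-like pass rewrites the tree, and a tiny evaluator runs the result. The operator table, the "unless" form, and the function names are all made up for this illustration; real Lisp macros are considerably more capable.

```python
import operator

# Hypothetical operator table for this toy language.
OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}


def expand(tree):
    """Macro-like pass: rewrite ("unless", c, a, b) into ("if", c, b, a)."""
    if not isinstance(tree, tuple):
        return tree  # literals and operator names are left alone
    tree = tuple(expand(node) for node in tree)
    if tree[0] == "unless":
        _, cond, then_branch, else_branch = tree
        return ("if", cond, else_branch, then_branch)
    return tree


def evaluate(tree):
    """Evaluate a tree that contains only "if" and entries from OPS."""
    if not isinstance(tree, tuple):
        return tree  # a literal number
    head, *args = tree
    if head == "if":
        cond, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(cond) else else_branch)
    return OPS[head](*(evaluate(arg) for arg in args))


# The program is plain data, so another program can transform it.
program = ("unless", ("<", 1, 2), ("+", 1, 1), ("*", 3, 4))
expanded = expand(program)
result = evaluate(expanded)
```

The evaluator never learns about "unless"; the new construct exists only as a transformation applied before evaluation.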

It’s often said that the correct way of writing code in Lisp is to write a description of the problem and then write a domain-specific language that will compile that description. This is a very good habit to get into because, for a lot of classes of problem, something close to the domain-specific language already exists. Once you get used to the idea that each problem has a specific language in which it can be solved easily, you’ll be better at choosing the correct language for any task.

Erlang

A few years ago, Erlang wouldn’t have been on this list. It’s one of the younger languages listed here. Why is it here? Because it embodies a style of programming that will become more important as more computers employ a multi-core design.

Erlang’s concurrency model is often described in terms of communicating sequential processes (CSP), although its asynchronous message passing is closer to the actor model. It provides a mechanism for creating independent processes very cheaply and for passing messages between those processes. What separates Erlang from the other languages discussed thus far is that Erlang is built around asynchronous communication. C and Lisp functions, and even Smalltalk messages, typically execute synchronously; you start one and then wait for it to finish. Sending an Erlang message doesn’t block the sending process, so it can do something else while it waits for the process handling the message to complete.

This approach allows you to write very scalable code, without the pitfalls associated with threads. There’s no shared memory, so you never need to deal with locks. You can still introduce deadlock if you have two processes waiting for each other, but this situation is quite rare.
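The style can be approximated in other languages. Below is a rough Python sketch that uses a thread with a queue as its mailbox. The spawn helper, the None shutdown message, and the square handler are invented for illustration. Python threads do share memory, unlike Erlang processes, so the isolation here is only a convention.

```python
import queue
import threading


def spawn(handler):
    """Start a worker with its own mailbox; return (mailbox, thread)."""
    mailbox = queue.Queue()

    def loop():
        while True:
            message = mailbox.get()
            if message is None:  # hypothetical shutdown convention
                break
            handler(message)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return mailbox, thread


def square(message):
    # Each message carries the mailbox to which the reply should be sent.
    reply_to, n = message
    reply_to.put(n * n)


replies = queue.Queue()
worker, worker_thread = spawn(square)

# put() on an unbounded queue returns immediately, so the sender never blocks.
for n in (2, 3, 4):
    worker.put((replies, n))

# The sender is free to do other work here before collecting the replies.
results = [replies.get(timeout=5) for _ in range(3)]

worker.put(None)
worker_thread.join(timeout=5)
```

All communication goes through the two queues; neither side reads or writes the other’s variables.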

Haskell

No set of essential languages would be complete without a functional programming language, and Haskell is a very good example of this paradigm. As much as possible, functional languages dispense with the idea of global state. Unlike a procedure, a pure function will always return the same value when called with the same arguments; its result doesn’t depend on anything other than the arguments.
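The distinction between a procedure and a pure function is easy to show in any language. In this small Python sketch, next_id and area are hypothetical examples chosen only to contrast the two.

```python
counter = 0


def next_id():
    """A procedure: its result depends on hidden global state."""
    global counter
    counter += 1
    return counter


def area(width, height):
    """A pure function: the result depends only on the arguments."""
    return width * height


# Two identical calls to the procedure give different answers...
first, second = next_id(), next_id()

# ...while identical calls to the pure function always agree.
same = area(3, 4) == area(3, 4)
```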

Like Erlang, Haskell is a language closely related to a theoretical model for computation. In this case, the model is λ-calculus. The operations in Haskell map directly to the reductions permitted by λ-calculus, making the language very popular with people interested in formal verification. You can easily map between a Haskell program and a set of λ-calculus terms, so anything you can prove about the terms also applies to the program.

For the most part, functions are not allowed to have side effects, and those that do are easy for the compiler to spot. If a function doesn’t have side effects, then the only constraint on when it can be executed is when you use the return value. If you never use the return value, you can avoid evaluating it at all (lazy evaluation). This property allows you to extract parallelism implicitly from functional programs. Consider the following pseudocode:

a = foo()
b = bar()

In a procedural language, foo() might alter some global state, and so the outcome of bar() might depend on foo(). In a pure functional language, this isn’t the case, so foo() and bar() can be executed concurrently.
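As a sketch of what running the two calls concurrently looks like, the Python below submits both to a thread pool. The bodies of foo and bar are placeholders invented for the example. Because Python cannot verify that they are pure, the programmer has to request the concurrency explicitly, which is the step a pure functional language could in principle take on your behalf.

```python
from concurrent.futures import ThreadPoolExecutor


def foo():
    # Placeholder pure computation: touches no shared state.
    return sum(range(1_000))


def bar():
    # Placeholder pure computation: independent of foo().
    return max(range(1_000))


# Neither call depends on the other, so the order of execution is irrelevant.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(foo)
    future_b = pool.submit(bar)
    a = future_a.result()
    b = future_b.result()
```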

Prolog

The final language on my list is a slightly unusual choice. All of the others are fairly flexible general-purpose languages. Prolog is not. While it is Turing-complete, and thus can be used to implement any algorithm, doing so is not always sensible.

Prolog is based on predicate logic, and is very good for building knowledge-based applications. Any system that collects arbitrary relations between objects (or concepts) and performs some inference or reasoning about them is a good fit for Prolog.
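To show the flavor of relations plus inference, here is a rough Python imitation of a tiny knowledge base. The PARENT facts and the ancestors function are hypothetical, the facts are assumed to contain no cycles, and in Prolog itself the rule would be written declaratively rather than as an explicit search.

```python
# Facts, in the spirit of Prolog's parent(tom, bob). and so on.
PARENT = {("tom", "bob"), ("bob", "ann"), ("ann", "joe")}


def ancestors(person):
    """Infer ancestors from the parent facts.

    Roughly: X is an ancestor of P if X is a parent of P,
    or if X is an ancestor of one of P's parents.
    """
    found = set()
    for parent, child in PARENT:
        if child == person:
            found.add(parent)
            found |= ancestors(parent)  # assumes the facts are acyclic
    return found


joes_ancestors = ancestors("joe")
```

Only the parent relation is stored; the ancestor relation is derived from it on demand.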

The real reason for learning Prolog, however, is that it’s about as unlike C as possible while still having a useful language. Once you can think in Prolog as easily as in an imperative language, you’ll find it very easy to adapt to new programming models.

Like functional languages, Prolog doesn’t allow global variables. In fact, it doesn’t allow assignment in the classical sense. Variables in Prolog can have only one value for their entire lifespan—a characteristic that Erlang inherited. The first Erlang implementation was written in Prolog, so the syntaxes of both are quite similar.

Any More?

This list is by no means exhaustive. Learning new languages will almost always make you a better programmer. The important thing to remember, however, is not to restrict yourself to languages that share too many concepts. If you know Java and C, learning C++ won’t stretch you.

It’s also worth remembering that you don’t have to use a language to benefit from knowing it. If you know assembly language for the processor on which you’re working, this knowledge will help you in writing high-level code, since you’ll be able to keep in mind what’s actually going on when you execute your code. Similarly, familiarity with a higher-level language will help you to write better-structured code in a lower-level language.

Eventually, someone might create a programming language that is the ideal choice for all uses. (Lisp programmers will argue that this has already happened.) Until then, however, the more languages you know, the easier it will be for you to select the correct tool for the job. More importantly, you’ll be better able to determine the suitability of the language you’re using.