Syntax Matters

Posted: April 29, 2008 | Author: Alan Keefer | Filed under: Development | 19 Comments

(Note: I had been working on this post before this thread showed up on Artima today, so I figured it was an appropriate time to finish it off and publish it.)

One of Carson’s favorite phrases is “syntax matters,” so I’m kind of stealing his idea here. But it’s something I firmly believe in as well, and it’s central enough to how we’ve designed GScript that it warrants a detailed explanation and defense.

The most common response to any sort of language criticism always seems to be, “But you can still do that in my language, here’s how.” But of course that response is pretty much always an option; any Turing-complete language has the same set of fundamental capabilities. The point is that it doesn’t just matter what your language can do; what matters is how you do things using the language. In the end it’s impossible to be truly objective and judgments will depend on individual taste, which is fine: I’m not trying to convince anyone that this or that way is better right now, but rather just that there are such things as better and worse ways to accomplish something and that the difference matters.

Of course, on the other hand you can take things too far; the overhead of learning new syntactic structures can be pretty high, so in my opinion it’s not worth jamming everything possible into the language in the name of greater expressivity. With that said, this entry will focus more on reasons why clean, expressive syntax is important, and the subject of how to keep things from going too far will be a different entry. I’m fairly certain that my analysis is incomplete here, but I’ve broken down my argument into 4 main reasons why the syntax of a language and how you express things matters.

Lines Of Code

All things being equal, less code is almost always better. In real life, of course, things are never really equal, but as a general rule I hope it’s non-controversial to say that being able to do task X with 50 lines of code is preferable to needing 500 lines of code to do task X. Less code takes longer to write, but the real benefits are around maintenance: less code means less of a chance of bugs, less to keep in your head, less for someone else (or yourself 6 months later) to read through and learn, less to test, and less to modify when you change the rest of the system.

There are always exceptions of course; 50 lines of incomprehensible code is probably not preferable to 200 lines of dead-simple, straight-line code, and 50 lines of highly-coupled code might not be preferable to 6 different buckets of 50 lines of independent, decoupled code. But from a language design perspective, reducing the amount of code a programmer needs to write is generally the right thing to do, and the fact that languages like Ruby and Python require so much less code than Java is one main reason why people generally end up being more productive in those languages.

Readability

I think it’s fair to say that code is often harder to read than it is to write, and it’s certainly true that code is read many more times than it’s written. As a result, writing readable code is critically important to any development project. Better syntax makes code more readable by more clearly expressing the intent of the author. For example, something like:

var userNames = users.map(\u -> u.Name)

is clearer to me than

List<String> userNames = new ArrayList<String>();
for (User user : users) {
  userNames.add(user.getName());
}

It’s not just a lines-of-code issue; it’s the fact that in the first case you read the word “map” and it immediately conveys a large amount of information: that the developer wanted to extract a list of names, that the operation is non-destructive, etc. The second case requires slightly more work to understand because it looks like every other for loop in Java, so you have to dig more into the details to realize what it does (“okay . . . we’re iterating here to do a simple mapping, not to perform an operation on each element, partition the list up, or transform the list in place”).

Of course, you can write readable code in just about any language, and in the Java case you could always refactor it into a helper method, or use some functional-like library with an anonymous inner class. The point of better syntax is often that it makes it easier to write readable code; the fact that there’s only one obvious way to do most things in Python, for example, tends to make Python code much more readable than Perl code (at least in my opinion). You can write readable code in Perl, but you have to work at it; the language itself makes it difficult. When the language makes readable code the path of least resistance, the code written in that language will, on average, be more readable.
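To make the “helper method with an anonymous inner class” option concrete, here is a minimal sketch of what it might look like in Java. The names Mapper, map, and the stand-in User class are hypothetical, invented for illustration; they don’t come from any particular library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapHelperSketch {
    /** Hypothetical one-method interface standing in for a closure. */
    interface Mapper<T, R> {
        R apply(T input);
    }

    /** Hypothetical helper: builds a new list and leaves the source untouched. */
    static <T, R> List<R> map(List<T> source, Mapper<T, R> mapper) {
        List<R> result = new ArrayList<R>(source.size());
        for (T item : source) {
            result.add(mapper.apply(item));
        }
        return result;
    }

    /** Minimal stand-in for the User type in the example above. */
    static class User {
        private final String name;
        User(String name) { this.name = name; }
        String getName() { return name; }
    }

    /** The same mapping as before, now routed through the helper. */
    static List<String> userNames(List<User> users) {
        return map(users, new Mapper<User, String>() {
            public String apply(User user) {
                return user.getName();
            }
        });
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("Ann"), new User("Bob"));
        System.out.println(userNames(users)); // prints [Ann, Bob]
    }
}
```

The call site now says “map,” which helps the reader, but the anonymous inner class costs five lines to express what the closure version says in a handful of characters. The capability is there; the syntax still gets in the way.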

Memorability

One thing that I think people often don’t pay enough attention to is how easy it is to remember how to do things. The key metric is how often you have to use something in order to stop having to look it up or rely on an auto-complete treasure-hunt in the IDE. Are things so obvious that, even though you don’t exactly remember, your first guess is generally right? If so, the designer did a good job. Do you have to run off to the internet if you go more than 2 days without writing code using that syntax? That’s generally a bad sign.

Memorability plays into both reading and writing code; obviously it’s hard to read code if you don’t remember what the function calls mean or how the syntax elements interact, and it’s clearly laborious to write code if you have to constantly look in a reference guide.

Good syntax will cause things to stick better in your head, whereas less-clear syntax might make it nearly impossible to remember things. For example, XPath just refuses to stick in my brain; I don’t use it often enough, and while it’s powerful, the syntax is basically arbitrary as far as I can tell, which means that whenever I try to write XPath expressions or need to read someone else’s I have to look things up, slowing me down immensely. That’s the primary reason I didn’t add XPath support to GScript’s XML library; I’d much rather use closures with findFirst() or findAll() methods, since the intention is unambiguous and the extra code is more than made up for by the hours I don’t have to spend re-learning XPath every time I need to do something. Is it useful to have a consistent, declarative way to query XML trees? Sure, in cases where you don’t have a full-fledged programming language to use and something declarative and severely constrained is necessary. But otherwise it’s just too many arbitrary bits of information for me to remember on top of what I already have to keep in my head to program in Java/GScript, so I avoid it when I do have a real language to work with.
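As a rough illustration of that tradeoff (this is not GScript’s XML library, just a sketch in present-day Java, with a lambda standing in for a closure), here is the same query written once as an XPath expression and once with a predicate passed to a findAll helper. The class name, the helper methods, and the sample XML shape (user elements with a role attribute) are all assumptions made up for the example; the XML and XPath calls are the standard JDK ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlQuerySketch {
    /** Parses an XML string with the JDK's built-in DOM parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Hypothetical helper: every element under root that satisfies the test. */
    static List<Element> findAll(Node root, Predicate<Element> test) {
        List<Element> matches = new ArrayList<>();
        collect(root, test, matches);
        return matches;
    }

    /** Depth-first walk over the tree, collecting matching elements. */
    private static void collect(Node node, Predicate<Element> test, List<Element> out) {
        if (node instanceof Element && test.test((Element) node)) {
            out.add((Element) node);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), test, out);
        }
    }

    /** The declarative version: compact, but you have to remember the XPath syntax. */
    static int countAdminsWithXPath(Document doc) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//user[@role='admin']", doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** The predicate version: more characters, but it is all ordinary code. */
    static int countAdminsWithPredicate(Document doc) {
        return findAll(doc, e -> e.getTagName().equals("user")
                && e.getAttribute("role").equals("admin")).size();
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<users><user role='admin'/><user role='guest'/><user role='admin'/></users>");
        System.out.println(countAdminsWithXPath(doc));      // prints 2
        System.out.println(countAdminsWithPredicate(doc));  // prints 2
    }
}
```

Both versions find the same elements. The XPath string is shorter, but nothing in the host language helps you write or read it; the predicate is longer and needs no separate syntax to remember.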

Note that this plays into both how much syntactic help you should add into a language and what syntax there should be. If you have too many special syntactic elements, it might become difficult to remember them all. If they’re arbitrary, inconsistent, or otherwise unfamiliar to people, it’ll definitely be difficult to remember them.

Discoverability

Related to the idea of memorability is the idea of discoverability: how easy is it to figure out how to do something when you’re first starting out? The easier the learning curve, the more likely you are to try to learn something new, and the less time you waste doing it. As with memorability, the ability to just guess and have that work out makes it easier to discover the correct path. In addition, well-designed syntax will often play well with auto-complete in editors, making it easier to explore using an IDE and learn an API that way. Lots more goes into that as well, such as proper encapsulation and object relationships, but syntax also helps.

The Upshot

Again, I’m not trying to convince anyone right now that my particular views on language design are right; the important thing to me is that people agree that syntax and language design matter, and that it’s the right debate to have in the first place. How you go about doing things in a language (or in interacting with an API) really does matter, and the details really are important.

19 Comments on “Syntax Matters”

/v\atthew says:

April 29, 2008 at 11:14 am

Congrats, you guys made reddit.com: http://reddit.com/r/programming/info/6hhra/comments/

Thomas says:

April 29, 2008 at 1:10 pm

The Internet is full of posts like this. This kind of mental masturbation does nothing to solve real and occurring tasks, but it makes the author feel smarter than his peers.

Reg Braithwaite says:

April 29, 2008 at 2:50 pm

Great, thanks. I especially liked this insight: “Less code takes longer to write, but the real benefits are around maintenance: less code means less of a chance of bugs, less to keep in your head, less for someone else (or yourself 6 months later) to read through and learn, less to test, and less to modify when you change the rest of the system.”

I often hear the argument that compact, expressive syntax is something that is easier to write but harder to read. I think you have expressed it well: it is harder, not easier, to make something succinct, but when you do so you make it easier to understand.

markus says:

April 29, 2008 at 3:16 pm

I absolutely agree 🙂

I can’t recall any other blog post in the last 2 months that I agree with as much as this one.

Now I hope that other folks realize this as well and, what is more important, that these languages with a very clean syntax are ALSO supported by the big companies out there (ruby and python already have a healthy and active community, but there could be more acknowledgement by older companies that still use PERL)

Traverse Davies says:

April 29, 2008 at 6:30 pm

Wow… I keep trying to say this stuff to Java devs I know (and people who try to make other languages into Java). Nice to have some backup that clearly illustrates what I am saying.

Alan Keefer says:

April 29, 2008 at 10:00 pm

@Thomas:

The point I was trying to make is not “we’re cool” or “language X is better than language Y” or “we know more about language design than you do” or anything similar. It’s hard to make the argument that syntax matters without giving some examples of how I think it matters, but I tried hard to emphasize that those opinions weren’t the point.

What I really wanted to express is that I’m frustrated when people insist that everything is always fine and that there’s nothing to debate about because language preference is relative (which I think it is) so there’s no point in arguing since everyone will have a different opinion. I’m just saying that syntax does actually matter, and it’s worth arguing over what sort of syntax is best so that, in the future, we can build even better languages or improve existing ones. Different people will have different opinions, and that’s totally healthy, but at least let’s try to articulate why we have those different opinions and what we think makes for a great language or API. So here I’m trying to lay out how I think good syntax can make a difference as a way to provide some sort of criteria for syntax evaluation. Again, we can differ about whether closures really make code more readable than anonymous inner classes, but at least then we’ve got some parameters for our debate (i.e. we both care about readability) instead of just talking past each other and saying “whatever, anything goes.”

It’s not obvious from this post, but we actually are trying to do something to solve real problems: we’re building our own language on the JVM (currently called GScript) that started as an internal project for use in configuration scriptlets but which has ballooned into a full-fledged language that we’ll hopefully be able to release publicly at some point. Once it’s released, part of that will definitely be evangelizing what we think makes our language great, but part of the development process is also trying to find the best ways to do something. So we actually do want real feedback about what makes a language great.

Even if it never gets released, though, I think it’s useful to have well-reasoned debates over how languages should be designed as long as it doesn’t devolve into name-calling. In other words: if you’re trying to get work done, then the answer “I don’t care about syntax because I’m used to it” is perfectly valid. Use whatever tools get the job done for you. But it’s always important to keep asking “How can we make better tools?”, especially for those of us who are currently trying to make those tools, or for anyone else in a position to create new tools or make existing ones better.

nv1962 says:

April 29, 2008 at 11:57 pm

Don’t laugh about my oddball approach here, but I’m a linguist. Or so I tell myself. So, the statement “syntax matters” makes eminent sense to me. Which, as you indirectly and correctly point out, is not the same as stating “syntax is imperative.” In many instances, the objective of getting the work done prevails over otherwise perhaps interesting academic considerations.

It’s just as much nonsense to approach a general comparison of human languages in terms of superiority; yet a disregard of syntax leads to consequences, intended or not, that may impact the practicality of statements. Complex code with obtuse syntax carries such a penalty as well, not excluding economic cost.

Of course, in human language syntax is typically developed and applied in a descriptive fashion, to an existing language; in programming, it’s a prescriptive activity. But in both cases, the traction and attraction of a given syntactic system relies on the effectiveness of its didactic quality. The clearer its guiding principles are, the easier it will be to teach and effectively invite people to adopt them for practical use. In the end, it’s all about sex appeal…

I can’t state enough admiration for the work of those who develop coding platforms and systems, like you seem to do, along with Andrew Tanenbaum. He also insists on the importance of the didactic qualities of the resulting product.

(Bonus entertainment quiz: was that human language comparison an analogy or a simile?)

MySpace Codes says:

April 30, 2008 at 3:04 am

Very nice post, it helped me a lot with my coding. I’ll add it in the future. Thanks!!!

cozumelkid says:

April 30, 2008 at 3:36 am

This may not be right on the syntax subject, but it may be interesting. In the 1980’s I had a job, with the State of Oklahoma, where we deployed an AS400 to manage 40,000 accounts. There were 9 secretaries under me. Some of the secretaries would input OKC, some input O.K.C., some input Okla. City, some would input OK City, some input Oklahoma City, and some would mix up the Oo, Kk, Cc’s. Everything went fine until one day I had to find someone in Oklahoma City. A compliance letter went out. The next day that person was in my office trying to figure out why his license fee was not properly recorded. I saw the problem immediately, and called a meeting of the secretaries, invited the person in question so he would know we were on the job, the secretaries saw him as a real person, and we began to go back through the whole database the next day because the person who was hired to manage the AS400 didn’t know how to write a routine in RPG for the fix. Later I called in the experts, and from then on it didn’t make much difference what the input was because it was all standardized with Oklahoma City. By the way, that was the real name of the place, not OKC.

Daniel Bernier says:

April 30, 2008 at 12:03 pm

“…being able to do task X with 50 lines of code is preferable to needing 500 lines of code to do task X. Less code takes longer to write, but the real benefits are around maintenance: less code means less of a chance of bugs, less to keep in your head, less for someone else (or yourself 6 months later) to read through and learn, less to test, and less to modify when you change the rest of the system.”

Here’s the really nasty bit about this: if I write in 500 lines of Blub what you write in (say) 50 lines of Lisp, it’s

When we say things like “these 500 lines of java would be 50 lines of ruby, or 20 of lisp,” it sounds like it’s trivial to translate the other way, to turn 50 ruby lines into 500 java lines, like they’re equivalent. But of course, it’s not: explain a 50-line ruby snip to 10 blub programmers, and you’ll get 10 very different 500-line blub programs.

Each line of code, each token, is a decision point, something to consider, and that’s where less-expressive languages hurt us: they put more decisions in our way, they give us more chances to screw up.

Alan Keefer says:

April 30, 2008 at 5:51 pm

@nv1962:

I think that’s a great analogy, and I think you put it much more succinctly than I did in distinguishing between syntax mattering and syntax being imperative.

@Daniel:

That’s a great point about expressiveness. I was trying to make a point like that with the “map” example: in GScript, Ruby, or Python there’s one canonical way to do it, whereas a language like Java has too many options that, as you say, give you the ability to either screw it up or at least make it confusing to the reader. Likewise, I’m sure there are even more ways to code that in x86 assembler.

I also like to analyze this using the concept of “chunking” of short-term memory, whereby most people can keep (I believe) 7 +/- 2 “chunks” in their head at one time, so the key to keeping more in your head is to have those chunks contain more information (whole words instead of individual letters, for example). Something like “map” is a pre-made chunk, whereas the Java for loop is about 3 or 4 different chunks, so it becomes harder to keep the whole program in your head. That’s naturally also an argument for decomposition, which is also key, but languages that encourage you to write in higher-level chunks will naturally lead to programs that are easier to get your head around.

You can also think of chunks in terms of things like chess openings; becoming a grand-master chess player is very much about having pre-built chunks to draw from stored in long-term memory. Higher-level languages give you those pre-built chunks that you can store in your mental backpack and pull out more easily when necessary (either when reading or writing), while less-expressive languages make that harder to do.

Stuart Halloway says:

May 1, 2008 at 3:31 am

Nicely done! I would like to nominate a fifth main reason: rich consistency (http://blog.thinkrelevance.com/2008/5/1/rich-consistency)

Robert Fischer says:

May 1, 2008 at 5:04 am

(Got here by way of Raganwald’s link feed.)

I’m kinda with Thomas: yeah, syntax is a big deal, and I’m not sure there are people out there who deny that. Maybe there are people out there who deny the importance of syntax and other language design considerations, but I don’t really see them. Basically, if you know more than one language, it’s hard to not recognize the differences between them: there’s a practical impact that’s pretty obvious pretty fast. As my co-blogger, Brian Hurt, puts it: “It’s not what a programming language make possible, it’s what a programming language make easy which determines what patterns are common and what patterns aren’t.”[1]

One really interesting thing did jump out at me from this post, though, and it apparently piqued Raganwald, too:
“Less code takes longer to write, but the real benefits are around maintenance: less code means less of a chance of bugs, less to keep in your head, less for someone else (or yourself 6 months later) to read through and learn, less to test, and less to modify when you change the rest of the system.”

Just s/less code/type checked code/g and this sounds exactly like my argumentation for static typing, particularly given the more compact syntax of implied static typing languages like Ocaml [2]. It’s really funny.

[1] http://enfranchisedmind.com/blog/2007/07/10/the-hole-in-the-middle-pattern
[2] http://enfranchisedmind.com/blog/2008/04/14/useful-things-about-static-typing/

Alan Keefer says:

May 1, 2008 at 6:22 am

@Robert:

Perhaps this isn’t true of the development community in general, but my experience with the Java community is that people don’t appreciate that syntax makes a difference: I’ve had numerous debates on various forums about why Java should have closures, for example, and the pushback always takes the form of “you can already do that in Java, so we don’t need closures.” So I was just trying to emphasize that saying “we can already do X” is not in itself a good reason for rejecting better ways to do X. Developers in general are often far too defensive about their tools, languages, and techniques, and there’s a large lost opportunity for improvement as a result.

It is interesting that the arguments start to sound the same; everyone’s trying to get to the same place as far as more expressive, less-verbose, harder to screw up, more maintainable code, people just disagree on how exactly to get there. And inevitably people gravitate towards the tools they’re most familiar with (including me), which makes it difficult to truly be objective about what tradeoffs we’re really making when we choose static or dynamic typing. GScript’s type system pushes it much closer to the Java camp than to the Ocaml camp, which at least is an improvement thanks to type inference and lets us do certain kinds of metaprogramming, but the Ocaml type system really seems like more of the “right thing.” I appreciate the effort to point out to people that static typing doesn’t have to suck, though; far too often people assume “static typing” implies something in the C/Java mold along with all their limitations.

links for 2008-05-01 -- Chip’s Quips says:

May 1, 2008 at 8:39 am

[…] Syntax Matters « Development at Guidewire If it didn’t, we’d still all be using assembly language. Thanks, Reg. (tags: programming syntax complexity coding readability) […]

Brian Hurt says:

May 2, 2008 at 12:02 am

“Perhaps this isn’t true of the development community in general, but my experience with the Java community is that people don’t appreciate that syntax makes a difference: I’ve had numerous debates on various forums about why Java should have closures, for example, and the pushback always takes the form of ‘you can already do that in Java, so we don’t need closures.’”

To what extent do the people offering pushback actually know more than one language? And “I programmed in X for two weeks back in college” doesn’t count. Nor does “I know both Java and C#!” (which reminds me of the joke from the Blues Brothers- “We play both kinds of music here- country and western!”).

Java and C++ are especially bad with this- with the rise of schools that taught Java or C++ as first programming languages, it’s now possible for someone to have gotten a degree in CS, and have 5-10 years of professional development experience, and never learned a second language.

At which point, Robert’s comment really doesn’t hold. If you only have one point of view, you have no depth perception- and if you only know one language, you have no ability to judge the advantage or disadvantage of adding a given feature to the language (or the advantage or disadvantage of other languages).

Why “Less Code” Matters « Invisible Blocks says:

June 23, 2008 at 7:22 pm

[…] – Alan Keefer, Syntax Matters […]

Kamy Lamm says:

August 22, 2008 at 3:50 am

Good discussion on Java, and the comments people have given really sound good. So many good things have been shared by experienced people. Really good work.
