My bad opinions

2013/02/08

REPL? A bit more (and less) than that

The Erlang shell is a funny thing. I think a lot of people who used the language for a short while quickly got annoyed by the lack of support for features that are often considered very basic, such as history or history search (supported since R16A), the lack of full support for Emacs shortcuts, or the fact that it doesn't use readline but only emulates it (wrapping Erlang's shell in rlwrap is often recommended). Users of more advanced REPLs, such as the one provided with Factor or DrRacket, are likely disappointed with the visual support that's available in Erlang. Not being able to declare modules inline is also a bit annoying: modules are only accepted in files, not in the shell.

In this post, I want to explain how the Erlang shell works, why such features can be somewhat difficult or easy to add in, and also showcase some of the really neat features it has that few other shells provide.

It isn't a REPL

The first thing to know is that Erlang's shell is not a Read-Eval-Print Loop. At least, not in the original Lisp definition of (loop (print (eval (read)))).

For starters, Erlang has no such thing as a main loop. It has processes, each of which can have a main loop, but there is no easily traceable central process that leads the march. Because of this, the language was designed with support for a feature called 'groups' and 'group leaders'. We'll get into more details later, but they allow multiple shells and bits of output to be redirected to all kinds of places, such as other nodes, the current standard input/output, and so on. They're inherited from parent process to child process (unless you change them).
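Here is a small sketch of that inheritance. The module name gl_demo is made up for illustration; erlang:group_leader/0 is the standard BIF that returns the calling process' group leader.

```erlang
-module(gl_demo).
-export([inherited/0]).

%% Spawn a child, ask it for its group leader, and compare it
%% with the group leader of the calling (parent) process.
inherited() ->
    Parent = self(),
    spawn(fun() -> Parent ! {child_gl, erlang:group_leader()} end),
    receive
        {child_gl, ChildGL} ->
            ChildGL =:= erlang:group_leader()
    after 1000 ->
        timeout
    end.
```

Calling gl_demo:inherited() from any process should return true, since nothing changed the child's group leader after it was spawned.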

When the Erlang VM starts, it starts with the kernel application and the stdlib application. There's some magic down in the innards of the VM that allows the two applications to depend on each other to exist and boot. The kernel app will start a bunch of supervisor processes that control every 'service' the VM runs -- distribution, for example.

One of these supervisors is named user_sup. This one process is in charge of detecting arguments passed to the VM that are related to the shell. It will figure out whether the VM is a master or a slave, whether there is a shell at all, whether input and output are disabled or not, and what kind of shell you want to load (the default one, or the old one that few people use).

This kind of switch is useful because the shell is just a regular Erlang process, and this means that the node might be a distributed one that should redirect all its output to a master node, or that you can have many shells for one virtual machine. If the Erlang instance that just started does not need to forward its output to a master node, it will either redirect all its output to the standard input/output (under the -oldshell mode or if input is disabled), or it will redirect it to a port program through a middle-man process named user.

As I said a few paragraphs earlier, groups and group leaders allow redirecting output. This is where the clever things happen. The group leader can be a process with a given name. By setting the default group leader to user, you can get the entire VM to dynamically take its input from and send its output to wherever it needs to, without any other process ever needing to care about it. If you want to change where the IO takes place, change the user process, and everything gets redirected. The user_sup thus follows this decision tree:

decision tree explaining type of 'user' to register

The first option is trivial to do. The second one is somewhat simple, and is implemented in the infamous user module (infamous because every newcomer tries to have a module named that and ends up killing a lot of stuff). The current, default Erlang shell is the third solution here, and it is by far the most complex one.
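To make the redirection idea concrete, here is a rough sketch of a process that stands in as a group leader and captures whatever gets printed. It is not how user or user_sup are implemented. The module name capture_io and its functions are made up for the example, and it only answers the put_chars requests of the standard io protocol (the {io_request, From, ReplyAs, Request} and {io_reply, ReplyAs, Reply} messages), refusing everything else.

```erlang
-module(capture_io).
-export([capture/1]).

%% Run Fun with a temporary group leader and return what it printed.
capture(Fun) ->
    OldGL = erlang:group_leader(),
    Collector = spawn_link(fun() -> loop([]) end),
    true = erlang:group_leader(Collector, self()),
    try Fun() after erlang:group_leader(OldGL, self()) end,
    Collector ! {get, self()},
    receive {captured, Out} -> Out end.

%% Minimal io server: accumulate output requests, refuse the rest.
loop(Acc) ->
    receive
        {io_request, From, ReplyAs, {put_chars, _Enc, Chars}} ->
            From ! {io_reply, ReplyAs, ok},
            loop([Acc, Chars]);
        {io_request, From, ReplyAs, {put_chars, _Enc, M, F, A}} ->
            From ! {io_reply, ReplyAs, ok},
            loop([Acc, apply(M, F, A)]);
        {io_request, From, ReplyAs, _Other} ->
            From ! {io_reply, ReplyAs, {error, request}},
            loop(Acc);
        {get, Pid} ->
            Pid ! {captured, unicode:characters_to_list(Acc)}
    end.
```

The code calling io:format/2 inside Fun has no idea its output went anywhere unusual; only the group leader changed. Something like capture_io:capture(fun() -> io:format("hello ~p~n", [42]) end) gives back the printed text as a string instead of showing it on the terminal.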

So now we've picked how to deal with where to send the IO stuff, but we still haven't seen how to actually get a shell running. The closest thing Erlang would have to a REPL would be the 'old shell', accessible by using erl -oldshell. It's the one that goes straight to the user module. Given it's the simplest one, it will also be a decent starting point to understand how things work.

The user module basically handles both the 'read' and 'print' parts of a REPL. It accumulates characters, displays prompts, collects lines, and shoves them to the 'eval' part, then fetches the result back and displays it:

dependency diagram between user.erl and shell.erl

As the drawing shows, shell.erl is in charge of the evaluation. It takes a string, tokenizes it, gets it ready for evaluation, and then spawns a temporary process just for evaluation purposes. The reason for this is that the process running the shell evaluator itself is also in charge of some tricky state you may want to stick around across calls that fail. Some examples are record definitions loaded from files (records are a compiler trick, so they're held in memory and inject themselves while evaluating terms), evaluation history (the shell holds a list of previous abstract calls and their abstract result so they can be re-called later), and variable bindings.
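The evaluation steps can be sketched with the standard erl_scan, erl_parse, and erl_eval modules. This is only an illustration and not the actual code of shell.erl; the module name mini_eval and both function names are made up. The second function shows why a throwaway process is useful: if the expression crashes, the caller survives and keeps its old bindings.

```erlang
-module(mini_eval).
-export([eval/2, safe_eval/2]).

%% Tokenize, parse, and evaluate a string such as "X = 1 + 2.".
eval(String, Bindings) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    {value, Value, NewBindings} = erl_eval:exprs(Exprs, Bindings),
    {Value, NewBindings}.

%% Evaluate in a temporary process so that a crash in the expression
%% does not take down the caller or lose its bindings.
safe_eval(String, Bindings) ->
    {Pid, Ref} = spawn_monitor(
        fun() -> exit({done, eval(String, Bindings)}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, {Value, NewBindings}}} ->
            {ok, Value, NewBindings};
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason, Bindings}
    end.
```

Starting from erl_eval:new_bindings(), evaluating "X = 1 + 2." binds X, and a later "1/0." through safe_eval/2 returns an error tuple while X stays bound for the next call.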

When the expression has been evaluated, the result is forwarded to user.erl, which checks the terminal geometry and whatever to make sure it's fine, and shoves the results there through standard output.

This can work somewhat alright, and you could possibly push it far enough to work like a pretty good shell. The problem is that a process based on a module like user.erl leaves you little to do but to reimplement large parts of it if you ever want to re-use the same kind of code for other things than straight up standard io. What if you want to allow connecting the shell directly over SSH (which is totally possible), over a given C program, a GUI, or wherever else? Then you'd need to deal with things at a rather low level and move streams of text around and hope for the best.

The Erlang guys decided to go a different way and use a few more levels of abstraction. They decided to write a new shell. Instead of starting user.erl, the supervisor now starts a process named user_drv, based on the user_drv.erl module. The new structure now looks like this:

dependency diagram between tty, user_drv, group.erl (many), edlin.erl, shell.erl (many)

Under this structure, user.erl has its functionality split across four components. The tty plays the role of the former stdio stuff, and can be adapted to whatever OS it's running on at compile-time. That is where the text you type in the shell enters the VM. It is forwarded to user_drv, which then decides what to do with it. If the text happens to be ^C or ^G, user_drv drops into shell management mode.

To understand what this mode does, we need to know that there can now be many concurrent shells running, and they're grouped under a list of group.erl processes managed by user_drv. This list determines which shell is currently the active one. Shell management will allow you to do various operations on the shells themselves. Here's what you see after pressing ^G:

    User switch command
     --> h
      c [nn]            - connect to job
      i [nn]            - interrupt job
      k [nn]            - kill job
      j                 - list all jobs
      s [shell]         - start local shell
      r [node [shell]]  - start remote shell
      q                 - quit erlang
      ? | h             - this message
     --> s
     --> j
       1  {shell,start,[init]}
       2* {shell,start,[]}
     --> c 1

Shells can be individually killed, interrupted (good for infinite loops, waits, deadlocks), and so on. By connecting to a given shell instance, you get back into regular editing mode.

Now, if the text you type is regular text, user_drv looks for the currently active group.erl process, the current shell. That's the one it can send data to, with message passing.

So what does a group.erl process do, exactly? Its prime task is to buffer up the data into lines ready to be evaluated, and handle a line stack so that you can move through them with the up/down arrows and search the line history. Until the lines are completed, it's going to do a back-and-forth with edlin.erl, which will handle cursor movement, character deletion, escaping, and do some heavy tag-team to deal with history. They're doing high-level line editing and management. Its second task is to act as a group leader to the shell.erl it started.

When a line is seen as valid and the shell.erl instance attached to the current group.erl process is ready, the line is sent there to be evaluated. The result is then passed back to the group.erl process, which moves it to user_drv.

From the user driver's perspective, there can be many shells' data being forwarded at once. When receiving output, user_drv checks if it's from the current group.erl instance. If it is, the data is formatted and forwarded to tty. If it isn't, it's muted. However, there's a special case for data that is sent directly to user: that one has a free pass and is never filtered, allowing regular IO to reach the user even when switching active shell instances.

This generally gives sane filtering. Because the group process sets itself as the shell process' group leader, and because group leaders are inherited, all processes started directly within an active shell instance will have their output limited to that one, while you can still see the output of applications started outside of a terminal, which still send data to user. It seems to be good enough for most people to never know this filtering takes place, or to need to know how it works.

This is where it gets better

A particularly interesting feature here is the ability to start remote shells. This is made possible, in part, due to how the group feature works. You can, if you want, tell the group to start a shell over the RPC mechanisms of the language, giving you something a bit like this:

visual description of shell.erl being on one node while group.erl is on another one

Under this scheme, regular message passing is enough to get things working; the evaluation takes place in a remote context, with editing being done locally. This is different from a slave node: for a slave node, all output is redirected to the master. For a remote shell of this kind, only the output that comes from that particular RPC shell (and its children) is redirected.

Another interesting thing the Erlang guys did was implement a message-passing protocol between group.erl and user_drv.erl, which the latter then translates for the tty part of the stack. The protocol looks kind of similar to the regular io protocol, but with added support for line-editing messages such as moving a cursor, blinking (visual bell), erasing lines, and so on.

It sounds a bit overkill for the shell, but there's a very nice aspect to it; the code can be reused by any Erlang application. For example, the ssh daemon I linked above uses groups with a custom evaluator for the terminal. Then, the setup looks like this:

visual description of a standalone group being connected to by an ssh client

By substituting itself for user_drv, the SSH daemon can take the place of tty for the input and output part, while group.erl and edlin.erl keep line editing working.

Anybody could then implement their own shell, even through a web browser if they felt like it. The only thing they'd need to do is translate the line editing protocol and implement it on the client side. They'd also need to think of security, because trying to sandbox an Erlang node is non-trivial. Attempts have been made (using something called a 'safe shell') by replacing the shell.erl processes with another implementation, something doable given there is no direct dependency between group.erl and whatever evaluator it talks to, outside of the message-passing protocol they speak.
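To give an idea of what a restricted evaluator could look like, here is a rough sketch built on the non-local function handler accepted by erl_eval:exprs/4. It is not the 'safe shell' mentioned above. The module name restricted_eval and the whitelist are made up for illustration, and it is nowhere near a real sandbox: anything reachable through the allowed functions (for example, an external fun handed to a lists function) slips right past it.

```erlang
-module(restricted_eval).
-export([eval/1]).

%% Evaluate a string, routing every remote call through allowed_call/2.
eval(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    NonLocal = {value, fun allowed_call/2},
    try erl_eval:exprs(Exprs, erl_eval:new_bindings(), none, NonLocal) of
        {value, Value, _Bindings} -> {ok, Value}
    catch
        Class:Reason -> {error, {Class, Reason}}
    end.

%% Hypothetical whitelist: the lists module and a bit of arithmetic.
allowed_call({lists, F}, Args) ->
    apply(lists, F, Args);
allowed_call({erlang, F}, Args)
  when F =:= '+'; F =:= '-'; F =:= '*'; F =:= length ->
    apply(erlang, F, Args);
allowed_call(FuncSpec, _Args) ->
    erlang:error({restricted, FuncSpec}).
```

With this, "lists:reverse([1,2,3])." evaluates normally, while "os:cmd(\"ls\")." comes back as an error tuple instead of running the command.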

Why isn't it made better?

That's a good question. The easy hypothesis I have is that very few people actually know how the shell works. Developers likely expect a regular REPL and end up finding a distributed task management system that can be called in a dozen ways. This is unexpected and very confusing when you first get in.

Features like better Emacs shortcut support are likely easy to add in edlin.erl. Adding vi mode support would probably need a new line editor entirely. Support for some shortcuts may need to be implemented in 'drivers' (tty, for example). Saving shell history needs to take into account where groups will actually run, and how many of them there can be on one node trying to access the same files.

I hope this post allowed interested readers to get a general idea of how things work in the Erlang shell. It's an interesting piece of code, powerful in what it can (and could) do, even though it is underwhelming on other levels. I've been having fun trying to add features here and there in the Erlang shell, and having more people to discuss it with would be nice!