We talk about programming like it is about writing code, but the
code ends up being less important than the architecture, and the
architecture ends up being less important than social issues.
The Why LSP post discusses the “social issues” solved by LSP.
LSP (as a part of overarching Microsoft strategy) is brilliant,
because it moved the world to a new equilibrium where not having basic
IDE support is frowned upon. This post instead discusses architectural
aspects of LSP, which I personally find not as brilliant(especially
given that
Dart Analysis Protocol
predates LSP and is technically superior in some aspects). Perhaps it
could be useful for someone designing other LSP-shaped protocols! Note
that it’s been couple of years since I was actively involved in LSP,
probably the grass is greener these days!
Let’s get to the list of properties, good and bad, in no particular
order.
And let’s start with an aspect of the architecture which is genius,
and which, I think, is responsible for a big share of LSP success on
the technical side. If you build a tool for working with multiple programming languages, one of the biggest questions
is how to find common ground among different, but ultimately
similar, languages. A first attempt is to uncover essential
commonality: after all, all languages have files, variables,
functions, classes, right? This is … maybe not necessary a dead end,
but definitely a thorny and treacherous path — languages are
different, each language is weird in at least some of its aspects,
and common ground risks to level away meaningful distinctions.
So, what does LSP do here? It just doesn’t provide a semantic model
of the code base. Instead, it is focused squarely on the
presentation. No matter how different each programming language is,
they all, in the end, use the same completion widget. So LSP is
formulated in terms of what’s shown in the completion widget, not in
terms of the underlying semantic language entities. That means that
each language has an internal semantic model which is full fidelity
for this particular language, and uses it to provide the
best completion experience which is possible for a given completion
widget. This is how rust-analyzer is structured internally as well:
Compiler layer deals with the messy language analysis tasks, it
derives more structured information (types) from less structured
information (source text), explicitly tracking analysis layers and
phases.
The HIR (high-level intermediate representation) is a façade
around the compiler, which provides a rich graph-based object
model of code which looks as if all derived information, like
types, is pre-computed.
The IDE layer uses HIR to compute things like completions, and
presents them as Rust-specific, but semantics-less POD structures
to be shown to the user in GUI more or less as is.
One consequence of this architecture is that LSP requests map to
editor widgets, and not to the underlying language concepts, even
when several different widgets are powered by the same underlying
data. For example, LSP has separate requests for:
hierarchical outline of a file displayed in the side bar,
“breadcrumbs” shown in the header,
syntax-aware selection ranges,
code folding.
Although all four features are just different views into an AST,
there’s no “get AST” request in the LSP. Different requests allow to
fine-tune presentation for the different use-cases, and the details
do differ! Semantic selection might contain some sub-syntax ranges
inside string literals and comments, breadcrumb need to include
things like conditionals of if expressions, while the
outline might want to get rid of less important nodes. Attentive
reader will notice that breadcrumbs and the outline actually use the
same LSP request. Even LSP doesn’t follow LSP philosophy fully!
After a big thing that LSP did right, let’s look at a small thing
that it got wrong. Let’s look at how information is transmitted over
the wire.
JSON is actually OK! Many people complain that JSON is slow, but
that’s not actually the case generally. There are some edge cases,
where particular client libraries can be slow as was the case at
least at some point with Swift and Emacs, but JSON is definitely
fast enough for Rust, Java and JavaScript. Of course, something
substantially better than JSON is possible in theory.
I think ideally we need “WebAssembly for IPC”, a format that:
has dual text and binary encoding,
is stupidly simple,
is thoroughly, readably, and precisely specified,
and, in general, is principled and a joy to use.
There’s no such format yet, so JSON it is. Good enough.
HTTP framing is not OK. On the wire, the messages framed like this:
Content-Length: 92 \r\n\r\nActual message
That is:
case-insensitive “content-length” header,
followed by length of the following message, formatted as a
decimal number in ASCII,
followed by double \r\n,
followed by the actual message.
This resembles HTTP, but is not actual HTTP, so you need to write a
bit of custom code to deal with the framing. That’s not hard:
Prone to complexity amplification, invites jsonrpc framework with
all the latest patterns.
"jsonrpc": "2.0" is meaningless noise which you have
to look at during debugging.
Error codes like -32601 (ah, that comes from
xml-rpc!).
Includes notifications. Notification are a big anti-pattern in
RPC, for a somewhat subtle reason. More on this later.
What to do instead? Do what Dart does, some excerpts from the specification:
Messages are delineated by newlines. This means, in particular,
that the JSON encoding process must not introduce newlines
within a message. Note however that newlines are used in this
document for readability.
To ease interoperability with Lisp-based clients (which may not
be able to easily distinguish between empty lists, empty maps,
and null), client-to-server communication is allowed to replace
any instance of “{}” or “[]” with
null. The server will always properly represent empty lists as
“[]” and empty maps as “{}”.
Clients can make a request of the server and the server will
provide a response for each request that it receives. While many of the requests that can be made by a client are
informational in nature, we have chosen to always return a
response so that clients can know whether the request was
received and was correct.
LSP uses (line, column) pairs for coordinates. The neat
thing here is that this solves significant chunk of \n
vs \r\n problems — client and server may represent line
endings differently, but this doesn’t matter, because coordinates
are the same.
Focus on the presentation provides another motivation, because
location information received by the client can be directly
presented to the user, without the need to parse the underlying
file. I have mixed feelings about this.
The problem, column is counted using UTF-16 code units.
This is, like, “no”. For many reasons, but in particular, UTF-16 is
definitely the wrong number to show to the user as a “column”.
There’s no entirely obvious answer what should be used instead. My
personal favorite would be counting utf-8 code units (so, just
bytes). You need some coordinate space. Any reasonable
coordinate space won’t be useful for presentation, so you might as
well use the space that matches the underlying utf-8 encoding, so
that accessing substrings is O(1).
Using unicode codepoints would perhaps be the most agreeable
solution. Codepoints are useless — you’ll need to convert to
grapheme clusters for presentation, and to utf-8 code units to do
anything with the string. Still, codepoints are a common
denominator, they are more often correct if incorrectly used for
presentation, and they have a nice property that any index less than
length is valid irrespective of the actual string.
As mentioned above, one drawback of one-way notifications from
jsonrpc is that they don’t allow signaling errors. But there’s a
more subtle problem here: because you don’t receive response to a
notification, it might be hard to order it relative to other events.
The Dart protocol is pretty strict about the ordering of events:
There is no guarantee concerning the order in which responses
will be returned, but there is a guarantee that the server will
process requests in the order in which they are sent as long as
the transport mechanism also makes this guarantee.
This guarantee ensures that the client and the server mutually
understand each other’s state. For every request the client knows
which file modifications happened before it, and which came
afterwards.
In LSP, when the client wants to modify the state of a file on the
server, it sends a notification. LSP also supports server-initiated
edits. Now, if the client sends a didChangeTextDocument
notification, and then receives a workspace/applyEdit
request from the server, there’s no way for the client to know
whether the edit takes the latest change into the account or not.
Were
didChangeTextDocument a request instead, the client
could have looked at the relative order of the corresponding
response and workspace/applyEdit.
LSP papers over this fundamental loss of causality by including
numeric versions of the documents with every edit, but this is a
best effort solution. Edits might be invalidated by changes to
unrelated documents. For example, for a rename refactor, if a new
usage was introduced in a new file after the refactor was computed,
version numbers of the changed files would wrongly tell you that the
edit is still correct, while it will miss this new usage.
Practically, this is a small problem — it works most of the time (I
think I have seen zero actual bugs caused by causality
loss), and even the proper solution can’t order events originating
from the client relative to the events originating from the file
system. But the fix is also very simple — just don’t voluntarily
lose causality links!
And this touches what I think is the biggest architectural issue
with LSP. LSP is an RPC protocol — it is formed by “edge triggered”
requests that make something happen on the other side. But this is
not how most of IDE features work. What actually is needed is “level
triggered” state synchronization. The client and
the server need to agree what something is, deciding the
course of action is secondary. It is “to be or not to be” rather
than “what is to be done”.
At the bottom is synchronization of text documents — the server and
the client need to agree which files there are, and what is their
content.
Above is synchronization of derived data. For example, there’s a set
of errors in the project. This set changes when the underlying text
files change. Errors change with some lag, as it takes time to
compute them (and sometimes files changes faster than the errors
could be re-computed).
Things like file outline, syntax highlighting, cross-reference
information, e.t.c, all follow the same pattern.
Crucially, predicting which changes to the source invalidate which
derived data requires language specific knowledge. Changing the text
of foo.rs might affect syntax highlighting in bar.rs (as syntax highlighting is affected by types).
In LSP, highlighting and such are requests. This means that either
the client is incorrect and shows stale highlighting results, or it
conservatively re-queries all highlighting results after every
change, wasting the CPU, and still showing stale results
sometimes, when an update happens outside of the client (eg, when
cargo finished downloading external crates).
The Dart model is more flexible, performant and elegant. Instead of
highlighting being a request, it is a subscription. The
client subscribes to syntax highlighting of particular files, the
server notifies the client whenever highlights for the selected
files change. That is, two pieces of state are synchronized between
the client and the server:
The set of files the client is subscribed to
The actual state of syntax highlighting for these files.
The former is synchronized by sending the whole “current set” of
files in a request, whenever the set changes. The latter is
synchronized by sending incremental updates.
Subscriptions are granular both in terms of the file set, as well as
in terms of features. The client might subscribe for errors in the
whole project, and for highlights in the currently opened documents
only.
Subscriptions are implemented in terms of RPC, but they are an
overarching organizational pattern followed by the majority of the
requests. LSP doesn’t have an equivalent, and has real bugs with
outdated information shown to the user.
I don’t think Dart goes as far as possible here. JetBrains Rider, if
I understand correctly, does something smarter:
I think the idea behind the rider protocol is that you directly
define the state you want to synchronize between the client and the
server as state. The protocol then manages “magic” synchronization
of the state by sending minimal diffs.
In this refactoring, the user selects a function declaration, then
rearranges parameters in some way (reorders, removes, adds, renames,
changes types, whatever), and then the IDE fixes all call-sites.
The thing that makes this refactor complex is that it is interactive
— it’s not an atomic request “rename foo to bar”, it’s a dialog between the IDE and the user. There are
many parameters that the user tweaks based on the analysis of the
original code and the already specified aspects of the refactoring.
LSP doesn’t support this workflows. Dart somewhat supports them,
though each refactoring gets to use custom messages (that is,
there’s quite good overall protocol for multistep refactorings, but
each refactoring essentially sends any over the wire,
and the IDE on the other side hard-codes specific GUIs for specific
refactorings). This per-refactoring work is not nice, but it is much
better than not having these complex refactorings at all.
A small one to conclude. Significant chunk of conceptual LSP
complexity comes from support for dynamic registration of
capabilities. I don’t understand why that features is there,
rust-analyzer uses dynamic registration only for specifying which
files should be watched. And that would be much simpler if it used a
plain request (or a subscription mechanism).