Magnus Skjegstad

Just-in-Time Summoning of Unikernels (v0.2)

Jitsu - or Just-in-Time Summoning of Unikernels - is a prototype DNS server that can boot virtual machines on demand. When Jitsu receives a DNS query, a virtual machine is booted automatically before the query response is sent back to the client. If the virtual machine is a unikernel, it can boot in milliseconds and be available as soon as the client receives the response. To the client it will look like it was on the whole time.

Jitsu can be used to run microservices that only exist after they have been resolved in DNS - and perhaps in the future can facilitate demand-driven clouds or extreme scaling with a unikernel per URL. Jitsu has also been used to boot unikernels in milliseconds on ARM devices.

A new version of Jitsu was just released and I'll summarize some of the old and new features here. This is the first version that supports both MirageOS and Rumprun unikernels and uses the distributed Irmin database to store state. A full list of changes is available here.

A Jitsu demo hosting a MirageOS unikernel runs here and nginx on Rumprun here. These demos should be up most of the time, but may occasionally be unstable or unavailable as they are also used to test new features.

For more technical details and up to date example configurations with MirageOS and Rumprun, see the Jitsu README.

Overview

How it works

The following figure shows what happens when Jitsu receives a DNS query for a unikernel that is not currently running.

Jitsu

First, the client sends a DNS request to Jitsu. The request is received and checked against a list of domains mapped to unikernels. If there is a match, Jitsu boots the corresponding unikernel VM. When the unikernel has started booting, Jitsu sends a DNS reply to the client containing the future IP address of the unikernel. When the DNS response is received by the client it initiates a TCP connection to the IP address in the DNS response.
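The lookup-and-boot flow above can be sketched in a few lines of Python. This is only an illustration, not Jitsu's actual code: the `Resolver` and `Unikernel` classes, the `boot` stub and the example domain and address are all hypothetical, and a real implementation would hand the boot request to a hypervisor backend.

```python
from dataclasses import dataclass, field


@dataclass
class Unikernel:
    name: str
    ip: str
    running: bool = False


@dataclass
class Resolver:
    # Maps a domain name to the unikernel that serves it (hypothetical structure).
    domains: dict = field(default_factory=dict)
    ttl: int = 60

    def boot(self, vm):
        # Stand-in for asking the hypervisor to start the VM.
        vm.running = True

    def handle_query(self, qname):
        """Return (ip, ttl) for a known domain, booting its VM first if needed."""
        vm = self.domains.get(qname.lower().rstrip("."))
        if vm is None:
            return None  # Not one of our domains.
        if not vm.running:
            self.boot(vm)
        # The reply carries the IP the unikernel will use once it is up.
        return (vm.ip, self.ttl)


resolver = Resolver({"www.example.org": Unikernel("www", "192.0.2.10")})
answer = resolver.handle_query("www.example.org.")
```

Note that the reply is returned right after the boot has been requested, so the client may learn the address before the unikernel is able to answer on it.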

Each DNS reply from Jitsu contains a time-to-live (TTL) value that tells the client how long the reply is valid. This is a feature built into DNS that allows query results to be cached by the client or by other name servers. When the TTL expires, the client has to send a new DNS query to Jitsu to verify that the cached response is still correct. By using low TTL values (typically less than an hour), Jitsu can keep track of how often a unikernel is used and may automatically stop unikernels that have not been requested within a certain time period. By default, a unikernel is stopped when it has not been requested for 2 x TTL.
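A minimal sketch of the idle tracking described above, assuming Jitsu only needs to remember the time of the last DNS query per unikernel. The `IdleTracker` class and its method names are made up for this example, and timestamps are passed in explicitly to keep it deterministic.

```python
class IdleTracker:
    def __init__(self, ttl, multiplier=2):
        # A unikernel is considered idle after multiplier x TTL without queries.
        self.timeout = ttl * multiplier
        self.last_request = {}

    def record_query(self, name, now):
        # Called whenever a DNS query for this unikernel is answered.
        self.last_request[name] = now

    def expired(self, now):
        """Names of unikernels that have been idle for longer than the timeout."""
        return sorted(
            name
            for name, seen in self.last_request.items()
            if now - seen > self.timeout
        )


tracker = IdleTracker(ttl=60)
tracker.record_query("www", now=0)
tracker.record_query("api", now=100)
idle = tracker.expired(now=121)  # only "www" has been idle for more than 120 s
```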

Masking boot delays

When Jitsu boots a unikernel and returns the DNS response there is a race between the client and the unikernel. The unikernel has to be able to respond to a TCP request within the time it takes to send the DNS response back to the client and for the client to attempt to connect. What happens if the unikernel is unable to complete its boot process in time?

The problem is illustrated in this figure:

Jitsu without Synjitsu

After being booted by Jitsu, the unikernel has about one round-trip time (RTT) to finish booting. This is the time it takes for the DNS response to travel back to the client and for the TCP handshake to be initiated with a SYN packet. For example, the typical boot time of a MirageOS unikernel on ARM is about 300-400 ms. It is then likely that the unikernel will not be ready in time and that the first SYN packet is lost. TCP is able to recover by retransmitting the SYN packet, but it requires a timeout and a retransmit. With recommended timeout values it will take a second or longer for the client to finally connect.
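To put rough numbers on this, here is a small back-of-the-envelope model. It assumes an initial retransmission timeout of 1 second that doubles on each retry, and that the first SYN arrives one RTT after the boot starts. The function name and the example RTT of 20 ms are chosen only for illustration.

```python
def time_to_connect(boot_ms, rtt_ms, initial_rto_ms=1000.0):
    """Milliseconds from boot start until a SYN reaches a unikernel that is ready."""
    # The first SYN arrives about one RTT after the DNS reply was sent.
    arrival = rtt_ms
    rto = initial_rto_ms
    # Every SYN that arrives before the boot is finished is lost and has to
    # be retransmitted after the current timeout, which then doubles.
    while arrival < boot_ms:
        arrival += rto
        rto *= 2
    return arrival


slow_boot = time_to_connect(boot_ms=350, rtt_ms=20)  # 1020.0: one SYN lost
fast_boot = time_to_connect(boot_ms=10, rtt_ms=20)   # 20: ready before the SYN
```

Under these assumptions a 350 ms boot costs the client more than a second, even though the unikernel was ready after a third of that time.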

To mask this delay and avoid the SYN retransmission, Jitsu now supports three alternative mechanisms:

  • Delay the DNS response by a fixed timeout (e.g. 150-200 ms) to let the unikernel complete the boot process
  • Wait for the unikernel to signal Jitsu before sending the DNS response
  • Cache the incoming SYN packet on behalf of the unikernel with Synjitsu

The simplest solution to set up is the fixed delay. Depending on the application, a delay of a few hundred milliseconds may be acceptable to the client. For a web application for example, this may not be noticeable. This mechanism does not require modifications to the unikernel itself. The downside is that the delay is fixed, so even if the unikernel starts faster than expected there will still be a delay for the client.

A more dynamic approach is to let the unikernel notify Jitsu when it is ready. This is currently done by waiting for a key to appear in Xenstore, Xen's shared information store. To write the key the unikernel only needs a working Xenstore client implementation. Jitsu will watch Xenstore and immediately send the DNS response when the key appears. While more dynamic, this mechanism doesn't allow Jitsu to send the DNS reply while the unikernel is booting - making the delay longer than necessary.
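The idea of blocking the DNS response until a key appears can be sketched as follows. `FakeStore` is an in-memory stand-in and not the real Xenstore API, and the upper bound on the wait (`max_delay`) is an assumption added for the example. The key name `data/status` is borrowed from the demo configuration.

```python
import threading


class FakeStore:
    """In-memory stand-in for a key-value store with watches (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def write(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()  # Wake up anyone watching the store.

    def wait_for_key(self, key, timeout):
        # Blocks until the key exists or the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: key in self._data, timeout=timeout)


def respond_when_ready(store, key, max_delay):
    """Hold the DNS response until the unikernel has written its ready key."""
    ready = store.wait_for_key(key, max_delay)
    return "ready" if ready else "timeout"


store = FakeStore()
# Simulate a unikernel that signals readiness 50 ms after being booted.
threading.Timer(0.05, store.write, args=("data/status", "up")).start()
result = respond_when_ready(store, "data/status", max_delay=2.0)
```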

To get a dynamic delay and still send the response back while the unikernel is booting, Jitsu has support for running a separate unikernel service that caches incoming SYNs until the unikernel is ready. We call this Synjitsu.
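The trade-offs between the three mechanisms can be compared with a simple timing model. This is a rough sketch under stated assumptions: time is measured from the moment the boot starts, the client needs one RTT from the DNS reply being sent until its SYN arrives, and a lost SYN is retransmitted after 1 second with doubling timeouts. The function and mechanism names are made up for the example.

```python
def syn_answered_at(mechanism, boot_ms, rtt_ms, fixed_delay_ms=200.0, rto_ms=1000.0):
    """Milliseconds after boot start at which the client's SYN can be answered."""
    if mechanism == "fixed_delay":
        # The reply is held back for a fixed time, whether or not it is needed.
        arrival = fixed_delay_ms + rtt_ms
        while arrival < boot_ms:  # Still booting: the SYN is lost and resent.
            arrival += rto_ms
            rto_ms *= 2
        return arrival
    if mechanism == "wait_for_key":
        # The reply is only sent once the unikernel reports that it is ready.
        return boot_ms + rtt_ms
    if mechanism == "synjitsu":
        # The reply is sent at once; an early SYN is cached until boot is done.
        return max(boot_ms, rtt_ms)
    raise ValueError(f"unknown mechanism: {mechanism}")


for mechanism in ("fixed_delay", "wait_for_key", "synjitsu"):
    print(mechanism, syn_answered_at(mechanism, boot_ms=150, rtt_ms=20))
```

With a 150 ms boot and a 20 ms RTT the model gives 220 ms for the fixed delay, 170 ms for the ready signal and 150 ms with SYN caching. If the boot takes longer than the fixed delay allows, the fixed-delay variant falls back to a retransmission.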

Synjitsu

Synjitsu is a unikernel service that handles TCP connections on behalf of unikernels that are completing their boot process. Synjitsu is always running and captures TCP SYN packets on the network bridge that are addressed to unikernels that are not ready yet. The SYNs are then stored in the Xenstore database. When a new unikernel has booted it will check Xenstore for cached SYNs that match its MAC and IP address. Every SYN it finds will then be processed as if it was received over the network and trigger a SYN/ACK to complete the TCP three-way handshake. When the unikernel has finished booting, all incoming SYNs are ignored by Synjitsu and go directly to the unikernel as regular network traffic.

The process is shown above. A DNS query has already been sent to Jitsu DNS, a unikernel has been booted and a reply sent back to the client. The client now attempts to send a TCP SYN packet to initiate the TCP connection. Unfortunately, the unikernel is not ready yet and would be unable to reply. Synjitsu then silently stores the SYN for the remaining milliseconds while the unikernel completes its boot process. When the unikernel is ready it will retrieve the SYN and send a SYN/ACK back to the client.
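The caching and hand-over logic can be sketched like this. It is a simplified model, not the Synjitsu implementation: SYNs are kept in a Python dictionary instead of Xenstore, packets are opaque byte strings, and the `SynCache` class with its method names is hypothetical.

```python
from collections import defaultdict


class SynCache:
    def __init__(self):
        self._booting = set()              # (mac, ip) pairs that are still booting
        self._cached = defaultdict(list)   # (mac, ip) -> list of cached SYNs

    def register_booting(self, mac, ip):
        # Called when a unikernel with this MAC and IP has started booting.
        self._booting.add((mac, ip))

    def on_syn(self, mac, ip, syn):
        """Cache the SYN if its destination is still booting. Returns True if cached."""
        if (mac, ip) not in self._booting:
            return False  # Ignored here; goes to the unikernel as normal traffic.
        self._cached[(mac, ip)].append(syn)
        return True

    def take_over(self, mac, ip):
        """Called by the booted unikernel: stop caching and hand back the SYNs."""
        self._booting.discard((mac, ip))
        return self._cached.pop((mac, ip), [])


cache = SynCache()
cache.register_booting("00:16:3e:00:00:01", "192.0.2.10")
cache.on_syn("00:16:3e:00:00:01", "192.0.2.10", b"syn-1")
pending = cache.take_over("00:16:3e:00:00:01", "192.0.2.10")
```

After `take_over`, the unikernel would process each returned SYN as if it had just arrived on the wire and answer with a SYN/ACK.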

For Synjitsu to work properly we also have to handle ARP traffic. ARP is used to find the MAC address that matches an IP address on the local network. Before the incoming TCP SYN can reach its destination, the router (usually the local gateway) has to know the MAC address it should be sent to. As the unikernel is still booting it is unable to announce its MAC and IP address - in fact the IP is not really in use yet. To compensate for this, Jitsu will tell Synjitsu the MAC and IP address of every unikernel that is currently booting. Synjitsu then sends gratuitous ARP packets to announce to the network that it is handling the specified MAC and IP for now. As soon as the real unikernel finishes booting it sends its own gratuitous ARP packet to notify the network that it is ready.
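For readers unfamiliar with gratuitous ARP, the following sketch builds such a frame by hand. It uses the common form where the packet is a broadcast ARP request with the sender and target IP set to the announced address. Whether Synjitsu uses exactly this variant is an assumption, and the MAC and IP below are example values.

```python
import socket
import struct


def gratuitous_arp(mac, ip):
    """Build an Ethernet frame announcing that `ip` is reachable at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4, request (1).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr            # Sender MAC and IP: the address being announced.
    arp += b"\x00" * 6 + addr   # Target MAC is unused; target IP equals sender IP.
    return eth + arp


frame = gratuitous_arp("00:16:3e:00:00:01", "192.0.2.10")
```

Switches and routers that see this frame update their tables, so later packets for the IP are delivered to the announced MAC.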

Synjitsu is currently a highly experimental feature and requires a modified MirageOS TCP/IP stack. For more information about running Synjitsu, see our paper or ask on the mailing list. The source code is available here.

Irmin and Jitsu

Irmin is a distributed database with git-like features, such as a full history of changes and support for branching and merging. Jitsu's internal state is now stored in an Irmin database which can be inspected using the Irmin tool. The database used to store the state of the demonstration is shown below.

$ irmin tree
/jitsu/vm/1ca...b60/config/disk/0........................."/dev/loop5@xvda"
/jitsu/vm/1ca...b60/config/disk/1........................."/dev/loop6@xvdb"
/jitsu/vm/1ca...b60/config/dns/0....................."www.rump.jitsu.v0.no"
/jitsu/vm/1ca...b60/config/ip/0............................."89.16.190.215"
/jitsu/vm/1ca...b60/config/kernel/0............................."nginx.bin"
/jitsu/vm/1ca...b60/config/memory/0................................."64000"
/jitsu/vm/1ca...b60/config/name/0................................."rump-xl"
/jitsu/vm/1ca...b60/config/nic/0......................................"br0"
/jitsu/vm/1ca...b60/config/response_delay/0..........................."0.9"
/jitsu/vm/1ca...b60/config/rumprun_config/0......................"json.cfg"
/jitsu/vm/1ca...b60/dns/www.rump.jitsu.v0.no/ttl......................."60"
/jitsu/vm/1ca...b60/ip......................................"89.16.190.215"
/jitsu/vm/1ca...b60/response_delay...................................."0.1"
/jitsu/vm/1ca...b60/stop_mode....................................."destroy"
/jitsu/vm/1ca...b60/use_synjitsu...................................."false"
/jitsu/vm/de4...ad3/config/dns/0.........................."www.jitsu.v0.no"
/jitsu/vm/de4...ad3/config/ip/0............................."89.16.190.214"
/jitsu/vm/de4...ad3/config/kernel/0..........................."mir-www.xen"
/jitsu/vm/de4...ad3/config/memory/0................................."64000"
/jitsu/vm/de4...ad3/config/name/0.................................."www-xl"
/jitsu/vm/de4...ad3/config/nic/0......................................"br0"
/jitsu/vm/de4...ad3/config/response_delay/0..........................."0.1"
/jitsu/vm/de4...ad3/config/wait_for_key/0....................."data/status"
/jitsu/vm/de4...ad3/dns/www.jitsu.v0.no/ttl............................"60"
/jitsu/vm/de4...ad3/ip......................................"89.16.190.214"
/jitsu/vm/de4...ad3/response_delay...................................."0.1"
/jitsu/vm/de4...ad3/stop_mode....................................."destroy"
/jitsu/vm/de4...ad3/use_synjitsu...................................."false"
/jitsu/vm/de4...ad3/wait_for_key.............................."data/status"
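If you want to process a listing like this in a script, the lines can be split into paths and values with a small parser. This sketch assumes the layout is exactly as printed above (a path, a run of dots as padding, then a quoted value). That format is inferred from the listing and not from the Irmin documentation, and `parse_tree` is a hypothetical helper.

```python
import re

# Path (shortest match), padding dots, then the quoted value.
LINE = re.compile(r'^(?P<key>/\S+?)\.*"(?P<value>[^"]*)"$')


def parse_tree(text):
    """Turn the printed tree into a {path: value} dictionary."""
    entries = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # Skip blank lines and anything that is not an entry.
            entries[match.group("key")] = match.group("value")
    return entries


sample = (
    '/jitsu/vm/1ca...b60/config/ip/0............................."89.16.190.215"\n'
    '/jitsu/vm/1ca...b60/dns/www.rump.jitsu.v0.no/ttl......................."60"\n'
)
state = parse_tree(sample)
```

The non-greedy path match matters here, since both the shortened VM identifiers and the domain names contain dots of their own.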

The Jitsu database is currently read-only, but in the future the plan is to allow clients to create their own branch of the database, perform changes and then merge with Jitsu's master branch. This can then be used to control Jitsu while it is running and may, for example, allow unikernels to modify their own boot and DNS configuration. The Irmin database will also make it easier to split Jitsu into smaller components that cooperate and allow some features to run within separate unikernels (e.g. DNS).

New backends

Jitsu v0.2 includes support for several backends that can be used to manage the unikernel VMs. The original libvirt backend is still used by default, but libxl and XAPI are also supported, although they are not as well tested. If you encounter problems with the new backends, please report them here.

Tell me more!

This post has mainly focused on Jitsu, but if you are interested in unikernels and want more information about writing your own or hosting your web site with one, these links may be useful:

There are also many MirageOS application examples available here. The examples are kept up to date with the latest libraries.

If you experiment with Jitsu, please let us know how it went on the mailing list!

(Thanks to Daniel Bünzli and Amir Chaudhry for comments on previous versions of this post)