Error handling of in-vehicle networks makes them vulnerable

Cho & Shin, CCS 2016

In a previous edition of The Morning Paper we looked at how many production errors can be traced back to error/exception handling. But today’s paper is something special. It studies the properties of the Controller Area Network (CAN) protocol used in cars and finds a very potent attack which is much easier to deploy than previously reported methods. The attack, called a bus-off attack, exploits the very mechanisms built into the CAN bus to make it reliable in the presence of errors. It has a number of particularly deadly properties:

  • Unlike previous attacks, it does not rely on carefully reverse engineering the messages sent on the CAN bus for a particular make/model of vehicle. It will work with any vehicle regardless of manufacturer or model, with no prior knowledge required.
  • It defeats the MAC-based protection that thwarts most previously known attacks, since it does not require message delivery.
  • It deceives state-of-the-art intrusion detection systems (IDSs), since the attack traffic is very hard to distinguish from genuine system errors caused by, for example, a bit flip.
  • It is independent of any implementation subtleties of particular ECUs.

… not only contemporary insecure in-vehicle networks, but also prospective security-enhanced ones will still be vulnerable to the bus-off attack.

These facts combined with its ‘ease-of-use’ make for a dire warning in the paper’s conclusions:

Even though the proposed attack has not yet been seen in the wild, it is easy to mount and also directly related to drivers/passengers’ safety, and should thus be countered with high priority. Moreover, the facts that the proposed attack can nullify state-of-the-art solutions and is easy to launch, make it even more important to design and deploy its countermeasures. Thus, we recommend concerted efforts from both academia and industry to account for this vulnerability in the design of in-vehicle networks.

Small comfort perhaps, that the attack requires a compromised in-vehicle ECU to start with. Compromising an ECU is pretty much treated as a solved problem in this paper, with plenty of references given for how to do it. Once an ECU (e.g. the telematics unit) has been compromised, all the bus-off attack requires is the basic abilities to sniff messages on CAN (it’s a broadcast bus, so that’s a given), and to be able to inject any message with a forged ID and DLC (Data Length Code). “These are basic capabilities of an adversary who has control of a compromised ECU.”

Let’s take a quick look at the essential features of CAN needed to understand the attack, and then we’ll get into the bus-off attack itself…

CAN background

CAN interconnects ECUs through a message broadcast bus. CAN frames sent on the bus are of one of four types: data frames for sending data, remote frames for requesting transmission of data, error frames used to indicate detected errors, and overload frames to inject delay between frames.

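A CAN data frame consists of a Start-of-Frame bit, an arbitration field (carrying the message ID), a control field (carrying the DLC), a data field of up to 8 bytes, a CRC field, an ACK field, and an End-of-Frame.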

The length of the data field is carried in the 4-bit Data Length Code (DLC), and in most cars a 1-byte checksum is contained in the last byte of the data field. Every message published on the bus contains a unique ID representing its priority and meaning. Being a broadcast bus, there is no notion of sending to a particular recipient; instead, ECUs just look for messages with IDs of interest to them.

Once the CAN bus is detected idle, a node with data to transmit, starts its frame transmission (Tx) by issuing a Start-of-Frame (SOF). SOF provides hard synchronization between ECUs to make bitwise transmission and reception feasible. At that time, one or more other nodes may also have buffered data to transmit, and may thus concurrently access the bus. In such a case, the CAN protocol resolves the access contention via arbitration.

Zeros beat ones

Each node sends its frame one bit at a time and monitors the actual output on the CAN bus. A 0 is dominant and a 1 is recessive, so if any node transmits a 0 the bus reads 0. If a node sees a 0-bit even though it has transmitted a 1-bit then it is considered to have lost arbitration. Such a node withdraws from bus contention and switches to receiver mode. When the winner of arbitration has completed sending its message, a three-bit IFS (Inter-Frame Space) delay follows, after which the bus is free again for access.
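To make "zeros beat ones" concrete, here is a minimal sketch of bitwise arbitration over the ID field. It assumes standard 11-bit IDs and models the bus as the minimum of the bits being sent; the function name `arbitrate` is made up for illustration and is not from the paper.

```python
def arbitrate(ids, id_bits=11):
    """Return the ID that wins bitwise arbitration on a dominant-0 bus.

    Simplified model: only the ID bits are compared, most significant
    bit first, and all contenders start at the same instant.
    """
    contenders = set(ids)
    for bit in range(id_bits - 1, -1, -1):
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())  # any dominant 0 overrides recessive 1s
        # A node that sent 1 but reads back 0 has lost and withdraws.
        contenders = {i for i in contenders if sent[i] == bus}
    # Distinct IDs differ in at least one bit, so one contender remains.
    return contenders.pop()


print(hex(arbitrate([0x2A0, 0x123, 0x124])))  # 0x123
```

The lowest ID always wins, which is why a low ID value means high priority.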

Error handling

CAN systems are critical to vehicle safety and are designed to be robust in the face of errors. There are five defined error-detection mechanisms in the protocol: bit errors; stuff errors (after every five consecutive bits of the same polarity, an opposite-polarity bit must be ‘stuffed’ for soft synchronization; failure to do so is a stuff error); CRC errors; form errors (when the fixed-form bits, e.g. delimiters, contain at least one illegal bit); and ACK errors.
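The stuffing rule is easy to get slightly wrong, so here is a small sketch of it. It works on plain lists of 0/1 values and ignores which frame fields are actually subject to stuffing; the helper names `stuff` and `has_stuff_error` are illustrative only.

```python
def stuff(bits):
    """Insert an opposite-polarity bit after every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuffed = 1 - b
            out.append(stuffed)
            # The stuffed bit itself starts the next run.
            run_bit, run_len = stuffed, 1
    return out


def has_stuff_error(wire_bits):
    """True if six or more consecutive equal bits appear on the wire."""
    run_bit, run_len = None, 0
    for b in wire_bits:
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len >= 6:
            return True
    return False


print(stuff([0, 0, 0, 0, 0]))        # [0, 0, 0, 0, 0, 1]
print(has_stuff_error([0] * 6))      # True
```

Six identical bits in a row can never appear in a correctly stuffed stream, which is why an active error flag of six dominant bits is unmistakable to every node on the bus.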

When an error is detected, the node transmits an error frame on the bus and increases either its transmit error counter (TEC), or receive error counter (REC). Crucially, a node detecting an error during transmission increases its TEC by 8, whereas a node detecting an error when receiving only increases its REC by 1.

To confine serious errors, each ECU moves between three states: error active, error passive, and bus off. An ECU starts in error-active mode. If either its TEC or its REC goes above 127, it moves into error-passive mode. An error-passive node returns to error active once both its TEC and REC fall below 128.

When TEC exceeds the limit of 255, the corresponding ECU – which must have triggered many transmit errors – enters the bus-off mode. Upon entering this mode, to protect the CAN bus from continually being distracted, the error-causing ECU is forced to shut down and not participate in sending/receiving data on the CAN bus at all. It can be restored back to its original error active mode either automatically or manually. However, since bus-off is usually an indication of serious network errors and may not be fixed by mere automatic re-initialization of the CAN controller, a user-intervened recovery or even a controlled shutdown of the entire system is recommended.
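The counter rules and thresholds described above can be sketched as a tiny state model. This is a simplification that covers only the rules mentioned in this post (+8 on a transmit error, +1 on a receive error, -1 on a successful transmission); the class and method names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounters:
    tec: int = 0  # transmit error counter
    rec: int = 0  # receive error counter

    @property
    def state(self):
        if self.tec > 255:
            return "bus-off"
        if self.tec > 127 or self.rec > 127:
            return "error-passive"
        return "error-active"

    def on_tx_error(self):
        self.tec += 8

    def on_rx_error(self):
        self.rec += 1

    def on_tx_success(self):
        self.tec = max(0, self.tec - 1)


ecu = ErrorCounters()
for _ in range(16):
    ecu.on_tx_error()
print(ecu.tec, ecu.state)  # 128 error-passive
```

Note the asymmetry: 16 transmit errors are enough to leave error-active mode, whereas it would take 128 receive errors to do the same.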

Most vehicles will enter a limp-home mode and run with reduced functionality. Depending on the severity of the underlying issue, the vehicle may later be totally disabled.

I’m sure you’ve figured out where this is heading…

The bus-off attack

Start off by watching the bus for a while. Many messages are sent at set periods. For example, suppose message M is sent periodically by a victim ECU V. The attacker simply needs to inject a message satisfying the following three conditions:

  1. It has the same ID as message M
  2. It is transmitted at the same time as message M
  3. It has at least one bit position in which it is dominant (a zero), whereas it is recessive in M (a one). All the preceding bits should be the same as the message M.

Condition 1 is trivial to meet.

Condition 2 also turns out to be quite easy. Nodes which have lost arbitration or had messages buffered while the bus was busy will attempt to transmit their messages as soon as the bus becomes idle. Given that message priorities and periodicities do not change, it’s quite easy to spot a CAN message that is always preceded by some other message. The ID of that preceding message then acts as the signal for when to transmit the attack message.

Condition 3 means that the bit difference must occur in either the control or data field, since the message IDs have to match. The length of the data field for nearly all messages is >= 1, so the DLC contains at least one 1 bit, and the simplest way is to flip one of those 1s in the Data Length Code (DLC) to a 0.

So that’s one message put onto the bus that looks just like a regular message. At the start, the adversary and the victim are both in error-active mode. The victim will see the bit error, and because it was transmitting, it increases its TEC by 8. It also transmits an active error flag, as required by the protocol, which consists of six consecutive dominant bits (i.e. 000000) – this causes a transmit error at the adversary node too, and its TEC is increased by 8.

Now the CAN bus itself steps in to help out. The CAN controllers of the adversary and the victim automatically retransmit the failed messages at the same time! (Should have taken a lesson from TCP there.) After 16 such failed transmissions (16 × 8 = 128), both the victim and the adversary enter error-passive mode.

Automatic synchronized retransmission continues, but since the victim is now error passive it transmits a passive error flag instead of an active one. The victim’s TEC goes up to 136, and then down to 135 (-1) when its next retransmission succeeds. The passive error flag does not disturb the adversary, whose retransmission therefore succeeds in that round; its TEC drops by one to 127, and it returns to error-active mode.

So far, the adversary has transmitted just one message, and the automatic retransmissions of the CAN controller have done all the rest. It just remains for the adversary to transmit an additional attack message in each subsequent period. Each time it does this, the victim’s TEC goes up by a net 7 (+8 for the transmit error, -1 for the successful retransmission that follows, since the victim is in error-passive mode). After enough periods, the victim’s TEC exceeds 255, it is forced into bus-off mode, and the attack is complete.
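The counter arithmetic of the two phases is easier to follow as a small simulation. This is a sketch of the TEC bookkeeping only, under the simplified rules described in this post (no REC, no other bus traffic, no recovery); it touches no real CAN hardware, and the function name is made up.

```python
def simulate_bus_off():
    """Trace victim and adversary TECs through the two attack phases."""
    victim_tec = adversary_tec = 0

    # Phase 1: both nodes are error-active. Every synchronized
    # (re)transmission collides and costs each of them +8.
    phase1_collisions = 0
    while victim_tec <= 127:
        victim_tec += 8
        adversary_tec += 8
        phase1_collisions += 1

    # Transition: one more collision hits the victim only, because its
    # passive error flag does not disturb the adversary.
    victim_tec += 8      # victim bit error
    adversary_tec -= 1   # adversary's retransmission succeeds
    victim_tec -= 1      # victim's own retransmission then succeeds
    after_transition = (victim_tec, adversary_tec)

    # Phase 2: one attack message per period of the victim's message.
    extra_attacks = 0
    while True:
        extra_attacks += 1
        victim_tec += 8
        adversary_tec = max(0, adversary_tec - 1)
        if victim_tec > 255:   # bus-off
            break
        victim_tec -= 1        # retransmission succeeds: net +7

    return phase1_collisions, after_transition, extra_attacks, victim_tec


collisions, transition, attacks, final_tec = simulate_bus_off()
print(collisions, transition)  # 16 (135, 127)
```

The returned `extra_attacks` is the number of further attack messages this simplified model needs; a real bus with other traffic and successful victim transmissions in between would behave less cleanly.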

Although CAN messages’ ID values do not contain information on their actual transmitters, their values and intervals together imply the messages’ priority and safety-criticality. That is, if the attacker targets a message sent with high priority (i.e., a low ID value) and small message intervals, then the attacker would most likely disconnect a safety-critical ECU that sends important messages related to, for example, vehicle acceleration or braking.

The attack was confirmed on real vehicles under safe conditions.

Defence against the dark arts

The authors demonstrate a change to the CAN protocol which would defend against the attack by resetting error counters when 16 consecutive error frames with an active error flag are followed by the successful transmission of another message with the same ID.
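One possible reading of that rule, sketched as counter logic: track how many transmit errors have occurred back to back, and if a run of 16 is followed by a successful frame carrying the node’s own ID, treat it as the attack signature and reset. This is an illustrative interpretation rather than the authors’ implementation; the class name, the method names, and the choice to reset the TEC to 0 are all assumptions.

```python
class PatternAwareCounter:
    """Hypothetical TEC that resets on the bus-off attack signature."""

    SUSPICIOUS_RUN = 16  # consecutive error frames, as in the rule above

    def __init__(self, own_id):
        self.own_id = own_id
        self.tec = 0
        self.consecutive_tx_errors = 0

    def on_tx_error(self):
        self.tec += 8
        self.consecutive_tx_errors += 1

    def on_frame_ok(self, frame_id):
        """Call when any frame completes successfully on the bus."""
        if (self.consecutive_tx_errors >= self.SUSPICIOUS_RUN
                and frame_id == self.own_id):
            # Assumption: a full reset; the post does not give the value.
            self.tec = 0
        self.consecutive_tx_errors = 0


node = PatternAwareCounter(own_id=0x123)
for _ in range(16):
    node.on_tx_error()
node.on_frame_ok(0x123)
print(node.tec)  # 0
```

A genuine fault is unlikely to produce exactly this pattern, since nobody else should be successfully sending a message with the victim’s ID.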