Introducing stapbpf - SystemTap's new BPF backend

SystemTap 3.2 includes an early prototype of SystemTap's new BPF backend (stapbpf). It represents a first step towards leveraging the powerful tracing and performance analysis capabilities recently added to the Linux kernel. In this post, I will compare the translation process of stapbpf with that of the default backend (stap) and highlight some differences in functionality between the two backends.

Stap and stapbpf share common parsing and semantic analysis stages. As input for translation, both backends receive data structures representing a parse tree complete with type information and references to the definitions of all variables and functions being used. A summary of this information can be displayed using the stap command's '-p2' option.

$ cat sample.stp
probe kernel.function("sys_read") { printf("hi from sys_read!\n"); exit() }

$ stap -p2 sample.stp
# functions
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

$ stap -p2 --runtime=bpf sample.stp
# functions
_set_exit_status:long ()
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

You can see that stapbpf's exit function involves an additional call to _set_exit_status, but otherwise the two backends probe the same location.

From this point, the translation processes diverge. Stap's goal is to convert the script into a kernel module. To accomplish this, stap translates the parse tree into the C source code of the desired kernel module. While stap runs, it invokes GCC to compile this source code into the actual kernel module. The '-p4' option can be used with the stap command to stop after this step and produce the kernel object file.

# stap -p4 sample.stp
[...]_1316.ko
# staprun [...]_1316.ko
hi from sys_read!

Instead of C, stapbpf translates the script directly into BPF bytecode to be executed by an in-kernel virtual machine. The bytecode is then stored in a BPF-ELF file intended for use by the stapbpf runtime.

# stap -p4 --runtime=bpf sample.stp
stap_1348.bo
# stapbpf stap_1348.bo
hi from sys_read!
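
To give a feel for what "BPF bytecode" means, here is a minimal Python sketch that packs two BPF instructions by hand. It is an illustration only and not stapbpf's actual output: the helper name encode_insn is hypothetical, and the layout shown assumes a little-endian host. Each instruction is 8 bytes: an 8-bit opcode, two 4-bit register fields, a 16-bit offset and a 32-bit immediate.

```python
import struct

# Opcode values from the kernel's BPF instruction set.
BPF_ALU64_MOV_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K
BPF_JMP_EXIT = 0x95       # BPF_JMP | BPF_EXIT


def encode_insn(opcode, dst_reg=0, src_reg=0, off=0, imm=0):
    """Pack one instruction (hypothetical helper, little-endian layout assumed)."""
    regs = (src_reg << 4) | dst_reg  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


# "r0 = 0; exit" is about the smallest program a verifier could accept.
program = encode_insn(BPF_ALU64_MOV_IMM, dst_reg=0, imm=0) + encode_insn(BPF_JMP_EXIT)
print(program.hex())
```

A real .bo file wraps instructions like these in ELF sections together with other data that the stapbpf runtime needs, which this sketch does not attempt to model.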

Unlike building stap's kernel modules, producing the BPF bytecode requires no external compiler. This helps keep stapbpf's compile times and installation footprint low. With the '-v' option, we can see the duration of each stage of translation.

# stap -v -p4 sample.stp
[...]
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.

# stap -v -p4 --runtime=bpf sample.stp
[...]
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.

Notice that passes 3 and 4 together take 1563 ms for stap (4 ms + 1559 ms) but less than 1 ms for stapbpf, which combines passes 3 and 4 into a single pass.
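
The totals above can be reproduced from the '-v' output with a few lines of Python. This is a small sketch that assumes the "Pass N: ... in Xusr/Ysys/Zreal ms." line format shown in the transcripts; the function name real_ms_by_pass is made up for this example.

```python
import re

# Matches lines such as: Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
PASS_RE = re.compile(r"^Pass (\d+): .* in (\d+)usr/(\d+)sys/(\d+)real ms\.$")


def real_ms_by_pass(verbose_output):
    """Map each pass number to its wall-clock ("real") time in milliseconds."""
    times = {}
    for line in verbose_output.splitlines():
        match = PASS_RE.match(line.strip())
        if match:
            times[int(match.group(1))] = int(match.group(4))
    return times


stap_log = """\
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
"""
bpf_log = 'Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.\n'

stap_total = sum(real_ms_by_pass(stap_log).values())
bpf_total = sum(real_ms_by_pass(bpf_log).values())
print(stap_total, bpf_total)
```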

When BPF programs are loaded into the kernel, they are first checked for safety by a BPF verifier built into the kernel. It checks for undesirable behaviors such as out-of-bounds jumps, out-of-bounds stack loads and stores, and reads from uninitialized addresses. It also checks for the presence of unreachable instructions. Any BPF program that does not pass verification is not loaded into the BPF virtual machine. Although the default stap is held to similar standards and is known to be very safe to use, stapbpf has the advantage of inheriting BPF's simpler security model.

However, this advantage comes with trade-offs. For example, BPF does not support writing to kernel memory. Stap disables this ability by default but provides a "guru mode" that acts as an escape hatch for users who want this level of control over their operating system. Stapbpf has no equivalent, so it cannot be used to, for example, administer security band-aids to a live system. Even more restrictive, in order to ensure that BPF programs terminate quickly, the verifier rejects any program that contains loops. While stapbpf could unroll loops, BPF also imposes a limit of 4096 instructions per program, so unrolling would only help for short loops with a small, fixed number of iterations.

# stap --runtime=bpf contains_loops.stp
Error loading /tmp/stapxSM7Kg/stap_8316.bo: bpf program load failed: Invalid argument
[...]
Pass 5: run failed.

# stap --runtime=bpf too_many_insns.stp
Error loading /tmp/stapqxRXi4/stap_8432.bo: bpf program load failed: Argument list too long
[...]
Pass 5: run failed.
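
The two failures above correspond to the errno values EINVAL ("Invalid argument") and E2BIG ("Argument list too long"). The following Python sketch is a toy model of the checks described in this section, not the kernel's verifier: the instruction tuples, the function names and the choice to report every rejected program as EINVAL are simplifying assumptions made for illustration. It rejects programs longer than 4096 instructions, jumps that leave the program, loops (detected as back-edges during a depth-first walk of the control flow) and unreachable instructions.

```python
import errno

BPF_MAXINSNS = 4096  # per-program instruction limit mentioned in the text


def successors(prog, i):
    """Indices that control can reach directly from instruction i (toy encoding)."""
    kind = prog[i][0]
    if kind == "exit":
        return []
    if kind == "ja":                 # unconditional jump, offset relative to the next insn
        return [i + 1 + prog[i][1]]
    if kind == "jcond":              # conditional jump: fall through or branch
        return [i + 1, i + 1 + prog[i][1]]
    return [i + 1]                   # any other instruction falls through


def toy_verify(prog):
    """Return 0 if the toy checks pass, otherwise an errno value."""
    if len(prog) > BPF_MAXINSNS:
        return errno.E2BIG
    if not prog:
        return errno.EINVAL
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    state = [WHITE] * len(prog)
    state[0] = GREY
    stack = [(0, iter(successors(prog, 0)))]
    while stack:
        node, pending = stack[-1]
        nxt = next(pending, None)
        if nxt is None:              # all successors explored
            state[node] = BLACK
            stack.pop()
        elif not 0 <= nxt < len(prog):
            return errno.EINVAL      # jump or fall-through leaves the program
        elif state[nxt] == GREY:
            return errno.EINVAL      # back-edge, i.e. a loop
        elif state[nxt] == WHITE:
            state[nxt] = GREY
            stack.append((nxt, iter(successors(prog, nxt))))
    if WHITE in state:
        return errno.EINVAL          # unreachable instruction
    return 0


looping = [("alu",), ("jcond", -2), ("exit",)]    # insn 1 may jump back to insn 0
too_long = [("alu",)] * BPF_MAXINSNS + [("exit",)]
print(toy_verify(looping) == errno.EINVAL, toy_verify(too_long) == errno.E2BIG)
```

The real verifier does far more than this, including tracking register and stack state along every path, but the sketch shows why neither a loop nor a fully unrolled long loop gets past the load step.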

The following table summarizes the differences between stap and stapbpf. Features that BPF permits but that are not yet implemented in stapbpf are marked 'possible'.

	stap	stapbpf
non-blocking probe handlers	yes	yes
protected probe execution environment	yes	yes
lock-protected global variables	per probe locking	per operation locking
kprobes (DWARF)	yes	yes
kprobes (DWARF-less)	yes	possible
uprobes	yes	possible
tracepoint probes	yes	possible
probe dynamically loaded kernel objects	yes	possible
timer-based probing	yes	yes
able to change state in probed program	yes	possible (userspace only)
means available to bypass protections for advanced users	yes	no
loop support (for, while, foreach)	yes	limited*
string support (variables, literals)	yes	limited**
probe handler length limit	1000 statements	4096 instructions
means available to increase handler length limit	yes	no
kernel verifies the safety of program	no	yes

* For and while loops are enabled in begin and end probes. These probes are executed in user space and therefore do not require verification.
** There is support for printf's format string literal.

As the table shows, stapbpf provides only a subset of stap's functionality. However, for systems whose security policies either prevent the use of the full kernel module backend or require software with a security model simpler than stap's, stapbpf aims to provide a convenient way to use this subset.

Last updated: December 12, 2017
