Because we live in the day and age where the new gods have taken over Linux, it’s a good idea to familiarize ourselves with their rituals. Some of them might seem strange to us, but some of them are actually very nice features. One of the features I really like about systemd is its built-in hardening capabilities.
The built-in options for hardening are quite extensive, and can best be
compared to something like firejail. Both have similar capabilities, but
firejail focuses more on desktop applications, whereas systemd hardening
applies to systemd units. The hardening options are configured in the unit's
service file, in the [Service] section.
Let’s look at the different options with an example: a service file I set up a while ago for a Discord bot. The bot is a self-hosted version of evobot.
    # /etc/systemd/system/boombot.service
    [Unit]
    Description=Music Bot
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/node /srv/bot/DiscordBots/BoomBot/index.js
    Type=simple
    WorkingDirectory=/srv/bot/DiscordBots/BoomBot

    # Hardening
    PrivateDevices=true
    PrivateTmp=true
    ProtectControlGroups=true
    ProtectSystem=full
    ProtectKernelTunables=true
    RestrictSUIDSGID=true
    User=bot

    [Install]
    WantedBy=multi-user.target
As you can see, I already set up some basic hardening options, but we can certainly do better than that.
When hardening systemd units, two resources are particularly useful. The first
is systemd-analyze security <name>.service. This tool gives an overview
of many useful hardening options, shows whether each one is turned on, and
computes a score from them. The score is weighted: each hardening option is
worth some points, and those points are subtracted from the total when the
option is active. The scale goes from 10 (worst) to 0 (best).
The second resource is the man page for
systemd.exec(5). It documents most of these hardening options (many of them
in the section SANDBOXING). Most options are described in great detail, which
lets us make informed choices about our hardening.
To start, let’s have a look at the output of
systemd-analyze security boombot.service.
      NAME  DESCRIPTION  EXPOSURE
    ✗ RemoveIPC=  Service user may leave SysV IPC objects around  0.1
    ✗ RootDirectory=/RootImage=  Service runs within the host's root directory  0.1
    ✓ User=/DynamicUser=  Service runs under a static non-root user identity
    ✗ CapabilityBoundingSet=~CAP_SYS_TIME  Service processes may change the system clock  0.2
    ✗ NoNewPrivileges=  Service processes may acquire new privileges  0.2
    ✓ AmbientCapabilities=  Service process does not receive ambient capabilities
    ✗ ProtectClock=  Service may write to the hardware clock or system clock  0.2
    ✗ CapabilityBoundingSet=~CAP_SYS_PACCT  Service may use acct()  0.1
    ✗ CapabilityBoundingSet=~CAP_KILL  Service may send UNIX signals to arbitrary processes  0.1
    ✗ ProtectKernelLogs=  Service may read from or write to the kernel log ring buffer  0.2
    ✗ CapabilityBoundingSet=~CAP_WAKE_ALARM  Service may program timers that wake up the system  0.1
    ✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)  Service may override UNIX file/IPC permission checks  0.2
    ✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE  Service may mark files immutable  0.1
    ✗ CapabilityBoundingSet=~CAP_IPC_LOCK  Service may lock memory into RAM  0.1
    ✗ ProtectKernelModules=  Service may load or read kernel modules  0.2
    ✗ CapabilityBoundingSet=~CAP_SYS_MODULE  Service may load kernel modules  0.2
    ✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG  Service may issue vhangup()  0.1
    ✗ CapabilityBoundingSet=~CAP_SYS_BOOT  Service may issue reboot()  0.1
    ✗ CapabilityBoundingSet=~CAP_SYS_CHROOT  Service may issue chroot()  0.1
    ✗ SystemCallArchitectures=  Service may execute system calls with all ABIs  0.2
    ✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND  Service may establish wake locks  0.1
    ✗ MemoryDenyWriteExecute=  Service may create writable executable memory mappings  0.1
    ✗ RestrictNamespaces=~user  Service may create user namespaces  0.3
    ✗ RestrictNamespaces=~pid  Service may create process namespaces  0.1
    ✗ RestrictNamespaces=~net  Service may create network namespaces  0.1
    ✗ RestrictNamespaces=~uts  Service may create hostname namespaces  0.1
    ✗ RestrictNamespaces=~mnt  Service may create file system namespaces  0.1
    ✗ CapabilityBoundingSet=~CAP_LEASE  Service may create file leases  0.1
    ✗ RestrictNamespaces=~cgroup  Service may create cgroup namespaces  0.1
    ✗ RestrictNamespaces=~ipc  Service may create IPC namespaces  0.1
    ✗ ProtectHostname=  Service may change system host/domainname  0.1
    ✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)  Service may change file ownership/access mode/capabilities unrestricted  0.2
    ✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)  Service may change UID/GID identities/capabilities  0.3
    ✗ LockPersonality=  Service may change ABI personality  0.1
    ✗ RestrictAddressFamilies=~AF_PACKET  Service may allocate packet sockets  0.2
    ✗ RestrictAddressFamilies=~AF_NETLINK  Service may allocate netlink sockets  0.1
    ✗ RestrictAddressFamilies=~AF_UNIX  Service may allocate local sockets  0.1
    ✗ RestrictAddressFamilies=~…  Service may allocate exotic sockets  0.3
    ✗ RestrictAddressFamilies=~AF_(INET|INET6)  Service may allocate Internet sockets  0.3
    ✗ CapabilityBoundingSet=~CAP_MAC_*  Service may adjust SMACK MAC  0.1
    ✗ RestrictRealtime=  Service may acquire realtime scheduling  0.1
    ✗ ProtectSystem=  Service has very limited write access to the OS file hierarchy  0.1
    ✗ CapabilityBoundingSet=~CAP_SYS_PTRACE  Service has ptrace() debugging abilities  0.3
    ✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)  Service has privileges to change resource use parameters  0.1
    ✓ SupplementaryGroups=  Service has no supplementary groups
    ✓ CapabilityBoundingSet=~CAP_SYS_RAWIO  Service has no raw I/O access
    ✓ PrivateTmp=  Service has no access to other software's temporary files
    ✓ PrivateDevices=  Service has no access to hardware devices
    ✗ CapabilityBoundingSet=~CAP_NET_ADMIN  Service has network configuration privileges  0.2
    ✗ ProtectProc=  Service has full access to process tree (/proc hidepid=)  0.2
    ✗ ProcSubset=  Service has full access to non-process /proc files (/proc subset=)  0.1
    ✗ ProtectHome=  Service has full access to home directories  0.2
    ✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW)  Service has elevated networking privileges  0.1
    ✗ CapabilityBoundingSet=~CAP_AUDIT_*  Service has audit subsystem access  0.1
    ✗ CapabilityBoundingSet=~CAP_SYS_ADMIN  Service has administrator privileges  0.3
    ✗ PrivateNetwork=  Service has access to the host's network  0.5
    ✗ PrivateUsers=  Service has access to other users  0.2
    ✗ CapabilityBoundingSet=~CAP_SYSLOG  Service has access to kernel logging  0.1
    ✓ DeviceAllow=  Service has a minimal device ACL
    ✓ KeyringMode=  Service doesn't share key material with other services
    ✓ Delegate=  Service does not maintain its own delegated control group subtree
    ✗ SystemCallFilter=~@clock  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@cpu-emulation  Service does not filter system calls  0.1
    ✗ SystemCallFilter=~@debug  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@module  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@mount  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@obsolete  Service does not filter system calls  0.1
    ✗ SystemCallFilter=~@privileged  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@raw-io  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@reboot  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@resources  Service does not filter system calls  0.2
    ✗ SystemCallFilter=~@swap  Service does not filter system calls  0.2
    ✗ IPAddressDeny=  Service does not define an IP address allow list  0.2
    ✓ NotifyAccess=  Service child processes cannot alter service state
    ✓ ProtectControlGroups=  Service cannot modify the control group file system
    ✓ PrivateMounts=  Service cannot install system mounts
    ✓ CapabilityBoundingSet=~CAP_MKNOD  Service cannot create device nodes
    ✓ ProtectKernelTunables=  Service cannot alter kernel tunables (/proc/sys, …)
    ✓ RestrictSUIDSGID=  SUID/SGID file creation by service is restricted
    ✗ UMask=  Files created by service are world-readable by default  0.1

    → Overall exposure level for boombot.service: 7.6 EXPOSED 🙁
As you can see in the last line, systemd is not very happy with our hardening options. Let’s turn that frown upside down!
The first group of “options” I want to tackle is denying access to the file
system. We already had ProtectSystem= set to “full”. With this, only certain
parts of the OS are mounted read-only for the service (/usr, /boot, /etc). If
we set it to strict, this extends to the complete file hierarchy (apart from
/dev, /proc and /sys). If we want to allow the service to write to certain
paths, we can specify them manually with ReadWritePaths=. Since this bot
doesn’t store any data, let’s set it to strict, and see if it still works.
There are also other ways to restrict access to the file system. Let’s enable
the following settings:
    # Mount the entire file hierarchy read-only
    ProtectSystem=strict
    # Only allow access to pseudo-devices (e.g. null, random, zero) in a separate namespace
    PrivateDevices=true
    # Mount /tmp in its own namespace
    PrivateTmp=true
    # Disable access to the kernel log
    ProtectKernelLogs=true
    # Hide information about other users' processes in /proc
    ProtectProc=invisible
    # Run in a private user namespace, hiding other users on the system
    PrivateUsers=true
    # Make /home, /root and /run/user inaccessible
    ProtectHome=true
    # Create files that only the service user can read and write
    UMask=0077
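If the bot did store data, ReadWritePaths= could reopen just that location while the rest of the hierarchy stays read-only. A minimal sketch, assuming a hypothetical data directory /srv/bot/data that is not part of my actual setup:

```ini
[Service]
ProtectSystem=strict
# Hypothetical path: replace it with wherever the service really writes.
ReadWritePaths=/srv/bot/data
```

The path needs to exist before the service starts. Prefixing it with a `-` tells systemd to ignore it when it is missing.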
Let’s have a look at how far we’ve come with these options:
→ Overall exposure level for boombot.service: 6.7 MEDIUM 😐
That’s already a lot better! Let’s see if we can improve it. The next step is restricting access to the system.
    # Disable creating namespaces
    RestrictNamespaces=true
    # Prevent changing the ABI personality
    LockPersonality=true
    # Service may not acquire new privileges
    NoNewPrivileges=true
    # Service may not load kernel modules
    ProtectKernelModules=true
    # Only allow system calls of the native architecture
    SystemCallArchitectures=native
    # Service may not change the host name
    ProtectHostname=true
    # Service may only use IPv4 and IPv6 sockets
    RestrictAddressFamilies=AF_INET AF_INET6
    # Disable realtime scheduling
    RestrictRealtime=true
    # Disable write access to cgroups
    ProtectControlGroups=true
    # Disable write access to kernel variables
    ProtectKernelTunables=true
    # Disable setting suid or sgid bits
    RestrictSUIDSGID=true
    # Disable changing the system clock
    ProtectClock=true
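Some of these settings can be stricter than a given service tolerates. For example, RestrictAddressFamilies=AF_INET AF_INET6 also blocks local (Unix domain) sockets, as the AF_UNIX line in the report above suggests. If a service turns out to need them, the list can be widened. A sketch, on the assumption that your service talks to something over a Unix socket (mine does not):

```ini
[Service]
# Assumption: the service needs local sockets in addition to IPv4/IPv6.
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
```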
→ Overall exposure level for boombot.service: 4.7 OK 🙂
We have our first smile! Let’s keep going!
The next two options are very powerful, but it is a bit harder to decide how to
set them. They are CapabilityBoundingSet= and SystemCallFilter=.
The first option restricts (as its name suggests) capabilities. These are “privileges” that a process has. Examples of these capabilities include:
- Read/write access to the Kernel audit log
- Access to privileged BPF operations
- Block system suspend
- chown arbitrary files
- kill arbitrary processes
- change MAC address
- Bind a socket to a port less than 1024
- and many more
Many of these capabilities are not needed for normal processes, and they can
be especially dangerous because they often ignore permissions (e.g. a process
with CAP_DAC_OVERRIDE ignores filesystem permissions). Let’s disable all of
them by leaving the bounding set empty:

    CapabilityBoundingSet=

You can also apply a whitelist, or invert the selection with a leading ~.
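To make the whitelist and inverted forms concrete, here is roughly what they could look like. The capability names are real, but which ones to keep or drop is only an illustration, not something this bot needs:

```ini
[Service]
# Whitelist: keep only the capability to bind ports below 1024.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

# Inverted form: keep everything except the listed capabilities.
# Use one form or the other, not both.
#CapabilityBoundingSet=~CAP_SYS_ADMIN CAP_SYS_PTRACE
```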
→ Overall exposure level for boombot.service: 2.8 OK 🙂
Now for the hardest part (in my opinion): SystemCallFilter=. This is a very
powerful tool, as it lets you restrict which system calls the process can
make. It isn’t easy to configure, and might need frequent updates as the
process gains new features.
To make this process easier, systemd includes some “groupings”. These are prefixed with an ‘@’, and each one bundles a set of related system calls. Examples include @clock, @mount and @privileged.
For simplicity, we’ll use a special grouping,
@system-service. This includes
a “set of system calls used by common services, excluding any special purpose
calls”. The man page recommends this as a starting point for custom lists. We
also deny 2 other groups that
systemd-analyze security recommends:
    # Allow the system calls commonly used by services
    SystemCallFilter=@system-service
    # Deny system calls related to super-user privileges and resource limits
    SystemCallFilter=~@privileged @resources
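The first line acts as an allow list, and the second line, with its leading ~, removes groups from that list again. By default, a process that makes a filtered call is killed, which can be hard to debug. One option that may help while testing, shown here as a sketch and not as part of my final unit, is to make the call fail with an error instead:

```ini
[Service]
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources
# Filtered calls fail with EPERM instead of killing the process.
SystemCallErrorNumber=EPERM
```

To see which calls a group contains, `systemd-analyze syscall-filter @system-service` prints the list.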
After setting these options, we arrive at our final assessment:
→ Overall exposure level for boombot.service: 1.1 OK 🙂
That seems like a pretty nice verdict! Our process is now locked down quite well. Of course, there are other things that can still be done, like setting a more specific system call filter, but this is a good start. I’ll include the full service file here, so you can have a look at it and use it as a template for your own unit files.
    # /etc/systemd/system/boombot.service
    [Unit]
    Description=Music Bot
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/node /srv/bot/DiscordBots/BoomBot/index.js
    Type=simple
    WorkingDirectory=/srv/bot/DiscordBots/BoomBot
    User=bot

    # Hardening
    ## File System
    ProtectSystem=strict
    PrivateDevices=true
    PrivateTmp=true
    ProtectKernelLogs=true
    ProtectProc=invisible
    PrivateUsers=true
    ProtectHome=true
    UMask=0077

    ## System
    RestrictNamespaces=true
    LockPersonality=true
    NoNewPrivileges=true
    ProtectKernelModules=true
    SystemCallArchitectures=native
    ProtectHostname=true
    RestrictAddressFamilies=AF_INET AF_INET6
    RestrictRealtime=true
    ProtectControlGroups=true
    ProtectKernelTunables=true
    RestrictSUIDSGID=true
    ProtectClock=true
    RemoveIPC=true

    ## Capabilities and syscalls
    CapabilityBoundingSet=
    SystemCallFilter=@system-service
    SystemCallFilter=~@privileged @resources

    [Install]
    WantedBy=multi-user.target
I hope you enjoyed reading this article, and I hope you get inspired to have a closer look at your services and lock them down!