Hardening Applications with systemd

Because we live in the day and age where the new gods have taken over Linux, it’s a good idea to familiarize ourselves with their rituals. Some of them might seem strange to us, but some of them are actually very nice features. One of the features I really like about systemd is its built-in hardening capabilities.

The built-in options for hardening are quite extensive, and can best be compared to something like firejail. They both have similar capabilities, but firejail focuses more on desktop applications, whereas systemd hardening applies to systemd units. The hardening options are configured in the unit’s service file, in the [Service] section.

Let’s look at the different options with an example: a service file I set up a while ago for a Discord bot. The bot is a self-hosted version of evobot.

# /etc/systemd/system/boombot.service
[Unit]
Description=Music Bot
After=network-online.target

[Service]
ExecStart=/usr/bin/node /srv/bot/DiscordBots/BoomBot/index.js
Type=simple
WorkingDirectory=/srv/bot/DiscordBots/BoomBot

# Hardening
PrivateDevices=true
PrivateTmp=true
ProtectControlGroups=true
ProtectSystem=full
ProtectKernelTunables=true
RestrictSUIDSGID=true
User=bot

[Install]
WantedBy=multi-user.target

As you can see, I already set up some basic hardening options, but we can certainly do better than that.

When hardening systemd units, two resources are particularly useful. The first one is systemd-analyze security <name>.service. This tool lists many useful hardening options, shows whether each one is turned on, and gives an overall exposure score. The score is weighted: every option carries a number of points, and those points are subtracted from the total once the option is activated. The scale goes from 10 (worst) to 0 (best).

The other resource is the man page for systemd.exec(5). It includes all possible hardening options (in the section SANDBOXING). Most options are described in great detail. This enables us to make informed choices about our hardening.

To start, let’s have a look at the output of systemd-analyze security boombot.service.

  NAME                                                        DESCRIPTION                                                             EXPOSURE
✗ RemoveIPC=                                                  Service user may leave SysV IPC objects around                               0.1
✗ RootDirectory=/RootImage=                                   Service runs within the host's root directory                                0.1
✓ User=/DynamicUser=                                          Service runs under a static non-root user identity
✗ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes may change the system clock                                0.2
✗ NoNewPrivileges=                                            Service processes may acquire new privileges                                 0.2
✓ AmbientCapabilities=                                        Service process does not receive ambient capabilities
✗ ProtectClock=                                               Service may write to the hardware clock or system clock                      0.2
✗ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service may use acct()                                                       0.1
✗ CapabilityBoundingSet=~CAP_KILL                             Service may send UNIX signals to arbitrary processes                         0.1
✗ ProtectKernelLogs=                                          Service may read from or write to the kernel log ring buffer                 0.2
✗ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service may program timers that wake up the system                           0.1
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service may override UNIX file/IPC permission checks                         0.2
✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service may mark files immutable                                             0.1
✗ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service may lock memory into RAM                                             0.1
✗ ProtectKernelModules=                                       Service may load or read kernel modules                                      0.2
✗ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service may load kernel modules                                              0.2
✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service may issue vhangup()                                                  0.1
✗ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service may issue reboot()                                                   0.1
✗ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service may issue chroot()                                                   0.1
✗ SystemCallArchitectures=                                    Service may execute system calls with all ABIs                               0.2
✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service may establish wake locks                                             0.1
✗ MemoryDenyWriteExecute=                                     Service may create writable executable memory mappings                       0.1
✗ RestrictNamespaces=~user                                    Service may create user namespaces                                           0.3
✗ RestrictNamespaces=~pid                                     Service may create process namespaces                                        0.1
✗ RestrictNamespaces=~net                                     Service may create network namespaces                                        0.1
✗ RestrictNamespaces=~uts                                     Service may create hostname namespaces                                       0.1
✗ RestrictNamespaces=~mnt                                     Service may create file system namespaces                                    0.1
✗ CapabilityBoundingSet=~CAP_LEASE                            Service may create file leases                                               0.1
✗ RestrictNamespaces=~cgroup                                  Service may create cgroup namespaces                                         0.1
✗ RestrictNamespaces=~ipc                                     Service may create IPC namespaces                                            0.1
✗ ProtectHostname=                                            Service may change system host/domainname                                    0.1
✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service may change file ownership/access mode/capabilities unrestricted      0.2
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service may change UID/GID identities/capabilities                           0.3
✗ LockPersonality=                                            Service may change ABI personality                                           0.1
✗ RestrictAddressFamilies=~AF_PACKET                          Service may allocate packet sockets                                          0.2
✗ RestrictAddressFamilies=~AF_NETLINK                         Service may allocate netlink sockets                                         0.1
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                           0.1
✗ RestrictAddressFamilies=~…                                  Service may allocate exotic sockets                                          0.3
✗ RestrictAddressFamilies=~AF_(INET|INET6)                    Service may allocate Internet sockets                                        0.3
✗ CapabilityBoundingSet=~CAP_MAC_*                            Service may adjust SMACK MAC                                                 0.1
✗ RestrictRealtime=                                           Service may acquire realtime scheduling                                      0.1
✗ ProtectSystem=                                              Service has very limited write access to the OS file hierarchy               0.1
✗ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has ptrace() debugging abilities                                     0.3
✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has privileges to change resource use parameters                     0.1
✓ SupplementaryGroups=                                        Service has no supplementary groups
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO                        Service has no raw I/O access
✓ PrivateTmp=                                                 Service has no access to other software's temporary files
✓ PrivateDevices=                                             Service has no access to hardware devices
✗ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has network configuration privileges                                 0.2
✗ ProtectProc=                                                Service has full access to process tree (/proc hidepid=)                     0.2
✗ ProcSubset=                                                 Service has full access to non-process /proc files (/proc subset=)           0.1
✗ ProtectHome=                                                Service has full access to home directories                                  0.2
✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has elevated networking privileges                                   0.1
✗ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has audit subsystem access                                           0.1
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has administrator privileges                                         0.3
✗ PrivateNetwork=                                             Service has access to the host's network                                     0.5
✗ PrivateUsers=                                               Service has access to other users                                            0.2
✗ CapabilityBoundingSet=~CAP_SYSLOG                           Service has access to kernel logging                                         0.1
✓ DeviceAllow=                                                Service has a minimal device ACL
✓ KeyringMode=                                                Service doesn't share key material with other services
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree
✗ SystemCallFilter=~@clock                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@cpu-emulation                            Service does not filter system calls                                         0.1
✗ SystemCallFilter=~@debug                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@module                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@mount                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@obsolete                                 Service does not filter system calls                                         0.1
✗ SystemCallFilter=~@privileged                               Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@raw-io                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@reboot                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@resources                                Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@swap                                     Service does not filter system calls                                         0.2
✗ IPAddressDeny=                                              Service does not define an IP address allow list                             0.2
✓ NotifyAccess=                                               Service child processes cannot alter service state
✓ ProtectControlGroups=                                       Service cannot modify the control group file system
✓ PrivateMounts=                                              Service cannot install system mounts
✓ CapabilityBoundingSet=~CAP_MKNOD                            Service cannot create device nodes
✓ ProtectKernelTunables=                                      Service cannot alter kernel tunables (/proc/sys, …)
✓ RestrictSUIDSGID=                                           SUID/SGID file creation by service is restricted
✗ UMask=                                                      Files created by service are world-readable by default                       0.1

→ Overall exposure level for boombot.service: 7.6 EXPOSED 🙁

As you can see in the last line, systemd is not very happy with our hardening options. Let’s turn that frown upside down!

The first group of options I want to tackle is restricting access to the file system. We already had ProtectSystem= set to “full”. With this, only certain parts of the OS are mounted read-only for the service (/usr, the boot loader directories such as /boot, and /etc). If we set it to strict, this extends to the complete file hierarchy, with the exception of the API file systems /dev, /proc and /sys. If we want to allow the service to write to certain paths, we can specify them manually with ReadWritePaths=. Since this bot doesn’t store any data, let’s set it to strict, and see if it still works. There are also other ways to restrict access to the file system. Let’s enable the following settings:

# Make the entire file hierarchy read-only for the service
ProtectSystem=strict
# Only allow access to pseudo-devices (e.g. null, random, zero) in a private /dev
PrivateDevices=true
# Give the service its own private /tmp and /var/tmp
PrivateTmp=true
# Deny access to the kernel log ring buffer
ProtectKernelLogs=true
# Hide processes owned by other users in /proc
ProtectProc=invisible
# Run in a private user namespace, hiding other users on the system
PrivateUsers=true
# Make /home, /root and /run/user inaccessible
ProtectHome=true
# Create new files accessible to the service user only
UMask=0077
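
As a side note on ReadWritePaths=: if the bot ever did need to write to disk, an exception under ProtectSystem=strict could be sketched roughly like this. The directory /srv/bot/data and the state directory name boombot are hypothetical and not part of the setup in this article.

```ini
[Service]
ProtectSystem=strict
# Hypothetical path: re-open a single existing directory for writing.
# A leading "-" would make systemd ignore the path if it does not exist.
ReadWritePaths=/srv/bot/data
# Alternative: let systemd create /var/lib/boombot, owned by the service user
StateDirectory=boombot
```

Only one of the two is normally needed. StateDirectory= has the advantage that systemd creates the directory itself, so nothing has to be set up by hand before the first start.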

Let’s have a look at how far we’ve come with these options:

→ Overall exposure level for boombot.service: 6.7 MEDIUM 😐

That’s already a lot better! Let’s see if we can improve it. The next step is restricting access to the system.

# Disable creating namespaces
RestrictNamespaces=true
# Lock down the personality() system call
LockPersonality=true
# Service may not acquire new privileges
NoNewPrivileges=true
# Service may not load kernel modules
ProtectKernelModules=true
# Only allow native system calls
SystemCallArchitectures=native
# Service may not change the host name
ProtectHostname=true
# Service may only use IPv4 and IPv6 sockets
RestrictAddressFamilies=AF_INET AF_INET6
# Disable realtime scheduling
RestrictRealtime=true
# Make the control group hierarchy read-only
ProtectControlGroups=true
# Disable write access to kernel variables
ProtectKernelTunables=true
# Disable setting SUID or SGID bits on files
RestrictSUIDSGID=true
# Disable changing the system clock
ProtectClock=true

→ Overall exposure level for boombot.service: 4.7 OK 🙂

We have our first smile! Let’s keep going!

The next two options are very powerful, but it is a bit harder to decide how to set them. They are CapabilityBoundingSet and SystemCallFilter.

The first option restricts (as its name suggests) capabilities. These are “privileges” that a process has. Examples of these capabilities include:

  • Read/write access to the Kernel audit log
  • Access to privileged BPF operations
  • Block system suspend
  • chown arbitrary files
  • kill arbitrary processes
  • change MAC address
  • Bind a socket to a port less than 1024
  • and many more

Many of these capabilities are not needed by normal processes, and they can be especially dangerous because they often bypass permission checks (e.g. CAP_CHOWN allows changing the ownership of any file, regardless of filesystem permissions). Let’s disable all of them.

CapabilityBoundingSet=

You can also specify an allow list of capabilities to keep, or invert the selection with ~ (e.g. ~CAP_SYS_PTRACE) to drop only the listed ones.
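
As a rough sketch of both forms, assume a hypothetical service that runs as a non-root user and has to bind to a port below 1024 (the bot in this article does not need this):

```ini
[Service]
# Allow list: keep only this capability in the bounding set, drop the rest
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# A non-root User= also needs the capability granted explicitly
AmbientCapabilities=CAP_NET_BIND_SERVICE

# Deny list (alternative to the allow list above): keep everything except these
#CapabilityBoundingSet=~CAP_SYS_PTRACE CAP_SYS_ADMIN
```

For the bot we stick with the empty set, which results in: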

→ Overall exposure level for boombot.service: 2.8 OK 🙂

Now for the hardest part (in my opinion): SystemCallFilter. This is a very powerful tool, as it allows you to restrict the system calls that the process can make. This isn’t easy to configure, and might need frequent updates as the process gains new features.

To make this process easier, systemd includes predefined system call sets. Their names are prefixed with ‘@’, and each one groups several related system calls. Examples of these include:

  • @swap
  • @timer
  • @reboot
  • @raw-io
  • etc.

For simplicity, we’ll use a special grouping, @system-service. This includes a “set of system calls used by common services, excluding any special purpose calls”. The man page recommends this as a starting point for custom lists. We also deny two other groups that systemd-analyze security recommends:

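# Start from the allow list of calls used by common services
SystemCallFilter=@system-service
# Then remove calls that need super-user privileges or change resource limits and scheduling
SystemCallFilter=~@privileged @resources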
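
By default, a process that makes a filtered system call is killed with SIGSYS. While tuning the filter, one option is to return an error to the process instead, so the failing call is more likely to show up in the application's own logs. A minimal sketch of that variation, which is not part of the unit in this article:

```ini
[Service]
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources
# Return EPERM for blocked calls instead of terminating the process
SystemCallErrorNumber=EPERM
```

The contents of each predefined set can be listed with systemd-analyze syscall-filter, which helps when deciding which groups a service can do without.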

After setting these options, we arrive at our final assessment:

→ Overall exposure level for boombot.service: 1.1 OK 🙂

That seems like a pretty nice verdict! Our process is now locked down quite well. Of course, there are other things that can still be done, like setting a more specific system call filter, but this is a good start. I’ll include the full service file here, so you can have a look at it and use it as a template for your own unit files.

# /etc/systemd/system/boombot.service
[Unit]
Description=Music Bot
After=network-online.target

[Service]
ExecStart=/usr/bin/node /srv/bot/DiscordBots/BoomBot/index.js
Type=simple
WorkingDirectory=/srv/bot/DiscordBots/BoomBot
User=bot

# Hardening
## File System
ProtectSystem=strict
PrivateDevices=true
PrivateTmp=true
ProtectKernelLogs=true
ProtectProc=invisible
PrivateUsers=true
ProtectHome=true
UMask=0077

## System
RestrictNamespaces=true
LockPersonality=true
NoNewPrivileges=true
ProtectKernelModules=true
SystemCallArchitectures=native
ProtectHostname=true
RestrictAddressFamilies=AF_INET AF_INET6
RestrictRealtime=true
ProtectControlGroups=true
ProtectKernelTunables=true
RestrictSUIDSGID=true
ProtectClock=true
RemoveIPC=true

## Capabilities and syscalls
CapabilityBoundingSet=
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources

[Install]
WantedBy=multi-user.target
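
The same options do not have to live in the main unit file. For a unit shipped by a distribution package, one common approach is a drop-in file, sketched here for a hypothetical example.service with a small subset of the options above:

```ini
# /etc/systemd/system/example.service.d/hardening.conf
# (example.service is a placeholder name)
[Service]
ProtectSystem=strict
PrivateTmp=true
NoNewPrivileges=true
CapabilityBoundingSet=
SystemCallFilter=@system-service
```

After adding or changing such a file, systemctl daemon-reload followed by a restart of the service applies it, and systemd-analyze security shows the new score.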

I hope you enjoyed reading this article, and I hope you get inspired to have a closer look at your services and lock them down!

