Project Zomboid; or, Taming Misbehaving Services With systemd

My friends and I recently discovered Project Zomboid. It seems like a pretty fun little survival game.

To play this game in multiplayer, you need a dedicated server. No problem, I thought, I’ll just set it up quickly. This decision sent me on a three-day trip of trial, despair and, ultimately, success. I’ll use this blog post to document the problems I ran into, as well as to generally muse about the state of this game.

Note: This article might be a bit harsh at times. It is not intended to disparage any individual who was involved in the development of this game. Also keep in mind that the game is in its early stages, so we will hopefully see some improvement in these areas.

I first installed the game using SteamCMD. After that was done, I started configuring the service with systemd. Usually, the first thing I look at is how the program is normally started. The server version of Project Zomboid is a Java application that is started by a bash script.

#!/bin/bash

INSTDIR="`dirname $0`" ; cd "${INSTDIR}" ; INSTDIR="`pwd`"

if "${INSTDIR}/jre64/bin/java" -version > /dev/null 2>&1; then
        echo "64-bit java detected"
        export PATH="${INSTDIR}/jre64/bin:$PATH"
        export LD_LIBRARY_PATH="${INSTDIR}/linux64:${INSTDIR}/natives:${INSTDIR}:${INSTDIR}/jre64/lib/amd64:${LD_LIBRARY_PATH}"
        JSIG="libjsig.so"
        LD_PRELOAD="${LD_PRELOAD}:${JSIG}" ./ProjectZomboid64 "$@"
elif "${INSTDIR}/jre/bin/java" -client -version > /dev/null 2>&1; then
        echo "32-bit java detected"
        export PATH="${INSTDIR}/jre/bin:$PATH"
        export LD_LIBRARY_PATH="${INSTDIR}/linux32:${INSTDIR}/natives:${INSTDIR}:${INSTDIR}/jre/lib/i386:${LD_LIBRARY_PATH}"
        JSIG="libjsig.so"
        LD_PRELOAD="${LD_PRELOAD}:${JSIG}" ./ProjectZomboid32 "$@"
else
        echo "couldn't determine 32/64 bit of java"
fi
exit 0

This script seems to have been written by someone who is not very familiar with bash/shell scripting. There are several anti-patterns here (re-assigning a “constant”, using exit 0 at the end of the script, no error handling, using backticks, missing double quotes, etc). Many of these errors are picked up by shellcheck, and some of them can be fixed automatically by applying the patch printed by shellcheck --format=diff start-server.sh:

--- a/start-server.sh
+++ b/start-server.sh
@@ -7,7 +7,7 @@
 #
 ############

-INSTDIR="`dirname $0`" ; cd "${INSTDIR}" ; INSTDIR="`pwd`"
+INSTDIR="$(dirname "$0")" ; cd "${INSTDIR}" || exit ; INSTDIR="$(pwd)"

 if "${INSTDIR}/jre64/bin/java" -version > /dev/null 2>&1; then
        echo "64-bit java detected"

What’s stranger is the way this script detects which Java variant (32-bit or 64-bit) it is dealing with. It does not actually parse the version information, even though it executes the relevant command (java -version). Instead, it only checks whether the JRE that Steam bundled under jre64/ or jre/ can be started, and uses the name of that path as an indication of the variant.
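For comparison, here is a minimal sketch of how a launcher could ask the JVM itself instead of trusting the directory name. It assumes a HotSpot-based JRE that supports the -XshowSettings:properties option and exposes the sun.arch.data.model property; the function names parse_data_model and java_bits are made up for this example and are not part of the game's script.

```shell
#!/bin/sh
# Hypothetical helper: read JVM property output on stdin and print the
# word size (for example 32 or 64) found in sun.arch.data.model.
parse_data_model() {
    sed -n 's/^[[:space:]]*sun\.arch\.data\.model[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: ask the java binary given as $1 for its word size.
# Assumption: the JVM prints its properties to stderr, hence the 2>&1.
java_bits() {
    "$1" -XshowSettings:properties -version 2>&1 | parse_data_model
}
```

With these in place, something like java_bits "${INSTDIR}/jre64/bin/java" would print the word size reported by that JRE, or nothing if the property is missing.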

This script could be replaced by a systemd service completely, but I decided to keep it for now. Time will tell whether this is a good idea.

systemd

I created a simple systemd service file to execute the server. The first problem presented itself immediately: The application asked for the administrator password on stdin. I reran the binary without systemd to be able to enter this password.

After creating the service and adding some hardening options, I found out after some research that the server does not implement signal handling. The developers explicitly say that running the binary in the foreground is the only supported method, since the server can only be shut down safely using the console commands. To circumvent this, I decided to create a systemd socket that creates a named pipe. This pipe is connected to the stdin of the process. This means I can send commands to the server to manage it, even when it is run via systemd. The socket file looks like this:

[Socket]
ListenFIFO=%t/zomboid.sock
SocketMode=0660
SocketUser=zomboid
SocketGroup=games
RemoveOnStop=true

[Install]
WantedBy=sockets.target

The corresponding options in the service file are:

[Unit]
Description=Zomboid Headless Server
After=network-online.target
Requires=zomboid.socket

[Service]
WorkingDirectory=/srv/zomboid/game
ExecStart=/srv/zomboid/game/start-server.sh
Type=exec
User=zomboid
StandardInput=socket
StandardOutput=journal
StandardError=journal
Sockets=zomboid.socket
ExecStop=/bin/sh -c "echo quit > /run/zomboid.sock"

Setting the ExecStop= option makes sure that the server is stopped cleanly when using systemctl stop zomboid.service. The path matches the socket unit because %t expands to /run for system services.
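The same pipe can be used for any other console command. The following is a minimal sketch of a wrapper for that; the function name zomboid_send and the ZOMBOID_FIFO variable are made up for this example, and the only command taken from this article is quit. It assumes the caller is allowed to write to the pipe (user zomboid or group games, given the socket settings above).

```shell
#!/bin/sh
# Hypothetical helper: write one console command to the server's stdin pipe.
# ZOMBOID_FIFO is an assumed override; the default is the path used above.
zomboid_send() {
    fifo="${ZOMBOID_FIFO:-/run/zomboid.sock}"
    # Refuse to write if the path is not a named pipe, so a typo does not
    # silently create a regular file.
    if [ ! -p "$fifo" ]; then
        echo "zomboid_send: not a named pipe: $fifo" >&2
        return 1
    fi
    # All arguments are joined into one line and terminated with a newline.
    printf '%s\n' "$*" > "$fifo"
}
```

For example, zomboid_send quit does the same thing as the ExecStop= line. Note that writing to a pipe blocks until something has it open for reading, which is the case here as long as zomboid.socket is active.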

The developers have signalled (no pun intended) that they want to fix this issue, which is very commendable. I hope a fix is implemented soon.

Logging and logfiles

The strange decisions continue when it comes to logging. I did not expect the application to honor all best practices, but I did expect some kind of standards compliance, or at least configurability.

The application logs in two ways. First, it writes messages to stdout. This is reasonable, but the logging format itself is very strange:

[03-11-23 14:20:55.140] LOG  : Multiplayer , 1699017655140> 1,021,965,239> [MPStatistics] mem usage notification threshold=8,160,437,760.
[03-11-23 14:20:57.820] LOG  : Network     , 1699017657820> 1,021,967,919> [03-11-23 14:20:57.820] > ZNet: SSteamSDK -> SZombienet: OnPolicyResponse.
[03-11-23 14:20:57.821] LOG  : Network     , 1699017657821> 1,021,967,920> [03-11-23 14:20:57.821] > ZNet: OnPolicyResponse.
[03-11-23 14:20:57.821] LOG  : Network     , 1699017657821> 1,021,967,921> [03-11-23 14:20:57.821] > ZNet: SZombienet -> SSteamSDK: BSecure.
[03-11-23 14:20:57.822] LOG  : Network     , 1699017657822> 1,021,967,921> [03-11-23 14:20:57.822] > ZNet: Zomboid Server is VAC Secure.

I don’t know much about logging facilities in Java, but this is horrible. It does not adhere to the syslog standard, the date format is strange, the default severity is called LOG, it uses fixed-width fields, and each line has three timestamps in different formats (some lines even have four).
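Since the journal already records a timestamp for every line, the redundant ones can be stripped when reading the logs. This is a minimal sketch of such a filter; the function name zomboid_clean_log is made up, and the pattern is derived only from the sample lines above, so other message types may not match (those lines are passed through unchanged).

```shell
#!/bin/sh
# Hypothetical filter: reduce a server log line to "SEVERITY Category: message".
# It drops the leading timestamp, the epoch milliseconds, the counter and,
# if present, the timestamp repeated inside the message.
zomboid_clean_log() {
    sed -E 's/^\[[^]]*\] ([A-Z]+) *: ([A-Za-z]+) *, [0-9]+> [0-9,]+> (\[[0-9-]+ [0-9:.]+\] > )?(.*)$/\1 \2: \4/'
}
```

A possible use is journalctl -u zomboid.service -o cat | zomboid_clean_log, which would turn the last sample line into LOG Network: ZNet: Zomboid Server is VAC Secure.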

In my opinion, this logging system would be improved by just using System.out.println(). At least then formatting could be handled by the logging daemon and messages would actually be readable.

I know that logging is a complicated topic, but this solution seems to combine the worst of both worlds.

The second problem is that the application also writes this information to several log files in a custom folder (Logs/). There is no way to turn this off or change the path. The logs have strange file names (03-11-23_14-20-04_DebugLog-server.txt) and are rotated by the application itself (!) using a strange algorithm: the application creates folders for each day and log files are moved into it after each application restart. This makes rotating the logs using something like logrotate supremely difficult, if not entirely impossible.

For this reason, I decided to simply discard these log files. The messages are saved in the systemd journal anyway, so I simply do not need these files. The first solution I thought of was using something like nullfsvfs (i.e. “/dev/null as a filesystem”). This seemed very interesting, but ultimately overkill. I decided to use the simpler, sledgehammer solution:

Deleting all files in Logs/ every time the service is stopped.

This can be done via one line in the systemd service:

ExecStopPost=/bin/sh -c "rm -rf /srv/zomboid/Zomboid/Logs/*"
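If wiping everything on each stop feels too drastic, a gentler variant is to delete only files that have not been modified for a number of days. This is a sketch of that alternative, not what I actually run; the function name prune_zomboid_logs and the default of 7 days are made up, and it relies on the -mindepth, -empty and -delete options found in GNU and BSD find.

```shell
#!/bin/sh
# Hypothetical helper: remove log files last modified more than N days ago
# (default 7) below the directory $1, then remove day folders left empty.
prune_zomboid_logs() {
    dir="$1"
    days="${2:-7}"
    # Nothing to do if the server has not created the folder yet.
    [ -d "$dir" ] || return 0
    find "$dir" -mindepth 1 -type f -mtime +"$days" -delete
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

Placed in a small script, this could be called from ExecStopPost= with /srv/zomboid/Zomboid/Logs as the first argument instead of the rm -rf line.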

Logs that don’t follow a single event stream and cannot be rotated are useless at best and a risk (disk space) at worst. I really hope the developers think about implementing logging in a more standards-compliant and sensible way. Adhering to the twelve-factor guidelines on logging would be a huge improvement. If this is not possible, at least allow users to choose where they want logs to go (e.g. console, file, etc). (This would contradict the twelve-factor guidelines, but it would at least be better than the status quo.) Also, logfile names should not contain a timestamp (that’s what filesystem metadata is for). Finally, don’t try to rotate logs yourself. Let system administration utilities (e.g. logrotate, journald) handle this for you.

Complete file

In conclusion, this is the complete systemd file for the service:

[Unit]
Description=Zomboid Headless Server
After=network-online.target
Requires=zomboid.socket

[Service]
WorkingDirectory=/srv/zomboid/game
ExecStart=/srv/zomboid/game/start-server.sh
User=zomboid
StandardInput=socket
StandardOutput=journal
StandardError=journal
Type=exec
Sockets=zomboid.socket
ExecStop=/bin/sh -c "echo quit > /run/zomboid.sock"
ExecStopPost=/bin/sh -c "rm -rf /srv/zomboid/Zomboid/Logs/*"

# hardening
ProtectControlGroups=true
RestrictSUIDSGID=true
ProtectClock=true
ProtectHome=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectProc=invisible
RemoveIPC=true
PrivateDevices=true
ProtectHostname=true
NoNewPrivileges=true
CapabilityBoundingSet=
RestrictNamespaces=true
SystemCallArchitectures=native
LockPersonality=true
RestrictRealtime=true
SystemCallErrorNumber=EPERM
UMask=0077
RestrictAddressFamilies=AF_INET AF_INET6
PrivateUsers=true
PrivateTmp=true
ProtectKernelTunables=true
MemoryDenyWriteExecute=false
SystemCallFilter=@system-service
SystemCallFilter=~@resources
SystemCallFilter=~@privileged
PrivateNetwork=false
ProtectSystem=strict
ReadWritePaths=/srv/zomboid/Zomboid

[Install]
WantedBy=multi-user.target

With the hardening options active, systemd-analyze security zomboid.service reports quite a nice score:

→ Overall exposure level for zomboid.service: 1.1 OK 🙂

Networking

After starting, the service first tries to open its ports on the router via UPnP. This is another anti-feature in my opinion: UPnP should be disabled by default and left as a conscious choice for the users that really want it (in my opinion, you should never actually want or need UPnP, so it might even be better to remove this “feature” completely).

Since my router does not implement UPnP, I disabled it and opened the requisite UDP ports in my firewall (16261 and 16262).

I did not want my server to appear in the game’s global server browser, so I decided to create a DNS record that points to the server. I could then tell my friends to use that name when they want to connect. Imagine my surprise when I tested this, and the client still could not connect to the server. After some digging I found the following:

ss -unlp | grep 1626

UNCONN 0      0                          0.0.0.0:16261      0.0.0.0:*    users:(("ProjectZomboid6",pid=1660231,fd=40))
UNCONN 0      0                          0.0.0.0:16262      0.0.0.0:*    users:(("ProjectZomboid6",pid=1660231,fd=42))

The application only supports IPv4. To remedy this, I created a custom record that only resolves to my IPv4 address (an A record without a matching AAAA record). With this in hand I tried again, to no avail. It turns out that the client does not support name resolution at all.

At this point I gave up trying to find a nice solution and instead chose one that simply works. I told my friends to run nslookup or dig against my IPv4-only record and to enter the resulting address in the client.
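For friends on Linux, the lookup can be wrapped in a small helper. This is a minimal sketch that assumes a glibc system where getent ahostsv4 is available; the function names first_ipv4 and resolve_ipv4 are made up, and zomboid.example.org is a placeholder, not my actual record.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of stdin whose first field
# looks like a dotted-quad IPv4 address, then stop.
first_ipv4() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1; exit }'
}

# Hypothetical helper: resolve the host name $1 to a single IPv4 address.
# Assumption: getent supports the ahostsv4 database (glibc).
resolve_ipv4() {
    getent ahostsv4 "$1" | first_ipv4
}
```

Running resolve_ipv4 zomboid.example.org would then print one address that can be pasted into the client. The output of dig +short A can be piped through first_ipv4 in the same way.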

I would urge the developers to implement IPv6 and support name resolution. This should really make the user experience better for servers that can’t or don’t want to be in the global server browser.

User management

User management is another strange beast. Each server has its own local users that are not connected to Steam accounts at all. These users have a username and a password. There is also an optional server password. To limit access, we can either configure a server password or configure whitelisting. If there is a whitelist, the server administrator must add all of the users and their passwords beforehand. As far as I can tell, there is no way for users to change their password later. Since this is a huge hassle, I decided to simply create a server password and disable whitelisting.

Conclusion

First of all, a big thank you to the developers of Project Zomboid. I very much look forward to playing it. It looks like a nice game and I am glad that the server is now working. This was way harder than it needed to be though, and I wonder how people who have less technical knowledge than me could set this up sensibly. I really hope the developers put some time and effort into these things. These are fundamental problems that fortunately should have easy solutions. By programming your applications according to relevant standards, you can save yourself a lot of hassle in the long run.

