NAS-Upgrade; How I stopped worrying and learned to love ZFS
I have a server in my living room that runs Arch Linux. I originally set it up as a NAS, which serves as a backup target for the clients on my network and also stores my media collection.
When I initially set it up, I had just one (desktop!) HDD: a Seagate Barracuda with 8 TB of storage. For the filesystem, I chose Btrfs because I had heard good things about it, and its feature list seemed to cover what I wanted. But the longer I used it, the more problems crept up. It doesn’t have native filesystem encryption, so I had to use LUKS. It supports quotas, but doesn’t have a good way of displaying how much storage a specific subvolume or snapshot actually uses. It supports snapshots, but it copes poorly once there are a lot of them.
I also thought that subvolumes were very neat, so I created a lot of them to give each “application” its own path, which I tried to cram into the FHS philosophy. This caused more trouble than it solved, though, because now I had a lot of different paths to keep track of. I wrote a bash script to mount all of these subvolumes and frequently had to look inside it to remember which subvolume went where. With every service I added, I also needed to update the script. This wasn’t hard or complicated, but it was very annoying.
#!/bin/bash
DISK1='/dev/sde1'
CNAME='HDD_1'
ROOT_PATH='/mnt'
JELLYFIN="jellyfin"
NAS_ROOT="/media/smbnas/"
BACKUP_ROOT="backup"
echo "Opening disk"
sudo cryptsetup open "$DISK1" "$CNAME" || echo 'failed'
echo "Mounting btrfs root"
sudo mount "/dev/mapper/$CNAME" "$ROOT_PATH/$CNAME" -o compress=zstd,autodefrag || echo 'failed'
echo "Mounting btrfs subvol 'jellyfin'"
sudo mount "/dev/mapper/$CNAME" "$NAS_ROOT/$JELLYFIN" -o compress=zstd,autodefrag,subvol="/$JELLYFIN" || echo 'failed'
echo "Mounting btrfs subvol 'backup/biggs'"
mount "/dev/mapper/$CNAME" "$NAS_ROOT/$BACKUP_ROOT/biggs" -o compress=zstd,autodefrag,subvol="/$BACKUP_ROOT/biggs" || echo 'failed'
echo "Mounting nextcloud subvol"
mount "/dev/mapper/$CNAME" "/var/nextcloud" -o compress=zstd,autodefrag,subvol="/nextcloud" || echo 'failed'
echo "Mounting XXX subvol"
mount "/dev/mapper/$CNAME" "/var/www/XXX.sergeantbiggs.net" -o compress=zstd,autodefrag,subvol="/tiktok" || echo "failed"
The script I used. It’s a bit of a mess.
I also created another script to automate snapshot creation. This was to protect the files on my SMB shares. If you delete a file from an SMB share on the client side, it’s gone. With snapshots, I would have a chance to restore things if they were accidentally deleted. This made listing the subvolumes a complete shitshow, because the list was completely cluttered with the snapshots. So a lot of Btrfs tools were just not working very well in my situation, and I had to rely on scripts, both self-written and third-party.
That’s why I decided to move to ZFS. In this article, I’ll walk you through how I did it. I decided to do this at the same time as a major hardware upgrade. Up to that point I had just the one desktop HDD. I replaced it with three WD Reds, each 8 TB in size. I will be running these in RAIDZ, a RAID 5-like layout that avoids the write hole (a general problem in RAID 5, and one that Btrfs hasn’t been able to solve yet). It is also space-efficient: with three disks, only one disk’s worth of capacity goes to parity.
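As a rough back-of-the-envelope check on what this layout yields, the usable capacity of a RAIDZ vdev is approximately the number of disks minus the parity disks, times the size of one disk. The helper below is only a sketch with a made-up name (`raidz_usable_tb`); it ignores metadata, padding, and the TB/TiB difference, so real numbers come out somewhat lower.

```shell
#!/bin/bash
# Rough usable capacity of a RAIDZ vdev, in whole TB.
# Hypothetical helper: ignores metadata overhead, padding, and TB vs. TiB.
# Usage: raidz_usable_tb <disks> <tb_per_disk> [parity_level]
raidz_usable_tb() {
    local disks="$1" size_tb="$2" parity="${3:-1}"
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity devices" >&2
        return 1
    fi
    # Capacity of the non-parity share of the vdev
    echo $(( (disks - parity) * size_tb ))
}

raidz_usable_tb 3 8    # three 8 TB disks in RAIDZ1: prints 16
```

So of the 24 TB of raw disk space, roughly 16 TB is available for data.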
So, how will we go about this whole thing? There are 2 options:
Option 1 would be to replace all the Btrfs subvolumes with the ZFS equivalent (Datasets) and mount them at the exact same locations. The advantage would be that I don’t have to reconfigure the applications that use the storage. The disadvantage would be that my services would be down while the data transfers (and for over 5 TB, that would take a while).
Option 2 would be to mount both volumes, transfer the data, and reconfigure the applications after the transfer is complete. The advantage here would be that I would have only very short interruptions in service.
I decided to go for option 2. This has the other advantage that it forces me to organise the data in a different way, which is hopefully more sensible than scores of subvolumes that mount to different folders somewhere in the hierarchy.
First I had to decide how I wanted to install ZFS. If you don’t know about this problem, here’s a short explainer.
ZFS uses a free software license, the CDDL. This is a so-called copyleft license (a license which restricts using the code for proprietary software). One problem with this is that it is incompatible with the GPL, which means ZFS can’t be distributed with the Linux kernel. There is a third-party kernel module, called zfsonlinux, which we will be using.
For Arch Linux, there are a few ways of getting this kernel module. The recommended way is to install a patched kernel that includes the module. The advantage is that this is the easiest solution, and installation/upgrade (which, for all intents and purposes, is always the same in Arch!) is very fast and no different from upgrading a normal kernel. The disadvantage is that I have to wait for the maintainers to release new versions of the modified kernel, which can sometimes take months. Since I want to keep this server very up to date (and close to upstream) for security reasons, I didn’t like this option very much.
The other option is to use DKMS, which rebuilds the module against the new kernel every time there is a kernel update. Although the update takes slightly longer, this has the advantage of not having to wait for the maintainers to release a new patched kernel. The DKMS version is installed with the following command:
pacman -S zfs-dkms linux-headers
After installing, we can start to configure ZFS.
First we create a pool for our hard drives. ZFS on Linux recommends using device IDs when creating pools with fewer than 10 devices. On Arch Linux, the device IDs can be found by listing the contents of /dev/disk/by-id (for example with ls -l).
lrwxrwxrwx 1 root root  9 Jun 26 13:53 ata-Intenso_SSD_Sata_III_AA000000000000016985 -> ../../sda
lrwxrwxrwx 1 root root 10 Jun 26 13:53 ata-Intenso_SSD_Sata_III_AA000000000000016985-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jun 26 13:53 ata-Intenso_SSD_Sata_III_AA000000000000016985-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jun 26 13:53 ata-Intenso_SSD_Sata_III_AA000000000000016985-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Jun 26 13:53 ata-WDC_WD80EFBX-68AZZN0_VRG0M7DK -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun 26 13:53 ata-WDC_WD80EFBX-68AZZN0_VRG8G6LK -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun 26 13:53 ata-WDC_WD80EFBX-68AZZN0_VRG8ZJNK -> ../../sdd
In my case, the three WD Reds are /dev/sdb, /dev/sdc, and /dev/sdd. The advantage of the device IDs is that they stay the same, whereas Linux can assign the “classic” identifiers (sd[a-z]) differently if the disks are detected in a different order. So adding more disks, or just plugging in a USB drive, can change those. This is obviously something we want to avoid. So, to create our pool we use the following command:
zpool create -f -m /media/ zfsnas raidz ata-WDC_WD80EFBX-68AZZN0_VRG0M7DK ata-WDC_WD80EFBX-68AZZN0_VRG8G6LK ata-WDC_WD80EFBX-68AZZN0_VRG8ZJNK
If the command is successful, there should be no output. We can check the status of our pool with zpool status. The output should look like this:
  pool: zfsnas
 state: ONLINE
config:
        NAME                                   STATE     READ WRITE CKSUM
        zfsnas                                 ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRG0M7DK  ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRG8G6LK  ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRG8ZJNK  ONLINE       0     0     0
errors: No known data errors
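If you only know a disk by its kernel name, the matching by-id name for the pool can be looked up by following the symlinks. The function below is a small sketch, not something from the ZFS tooling: `by_id_names` is a made-up name, and it assumes a `readlink` that supports `-f` (GNU coreutils and BusyBox do).

```shell
#!/bin/bash
# Print every name in a by-id directory whose symlink resolves to the given device.
# by_id_names is a hypothetical helper; the directory defaults to /dev/disk/by-id.
# Usage: by_id_names <device> [by-id-directory]
by_id_names() {
    local dev dir link
    dev=$(readlink -f "$1") || return 1
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            basename "$link"
        fi
    done
}
```

Calling `by_id_names /dev/sdb` lists the name or names that point at that disk. Partition links such as `-part1` resolve to a different device node, so they are not printed for a whole-disk query.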
So that the pools are automatically imported on boot, we need to enable two systemd units.
systemctl enable zfs-import-cache
systemctl enable zfs-import.target
After that, we want to create datasets. These look like folders, but allow us to use other features, like mounting them under a different path and setting quotas. We create datasets with zfs create <nameofzpool>/<nameofdataset>. In our case we also want to enable encryption, so the command looks like this:
zfs create -o encryption=on -o keyformat=passphrase zfsnas/cryptset
ZFS asks us for a passphrase, which we then need to enter twice. After that we can start to create datasets for our different services. In my case, I have one dataset for my Nextcloud server, one for my SMB NAS, and one for the backups of my different clients. We also set quotas for each dataset.
zfs create zfsnas/cryptset/smb
zfs create zfsnas/cryptset/nextcloud
zfs create zfsnas/cryptset/backup
zfs set quota=8TB zfsnas/cryptset/smb
zfs set quota=200GB zfsnas/cryptset/nextcloud
zfs set quota=4TB zfsnas/cryptset/backup
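One thing to keep in mind with a passphrase-protected dataset is that its key has to be loaded again after a reboot before the datasets below it can be mounted. The following is a minimal sketch of what replaces the old mounting script in that situation. `unlock_nas`, `run`, and `DRY_RUN` are illustrative names of my own; only `zfs load-key` and `zfs mount -a` are actual ZFS commands. By default the sketch only prints what it would do.

```shell
#!/bin/bash
# Sketch: unlock and mount the encrypted datasets after a reboot.
# DRY_RUN, run and unlock_nas are illustrative names, not part of ZFS.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

unlock_nas() {
    local dataset="${1:-zfsnas/cryptset}"
    # Prompts for the passphrase; child datasets inherit this key
    run zfs load-key "$dataset" || return 1
    # Mounts every dataset that is not mounted yet
    run zfs mount -a
}

unlock_nas
```

Running it with `DRY_RUN=0` as root executes the two commands instead of printing them. Compared with the Btrfs script, there is no per-service mount line to maintain, because the child datasets mount themselves at their inherited paths.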
After we have prepared the datasets, we transfer everything from the old disk to the new array. I’m doing this with rsync.
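For reference, a transfer of one directory could look roughly like the sketch below. The flags are standard rsync options, but the wrapper (`migrate`, `DRY_RUN`) and the destination path are assumptions: I am assuming the datasets ended up under /media/cryptset/ because the pool was created with -m /media/. The sketch prints the command by default so the paths can be checked first.

```shell
#!/bin/bash
# Sketch: copy the contents of an old directory into its new dataset.
# migrate and DRY_RUN are illustrative names; the paths are assumptions.
DRY_RUN="${DRY_RUN:-1}"   # default: only print the command

migrate() {
    # Normalise both paths to exactly one trailing slash, so rsync copies
    # the contents of the source rather than the directory itself
    local src="${1%/}/" dst="${2%/}/"
    # -a  archive mode (permissions, owners, timestamps, symlinks)
    # -H  hard links, -A ACLs, -X extended attributes
    # --numeric-ids     keep raw uid/gid values instead of mapping by name
    # --info=progress2  one overall progress line (rsync 3.1 or newer)
    set -- rsync -aHAX --numeric-ids --info=progress2 "$src" "$dst"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate /var/nextcloud /media/cryptset/nextcloud
```

Because rsync skips files that are already up to date, the same command can be run a second time shortly before switching a service over, which keeps the downtime short.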
After transferring everything, we need to reconfigure our services. The one I was most afraid of was Nextcloud. There is a help article on this that begins with a scary-looking disclaimer:
First of all: Changing data directory after installation is not officially supported. Consider re-installing Nextcloud with new data directory, if you did not use it too much/added users/created shares/tags/comments etc.
That doesn’t inspire much confidence. The problem is that Nextcloud stores information about all files in its database. So this database has to be either manually updated (error-prone) or rebuilt (losing basically all metadata like shares, comments, etc.). I decided to do the second one, since I’m not a DBA by any stretch of the imagination, and I didn’t have many shares. If someone gets their data access cut off, they’ll complain anyway. Then I’ll be informed and can create a new share.
The first thing I did was change the location of the data directory. To do this, we need to edit /etc/webapps/nextcloud/config.php. It contains a variable called datadirectory. After I changed this variable, I ran occ maintenance:repair for good measure. To my surprise, this was all that I needed. All files (and share links) were there. For Samba, I just changed the config (smb.conf) to the new location.
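I made the change by hand in an editor, but it can also be scripted. The helper below is a sketch under a few assumptions: `set_datadir` is a made-up name, the config stores the value as a single-quoted string on the `'datadirectory' =>` line, and the new path contains no `|` or `&` characters. It keeps a `.bak` copy of the original file.

```shell
#!/bin/bash
# Sketch: point 'datadirectory' in a Nextcloud config.php at a new path.
# set_datadir is a hypothetical helper; it keeps a .bak copy of the original.
# Assumes the new path contains no '|' or '&' (both are special to sed here).
# Usage: set_datadir <config.php> <new-data-directory>
set_datadir() {
    local config="$1" new_dir="$2"
    if ! grep -q "'datadirectory'" "$config"; then
        echo "no datadirectory entry in $config" >&2
        return 1
    fi
    cp -p "$config" "$config.bak" || return 1
    # Replace only the quoted value that follows 'datadirectory' =>
    sed -E "s|('datadirectory'[[:space:]]*=>[[:space:]]*)'[^']*'|\1'${new_dir}'|" \
        "$config.bak" > "$config"
}
```

It would be called with the config file and the new location as arguments, for example `set_datadir /etc/webapps/nextcloud/config.php /media/cryptset/nextcloud`, where the second path is again only my assumption about where the dataset is mounted.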
After that, everything worked. I’m looking forward to the new NAS, and I hope I will lose some of the hassle I had with Btrfs.