The Technology Behind the WoltLab Cloud: ZFS


A considerable amount of time passed between our first idea for a managed hosting platform of our own and the day we went live, much of it spent on careful planning. Our stated goal was to create a reliable and powerful platform that meets the high demands of our customers. By deploying 100% of our servers on the ZFS file system, we laid the foundation for the success of the WoltLab Cloud.

In this first article of a planned series on the WoltLab Cloud, we would like to provide an insight into our planning and real-world experiences and explain why the use of ZFS was and is essential for us.

Proven Solution in the Enterprise Space

The choice of file system often seems trivial and insignificant, yet it can be critical to smooth operations. Lost files or system crashes are annoying on a private system, but a server in continuous operation is simply expected to be stable at all times.

Sun Microsystems began developing ZFS in 2001 for its in-house enterprise operating system Solaris. It was later released under an open source license as part of OpenSolaris and subsequently ported to various operating systems. In FreeBSD it is natively integrated; on Linux it must be loaded as an external kernel module. The Linux port "ZFS on Linux" was originally created for use on supercomputers. Today it continues, with various new features, as OpenZFS and is used, for example, to operate the certificate authority "Let's Encrypt".

Over the years, we have gradually integrated ZFS into our operations. The built-in snapshots allow consistent and notably fast restoring of files to a previous state and saved us a tremendous amount of time in the early days when developing importers. Over the course of the following years, ZFS made its way into our demo offering, providing us with valuable experience for later use in the WoltLab Cloud.

Data Integrity Is More than Meets the Eye

Since the early days of the Internet, information sharing has been a primary goal, and forums have emerged over decades as a capable platform for the exchange and preservation of knowledge. The information stored in this way is still available for access many years later; forums can also be described to some extent as a modern form of library. The preservation of information is of fundamental importance; a loss of data is not only annoying, but can also shake the confidence of users. Such a loss of painstakingly compiled and maintained content quickly drives users away from a site.

Most Data Backups Fall Short

Regular data backups are a proven solution to obvious cases of data loss, such as the failure of disk drives or other serious failures. Less obvious, but just as fatal, are hidden data losses that remain undetected over a long period of time and insidiously destroy information. This is known as "bit rot" or "data degradation": tiny errors occur, often with only a single bit being stored incorrectly. The effects can be dramatic, especially when the data is compressed. Images illustrate this well: a single erroneous bit is enough to significantly distort the graphic or render it completely unusable. The tricky part about such errors is that they are often noticed only very late, at a point when every remaining backup already contains the error.

ZFS addresses exactly this problem: it stores checksums separately from the data and verifies them every time the data is read. Hidden errors are thus detected immediately and, where a redundant copy exists, corrected automatically by replacing the faulty data with the intact copy. As an additional defense against creeping data loss, all data is proactively checked for integrity at regular intervals ("ZFS Scrub"), so that even rarely accessed data is protected from damage in the long term.
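The read path described above can be illustrated with a small Python sketch. It is a toy model and not how ZFS is implemented: the class name `MirroredBlock` is hypothetical, the block is held in memory, and SHA-256 merely stands in for whichever checksum algorithm a pool is configured to use.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Checksum of a block (SHA-256 is used here purely for illustration)."""
    return hashlib.sha256(data).digest()


class MirroredBlock:
    """Hypothetical model: two copies of a block plus a separately kept checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)  # stored apart from the data itself

    def read(self) -> bytes:
        """Return the first copy that matches the checksum and repair the others."""
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)  # self-healing
                return good
        raise IOError("all copies failed checksum verification")


block = MirroredBlock(b"forum post content")
block.copies[0][0] ^= 0x01   # simulate bit rot: flip a single bit in one copy
data = block.read()          # detects the damage and restores the bad copy
```

Without a second copy, the same check would still detect the error, but the read would fail instead of being repaired.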

Secure Write Operations Through "Copy on Write"

Another feature of ZFS is its "copy on write" approach to applying changes safely. Instead of overwriting existing data in place, ZFS writes the new version to a different location and only then discards the old version. This approach achieves a very high level of data safety because a change is either written completely or not at all. Unlike with other file systems, a system crash or power failure on ZFS does not leave behind broken files that have been partially overwritten with new data.
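The all-or-nothing behavior can be sketched in a few lines of Python. This is a deliberately simplified model with hypothetical names (`CowStore`, `crash_before_commit`); real ZFS works on a tree of blocks and transaction groups, which this sketch does not attempt to reproduce.

```python
class CowStore:
    """Toy copy-on-write store: one logical file, versions kept as separate blocks."""

    def __init__(self, data: bytes):
        self.blocks = {0: data}
        self.root = 0        # points at the current, committed version
        self._next_id = 1

    def write(self, data: bytes, crash_before_commit: bool = False) -> None:
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data                 # 1. write the new version elsewhere
        if crash_before_commit:
            raise RuntimeError("simulated power failure")
        old_id, self.root = self.root, new_id      # 2. switch the pointer in one step
        del self.blocks[old_id]                    # 3. discard the old version

    def read(self) -> bytes:
        return self.blocks[self.root]


store = CowStore(b"version 1")
store.write(b"version 2")
try:
    store.write(b"version 3", crash_before_commit=True)
except RuntimeError:
    pass
current = store.read()   # still the last committed version, never a mix of both
```

Because the old version is only released after the pointer has moved, a reader sees either the old or the new state. The same property makes snapshots cheap: keeping a snapshot amounts to not discarding the old blocks.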

Excursus: The Long Journey to "Pooled Storage"

It is common practice to use separate partitions and file systems for individual directories. For example, it is a good idea to store log files on a separate, size-limited partition so that, in the event of an error, the logs cannot fill up the entire storage space and affect the stability of the whole system. Separate mount points also allow further restrictions through mount options; with noexec, for example, the execution of programs from that storage area can be prevented altogether.

The Rise and Fall of the LVM

Over time, the shortcomings of partitions became increasingly apparent. The number, size and position of partitions on the physical disks can only be adjusted with a great deal of effort, and such changes come with potential downtime because the system has to be rebooted. In addition, a partition can only be as large as the physical disk on which it is stored.

The "Logical Volume Manager" ("LVM") partially solved the problem by using dynamic virtual partitions that can be physically distributed over several drives. However, LVM also only partially solved the problem of a partition's fixed capacity: shrinking a partition after the fact is still a nightmare. Techniques such as "Thin Provisioning" (over-allocation of storage space) could partly circumvent the problem, but in practice they created new and much more serious sources of error.

ZFS Addresses the Root of the Problem

Separating the file system from the management of physical disks is the root cause of the unsolvable dilemma of classical partitioning. ZFS approaches the problem from a new direction: it takes on both tasks itself and thereby removes the friction between file system and volume management.

In ZFS, disks are combined into a pool called a "zpool" that acts as a single unit. Data is organized into "datasets", which can be thought of as partitions with flexible sizes. The available space in a dataset is limited only by the shared capacity of the zpool. When data is stored, the free shared space shrinks, and when data is deleted, the space is released again, regardless of the dataset in which this took place.

If a storage shortage is foreseeable, additional disks can be added on the fly, and their capacity is immediately available to all datasets. Datasets still offer the usual control mechanisms for limiting storage usage (quotas) and, conversely, it is also possible to reserve space that is guaranteed to be available to a dataset (reservations).
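The interplay of shared capacity, quotas and reservations can be modeled roughly as follows. The sketch is an illustration under simplifying assumptions, not ZFS's actual space accounting: sizes are plain integers (think GiB), redundancy and metadata overhead are ignored, and the class and dataset names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Dataset:
    quota: Optional[int] = None   # upper limit for this dataset, None = unlimited
    reservation: int = 0          # space guaranteed to this dataset
    used: int = 0


class Pool:
    """Toy model of pooled storage shared by several datasets."""

    def __init__(self, disk_sizes: List[int]):
        self.capacity = sum(disk_sizes)
        self.datasets: Dict[str, Dataset] = {}

    def add_disk(self, size: int) -> None:
        self.capacity += size     # new space benefits every dataset at once

    def create(self, name: str, quota: Optional[int] = None, reservation: int = 0) -> None:
        self.datasets[name] = Dataset(quota=quota, reservation=reservation)

    def available(self, name: str) -> int:
        ds = self.datasets[name]
        total_used = sum(d.used for d in self.datasets.values())
        # Space that is reserved for other datasets but not yet used by them.
        held_for_others = sum(
            max(d.reservation - d.used, 0)
            for other, d in self.datasets.items()
            if other != name
        )
        free = self.capacity - total_used - held_for_others
        if ds.quota is not None:
            free = min(free, ds.quota - ds.used)
        return max(free, 0)

    def write(self, name: str, size: int) -> None:
        if size > self.available(name):
            raise OSError(f"no space left for dataset {name!r}")
        self.datasets[name].used += size


pool = Pool([100])
pool.create("logs", quota=10)          # logs can never exceed 10
pool.create("mysql", reservation=30)   # 30 is always kept free for mysql
pool.create("web")
pool.write("web", 70)                  # web may use everything that is not reserved
```

In this model, `pool.available("logs")` is capped by the quota even while the pool is nearly empty, and `pool.add_disk(50)` raises the available space of every dataset without touching any of them individually.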

The Advantages of ZFS for the WoltLab Cloud

The general advantages of ZFS described so far are not merely theoretical. They have proven themselves in our day-to-day operations and have brought many benefits to us and our customers. We would therefore like to take a closer look at how we use ZFS and build on its capabilities.

Data Redundancy and Security

All of our servers run the Debian GNU/Linux operating system and rely on "Root on ZFS"; the only exception is the /boot partition, which is based on ext4. Each system uses a single zpool ("rpool") that, depending on its purpose, consists of two disks (as a mirror vdev) or six disks (as raidz2). Each server thus has local redundancy. In addition, each server is fitted with disk drives from different manufacturers in order to reduce the risk of simultaneous failures caused by manufacturing defects.
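What the two layouts mean for capacity and fault tolerance can be worked out with simple arithmetic. The helper below is only a rough sketch: it counts whole disks, assumes equally sized disks, and ignores metadata and padding overhead. The function names are hypothetical.

```python
def usable_disks(layout: str, disks: int) -> int:
    """Number of disks' worth of capacity that is left for data."""
    if layout == "mirror":
        return 1             # every disk holds a full copy of the data
    if layout == "raidz2":
        return disks - 2     # two disks' worth of space holds parity
    raise ValueError(f"unknown layout: {layout}")


def tolerated_failures(layout: str, disks: int) -> int:
    """Number of disks that may fail without losing data."""
    if layout == "mirror":
        return disks - 1     # one surviving copy is enough
    if layout == "raidz2":
        return 2             # double parity
    raise ValueError(f"unknown layout: {layout}")


for layout, disks in [("mirror", 2), ("raidz2", 6)]:
    print(layout, disks, usable_disks(layout, disks), tolerated_failures(layout, disks))
```

For the two configurations named above, this gives one disk of usable capacity with one tolerated failure for the two-disk mirror, and four disks of usable capacity with two tolerated failures for the six-disk raidz2.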

We use individual datasets tailored to the software components in use, which lets us limit storage consumption and apply further optimizations. A good example is the datasets for MySQL, which use a reduced ZFS block size (recordsize) to match MySQL's internal block size. This approach provides better performance and reduces unnecessary wear on the SSDs ("write amplification").
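The effect of matching block sizes can be estimated with a back-of-the-envelope calculation. The numbers below are assumptions for illustration and are not taken from our configuration: 128 KiB is the default ZFS recordsize and 16 KiB is the default InnoDB page size. The model also ignores compression, metadata and caching.

```python
def write_amplification(recordsize_kib: int, page_kib: int) -> float:
    """Data rewritten per database page changed, as a simple ratio.

    With copy on write, changing one page inside a larger record means
    the whole record has to be written out again.
    """
    return max(recordsize_kib, page_kib) / page_kib


# Assumed defaults: 128 KiB ZFS records, 16 KiB InnoDB pages.
print(write_amplification(128, 16))  # 8.0
# Recordsize reduced to match the page size.
print(write_amplification(16, 16))   # 1.0
```

Under these assumptions, every modified database page would cause eight times as much data to be written with the default recordsize as with a matching one.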

Management of Customer Data During Operation

Within our internal network, we run custom management software (internally just called "Manager") that automates the setup of new customers. A key component is the use of a separate dataset per customer, created underneath existing container datasets. Because child datasets inherit the settings of their parents, this nesting allows us to enforce standard security measures, such as preventing programs from being executed in data stores, from a central location, thus creating as many lines of defense against attacks as possible.
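The way nested datasets pick up settings from their parents can be sketched as a simple lookup that walks up the dataset path. The dataset names and the use of the `exec` property are assumptions for illustration; the sketch does not show how the Manager actually works.

```python
from typing import Dict


def effective_property(
    local_props: Dict[str, Dict[str, str]],
    dataset: str,
    prop: str,
    default: str,
) -> str:
    """Resolve a property by walking from the dataset up to the pool root."""
    name = dataset
    while True:
        if prop in local_props.get(name, {}):
            return local_props[name][prop]   # set locally or on an ancestor
        if "/" not in name:
            return default                   # reached the pool root
        name = name.rsplit("/", 1)[0]        # continue with the parent dataset


# Hypothetical layout: one container dataset with a dataset per customer.
local_props = {
    "rpool/customers": {"exec": "off"},               # set once, centrally
    "rpool/customers/example-1": {"quota": "10G"},
    "rpool/customers/example-2": {"quota": "20G"},
}

for customer in ("rpool/customers/example-1", "rpool/customers/example-2"):
    print(customer, effective_property(local_props, customer, "exec", "on"))
```

Both customer datasets report `off` for `exec` although the property was never set on them directly, which is the practical benefit of the nesting: a new customer dataset is restricted from the moment it is created.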

The use of separate datasets gives us another advantage, in addition to the storage space management already mentioned: the ease of handling all data as a single logical unit. In the simplest case, it is just a matter of importing a backup without spending a lot of time; at the same time, we can also use the same mechanisms to quickly and consistently move the dataset to another system ("zfs send"). Instead of implementing complex and error-prone mechanisms on our own, we looked at the possibilities of ZFS from the very beginning in order to build our entire concept on it.
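As a rough illustration of moving a dataset as one unit, the following sketch only assembles the command lines involved and does not execute anything. The dataset name, snapshot name and host name are hypothetical, and a real migration would need error handling and usually incremental transfers on top.

```python
from typing import Dict, List


def migration_commands(dataset: str, snapshot: str, target_host: str) -> Dict[str, List[str]]:
    """Build the argument lists for a snapshot-based transfer (nothing is run)."""
    snap = f"{dataset}@{snapshot}"
    return {
        # Freeze a consistent point-in-time state of the dataset.
        "snapshot": ["zfs", "snapshot", snap],
        # Serialize the snapshot into a stream on standard output.
        "send": ["zfs", "send", snap],
        # Recreate the dataset from that stream on the target system.
        "receive": ["ssh", target_host, "zfs", "receive", dataset],
    }


commands = migration_commands("rpool/customers/example-1", "migrate", "target.example")
for step, argv in commands.items():
    print(step, " ".join(argv))
```

In practice, the output of the `send` step is piped into the input of the `receive` step, so the data is streamed to the target system without an intermediate file.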

Once a month, all data is additionally checked for integrity with the help of the mentioned "ZFS Scrub". In combination with the backups kept for up to 6 weeks per customer, we can actively detect and correct errors before it is too late.

Conclusion: For us, ZFS is not just a way to store data, but represents an integral part of our overall operational concept. We actively take advantage of ZFS to be able to guarantee stable and high-performance operation of the WoltLab Cloud.