Useful, secure, finished. Pick two.

I've just spent a long time writing about how systemd is the solution to all your process management reliability woes.

As with everything though, as I've alluded to in the subtitle, there are trade-offs.

What's the catch?

It is arguable that increasing the responsibilities for init, historically a very simple daemon, is a dangerous thing to do.

I believe these changes have been warranted, since the traditional UNIX process model assumes processes are well-written and benign.

Security updates

To accommodate the changing world, init is now sufficiently complicated that it requires security updates.

This is a problem because you can only have one init process, so you can't just kill the old version and start a new one.

systemd has to work around this by re-executing /sbin/init when it, or any of its dependent libraries, have been updated.

This mechanism should not be relied upon, since it can fail and if it does fail recovery requires a reboot, so if you need to be prepared to reboot on update, why not just reboot the system when an update is required?

Rebooting woes

Rebooting is also further complicated by init being extended.

If a library that a process depends on is removed as part of an update then the running process may keep a copy of it open until the process re-executes or terminates.

This means file systems will refuse to be remounted as read-only until the process stops using certain files. This is hugely problematic if the filesystem is the root file system and the process is init, since init will want to remount the file system before terminating and the file system will want init to terminate before remounting.

Previously the approach would be to shut-down without remounting the file system read-only, but this doesn't cleanly unmount the file system so was a source of file system corruption.

The solution to this employed by systemd is for the init process to execute a systemd-shutdown binary.

So why not move the complicated bits out of PID 1?

PID 1 is complex, and this is a problem. Therefore either systemd's developers don't consider the problems important or there are good reasons why it can't be otherwise.

So, what responsibilities does PID 1 have, and why do they have to be in PID 1?

Process reaping

When a process terminates before reaping its child subprocesses, all those subprocesses are adopted by PID 1, which is then responsible for reaping them.

PR_SET_CHILD_SUBREAPER was added to prctl(2) which allows a different process subreaper in the process hierarchy, so that gets to reap orphaned subprocesses instead of PID 1.

However PID 1 still neads to be able to reap subreapers, so PID 1 needs the same reaping logic, and both implementations need to be either shared or maintained, at which point it's less difficult to just rely on PID 1 doing it.

Traditional init systems perform this function, so it is not controversial for systemd to perform this.

Spawning processes

There are no special requirements necessary to spawn subprocesses, so a separate process could be started to spawn subprocesses.

Unfortunately this has the same bootstrapping problem, where PID 1 needs the same logic for starting its helpers as needs to be used for arbitrary code in the rest of the system.

Traditional init systems perform this function, so it is not controversial for systemd to perform this.

Managing cgroups

Because processes can't be trusted to not escape, cgroups are required to contain them.

A single process is required to manage them.

If services started by init are to be contained by cgroups, then the cgroup management service must either be the init process or must be started by the init process and have special logic to contain itself first.

This is tractable, but if it's a separate process, then some form of IPC is required, which adds extra latency, complexity and points of failure.

A similar concern exists in the form of journald, which is a separate service that systemd needs to communicate with to get it to log the output of new services to a file.

This complexity already causes systemd trouble, as a crashing journald can bring the whole system to a halt, so similar complications should be avoided.

Communicating via DBus

The init process needs some form of IPC to instruct it to do things.

Historically this was just telinit writing to the /dev/initctl FIFO, so was a pretty trivial form of IPC.

However we've established that init now requires new responsibilities, so requires a much richer form of IPC.

Rather than inventing some bespoke IPC mechanism, systemd uses DBus.

systemd also participates in the system bus, once the DBus daemon has been started, which adds extra complexity since the DBus daemon is started by systemd.

This is handled by systemd also handling point-to-point DBus, though attempts have been made to move DBus into the kernel in the form of AF_BUS, kdbus and most recently bus1, and there has also been discussion of whether systemd should be a DBus daemon to break this circular dependency.

Summary

The traditional UNIX process model wasn't designed to support a complex init, because it assumed that programs would be benign and well written.

Because you can't trust processes to clean up after themselves properly you need to make init more complicated to cope with it.

Because init is complicated it needs to be able to be updated.

Because the UNIX process model doesn't have a way to safely replace init you have to allow for it failing and needing a reboot, so you can't safely perform live updates.

Alternative ways of structuring init would make it even more complex so more opportunity for things to go wrong.