We previously discussed the traditional UNIX mechanisms for service management, and how they assume benign and well-written software.

Fortunately Linux provides more than just the traditional UNIX system calls, and it offers some features that can be used to track processes more completely.

Intercepting processes with ptrace(2)

If you could run some code when a process creates a subprocess or exits then you could use this to track which processes are active and where they came from.

Debuggers like gdb(1) also need to know this information since you might want to set a breakpoint for subprocesses too.

So it would be possible to do this using the same mechanism as debuggers.

This is what Upstart does to work out which process to track for double-forking daemons.

Unfortunately a process cannot be traced by multiple processes, so if Upstart is tracing a process to track its subprocesses then a debugger cannot be attached to the process.

Upstart detaches once it has worked out the main PID, so there is only a small window in which the process cannot be debugged, and this is only a problem when debugging faults during startup. However, detaching after the double-fork means Upstart can't trace any further subprocesses.

Continuing to trace subprocesses has a noticeable performance impact though, so it's for the best that it stops tracing after the double-fork.

Store process in a cgroup

cgroups are a Linux virtual filesystem that lets you create hierarchies to organise processes, and apply resource controls at each level.

cgroups were created to address a deficiency of the traditional UNIX resource control system calls such as setrlimit(2). These limits apply to a single process and can be thwarted by creating subprocesses: a process inherits the limits of its parent, but each process gets its own copy of the limit rather than sharing one allowance with its parent.
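As a rough illustration of that inherit-but-not-share behaviour, here is a minimal Python sketch. The helper name compare_limits and the choice of RLIMIT_NOFILE with a value of 64 are mine, picked only for the demonstration. The child starts with the same limit as its parent, and lowering it inside the child leaves the parent untouched.

```python
import resource
import subprocess
import sys

# Code run in the child: report the inherited soft limit, then lower it.
# The value 64 is arbitrary and only serves the demonstration.
CHILD_CODE = """
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft)                                              # inherited from the parent
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))   # changes the child only
"""


def compare_limits():
    """Return the parent's soft limit before, the child's inherited
    soft limit, and the parent's soft limit after the child changed its own."""
    before = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    inherited = int(result.stdout.strip())
    after = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    return before, inherited, after
```

All three values should be equal: the child begins with a copy of the parent's limit, and nothing the child does to its copy is visible to the parent. Per-process limits such as the number of open files are counted per process, so a program can multiply what it uses simply by forking.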

Subprocesses of a process in a cgroup on the other hand are part of the same cgroup and share the same resource limits.
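You can see which cgroup a process belongs to by reading /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a small sketch of a parser for that format. The function names are hypothetical, and child_shares_cgroup assumes a Linux system with /proc mounted.

```python
import subprocess
import sys


def parse_proc_cgroup(text):
    """Parse the contents of a /proc/<pid>/cgroup file into
    (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path may itself contain colons, so split at most twice.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), names, path))
    return entries


def child_shares_cgroup():
    """Check that a freshly spawned child reports the same cgroups as us."""
    with open("/proc/self/cgroup") as f:
        mine = parse_proc_cgroup(f.read())
    child = subprocess.run(
        [sys.executable, "-c", "print(open('/proc/self/cgroup').read())"],
        capture_output=True, text=True, check=True,
    )
    return parse_proc_cgroup(child.stdout) == mine
```

Unless something moves the child into a different cgroup in the meantime, child_shares_cgroup() is expected to return True, which is the membership inheritance described above.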

In each cgroup directory there is a cgroup.procs virtual file, which lists the process IDs of every process in the cgroup, making it effectively a kernel-maintained PIDfile.
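Reading that file takes very little code. This is a minimal sketch: the function name cgroup_pids is my own, and the directory you pass in depends on where the cgroup filesystem is mounted on your system and which cgroup you are interested in.

```python
from pathlib import Path


def cgroup_pids(cgroup_dir):
    """Return the process IDs listed in <cgroup_dir>/cgroup.procs.

    The file holds one PID per line. The set of PIDs can change at any
    moment, so treat the result as a snapshot.
    """
    text = Path(cgroup_dir, "cgroup.procs").read_text()
    return [int(pid) for pid in text.split()]
```

Unlike a PID file written by the service itself, this list is maintained by the kernel, so it still includes processes that have double-forked away from their original parent. The kernel does not promise that the PIDs are sorted, so sort the result yourself if you need a stable order.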

This is what systemd uses for its services, and you can request a cgroup for your own processes by asking systemd (via systemd-run(1) or the DBus interface) or cgmanager (via cgm(1) or the DBus interface) to do so on your behalf.
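As an example of the systemd-run(1) route, the sketch below builds a command line that asks systemd to run a command in its own scope unit, and therefore its own cgroup. The helper name systemd_run_argv and the unit name in the usage note are made up for illustration. The --scope, --user and --unit= options are documented in systemd-run(1).

```python
import subprocess


def systemd_run_argv(command, unit=None, user=True):
    """Build an argv that runs `command` inside a transient scope unit."""
    argv = ["systemd-run", "--scope"]
    if user:
        # Talk to the per-user systemd instance rather than the system one.
        argv.append("--user")
    if unit is not None:
        # Pick the unit name ourselves instead of a generated one.
        argv.append("--unit=" + unit)
    return argv + list(command)


def run_in_scope(command, unit=None, user=True):
    """Run `command` in its own scope and wait for it to finish."""
    return subprocess.run(systemd_run_argv(command, unit, user), check=True)
```

For example, run_in_scope(["sleep", "60"], unit="demo-scope") would be expected to run sleep in a scope named demo-scope.scope under the user manager, assuming a user systemd instance is running. Any subprocesses the command creates stay in that scope's cgroup.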

Why can't I mount my own cgroupfs?

Unfortunately only one process can safely manage a cgroup tree at a time, and only one cgroupfs can be mounted at a time, so you always need to ask some daemon to manage cgroups on your behalf.

See Changes coming for systemd and control groups for why a single writer and a single hierarchy are required.

Conclusion

It is necessary to track all the subprocesses of a service somehow. Using ptrace(2) prevents it being used for debugging. cgroups are an interface designed for this purpose, but technical limitations mean you need to ask another service to manage them for you.

So I would recommend writing a systemd service if your processes are a per-system or per-user service, or using the DBus API to create cgroups if not.

Thus cgroups allow us to know which of our processes are running, and currently the best way to use cgroups is via systemd. The implications of relying on systemd to do this are best left as the subject of another article.

If you are interested in learning more about cgroups, I recommend reading Neil Brown's excellent series on LWN.