Daniel Silverstone

…or how to take a bad analogy far too far; without any goal in sight.

Sometimes you just need a break from all the bits and bytes; all the integers and floats and strings. Sometimes you just want to step away from the keyboard and do something completely different, such as bake a cake. There's nothing so different from writing software as baking a cake, right?

Right?

…

Wrong!

When you cook anything, you're essentially executing a program. If you're the sort of person who follows the recipe slavishly then you're executing a defined algorithm for turning ingredients into something else; if you're the sort of person who thinks recipes are essentially a guideline, then you're still running a program in your head, but it's more of a heuristic than an algorithm.

In the process of running this program, you use tools, process inputs, and produce outputs. You even likely have (albeit small, simplistic, and likely poorly defined) a test suite to ensure that you produce the right outputs (that of simply tasting things).

What's even worse (assuming you were trying to get away from producing software rather than simply wanting dinner) as you go, even if you weren't expecting to, you likely are debugging and bugfixing your program as you learn more about how you want to transform inputs to outputs effectively and efficiently.

Okay, so if programming and cooking are just more of the same, then why this article at all?

In brief, everyone needs something to do when the code just doesn't flow. I sometimes switch to reading a book, watching videos online, or going for a walk; but sometimes cooking is exactly the right thing to do. At least a few other programmers must agree since there's any number of programmer oriented cooking-related tools out there, a yaml syntax or three for defining recipes, and some desktop environments which shall remain unnamed even consider recipe tools to be worth mentioning in their release notes :-)

All this applies to those of us who slavishly try and write articles for you to enjoy every week. As such, sometimes we just need to have fun. Thus, your homework this week, assuming you're not allergic to any of the following, is to:

Take two eggs and weigh them (shell on)
Sift the same mass of self-raising flour and set aside.
Melt the same mass of butter and beat the eggs (shell off), butter, and the same mass of sugar together.
Gently fold the flour into the wet mix, trying to not knock the air back out of it. (Or just use an electric whisk to keep it floofy)
Divide the mix between two pre-greased and floured cake tins, and bake in the centre of a pre-heated oven at (approximately) 170°C for 20-25 minutes (check the centre with a skewer)
Allow the cakes to stand for 10 minutes before turning out onto a wire rack to cool thoroughly.
Beat 150g to 250g of butter until smooth and soft (I prefer slightly salted butter, but unsalted is fine too)
Mix icing sugar into the butter, a spoonful at a time, beating it until smooth each time. Repeat until the icing tastes sweet rather than savoury.
Spread a small bit of icing on the centre of a plate, and use that to stick one of your sponges down. Ice the top of the sponge and then pop the second layer on top. If you have enough icing left, smear it all over the cake, remembering that presentation is far less important than deliciousness.

Now serve slices of your delicious cake with cups of delicious tea or coffee.

Also, please do share your favourite recipes in the comments below, before you take your coffee and cake and get on with coding up your latest masterpiece.

Richard Maw

We previously discussed the traditional UNIX mechanisms for service management, and how they assumed benign and well written software.

Fortunately Linux provides more than just traditional UNIX system calls, so offers some features that can be used to track processes more completely.

Intercepting processes with ptrace(2)

If you could run some code when a process creates a subprocess or exits then you could use this to track which processes are active and where they came from.

Debuggers like gdb(1) also need to know this information since you might want to set a breakpoint for subprocesses too.

So it would be possible to do this using the same mechanism as debuggers.

This is what Upstart does to work out which process to track for double-forking daemons.

Unfortunately a process cannot be traced by multiple processes, so if Upstart is tracing a process to track its subprocesses then a debugger cannot be attached to the process.

For Upstart it detaches the debugger after it has worked out the main PID, so it's a small window where it is undebuggable, so it's only a problem for debugging faults during startup, but detaching after the double-fork means it can't trace any further subprocesses.

Continuing to trace subprocesses adds a noticeable performance impact though, so it's for the best that it stops tracing after the double-fork.

Store process in a cgroup

cgroups are a Linux virtual filesystem that lets you create hierarchies to organise processes, and apply resource controls at each level.

cgroups were created to handle the deficiency of traditional UNIX resource control system calls such as setrlimit(2), which only apply to a single process and can be thwarted by creating subprocesses, since while a process inherits limits of its parent process it does not share them with it.

Subprocesses of a process in a cgroup on the other hand are part of the same cgroup and share the same resource limits.

In each cgroup directory there is a cgroup.procs virtual file, which lists the process IDs of every process in the cgroup, making it effectively a kernel-maintained PIDfile.

This is what systemd uses for its services, and you can request a cgroup for your own processes by asking systemd (via systemd-run(1) or the DBus interface) or cgmanager (via cgm(1) or the DBus interface) to do so on your behalf.

Why can't I mount my own cgroupfs?

Unfortunately you can only safely have 1 process using a cgroup tree at a time, and you can only have one cgroupfs mounted at a time, so you always need to ask some daemon to manage cgroups on your behalf.

See Changes coming for systemd and control groups for why a single writer and a single hierarchy are required.

Conclusion

It is necessary to track all the subprocesses of a service somehow, using ptrace(2) prevents it being used for debugging, cgroups are an interface designed for this purpose but technical limitations mean you need to ask another service to do it.

So I would recommend writing a systemd service if your processes are a per-system or per-user service, or to use the DBus API to create cgroups if not.

Thus cgroups allow us to know our processes are running, and currently the best way to use cgroups is via systemd. The implications of relying on systemd to do this are best served as a subject of another article.

If you are interested in learning more about cgroups, I recommend reading Neil Brown's excellent series on LWN .

Daniel Silverstone

Recently I spent time with our very own Lars Wirzenius, in Helsinki. While there, I had the distinct pleasure of standing in front of a whiteboard with Lars and hashing out aspects of two different Free Software projects which we are both involved with. We have talked about irc in the past and we've talked about being gracious when dealing with your community. In general we've only really spoken online interactions with your project (excepting when we spoke of conferences) but today I'd like to introduce you to the inestimable value of whiteboarding with your collaborators.

While the value of having written discussion is very high, a log of what was said by whom and in response to what is very helpful, and nicely written design documentation can be critical to a project's success; the incredible bandwidth of face-to-face discussions is hard to beat. Conferences are, as previously discussed, one of the ways to achieve this. But for the raw base design work for a project, whiteboarding on your feet (or your rear-end I suppose) is one of the best ways I find to get the ideas flowing.

Whiteboarding is best done in a small group. Ideally two to four people. You need to have a room to yourselves because you will be making a lot of noise; talking over each other as the ideas pop up; and generally making a lot of mess on the whiteboard. If you can have multiple whiteboards in one room then larger projects can be easier to deal with. I have successfully whiteboarded with a fold-up whiteboard on a café table, though that was a lot of work.

The second thing that you really must have for whiteboarding is a camera. You might get lucky and have a "smart" whiteboard which you can extract an image from, though I generally find those to be less effective than a quick snap from a camera phone before wiping the whiteboard down.

Don't be afraid to write stuff up on the board which you're going to immediately say is wrong, because seeing it written up can spark ideas in others. Whiteboards give you the ability to sketch out diagrams as well as APIs with equal ease - I find that sometimes the best way to work out how a state machine works is to draw it up, despite the fact that I am almost entirely incapable of thinking in images at all.

Once you've done your design work and come to agreements, ensure that photographs of your whiteboards are circulated among the attendees. At this point the most important step of all happens -- you all should write up what was discussed and the decisions which were arrived at. Then you can all review each others' memories of what happened and come to a consensus as to the final decisions as a group. Interpreting the whiteboards after the fact gives the attendees further chance to jog an idea which may have been fermenting in their minds since the session.

I hope that you'll take this on board and try out whiteboarding with one of your projects just as soon as you can. However, since it can be hard to just magic up a whiteboard, I won't set you homework this week. Enjoy your respite.

Jon Dowland

Today's posting is by a guest author, Jon Dowland.

I wanted to refresh my Haskell skills recently so I picked up an old idea I'd had of writing a successor to WadC in Haskell. WadC is a LOGO-like language for making 2D drawings (Doom maps). You write a sequence of instructions that move a "pen" around and draw lines, place things, etc. It looks a bit like this:

...
movestep(32,32)
thing
straight(256)
rotright
straight(512)
rotright
...

The driving force behind for writing a successor to WadC is to work around one of its most serious limitations: you do not have direct access to the data structure that you are building. Once you place a line, it's defined and you can't change it, or even query existing drawn lines to find out if it exists. The appeal of using Haskell would be to give you access to the data structure that's "behind the scenes" as well as Haskell's rich features and libraries.

WadC's syntax, like LOGO, resembles an ordered sequence of operations that manipulate some hidden state relating to the Pen: its orientation, whether its on the "paper" or not, what colour line is being drawn, etc. I thought this would look particularly nice in Haskell's "do notation" syntax, e.g.:

blah = do
    down
    straight 64
    rotright
    straight 128
    rotright
...

Ignoring the syntax issue for a minute and thinking just about types, the most natural type to me for the atomic operations would be something like Context -> Context: simply put, each function would receive the current state of the world, and return a new state. For example Context could be something like

data Context = Context { location :: Point          -- (x,y)
                       , orientation :: Orientation -- N/E/S/W
                       , linedefs :: [Line]         -- Line = from/to (x,y)/(x,y)
                       , penstate :: PenState       -- up or down
                       ...

An example of a simple low-level function that would work with this:

    down c = c { penstate = Down }

The immediate advantage here over WadC is that the functions have access to all of the existing context: they could not only append lines, say, but replace any existing lines too. I was envisaging that you might, for example, adjust all drawn lines as if you were using a completely different geometric model to get some weird effects (such as one axis being warped towards a single point). Or perhaps you would super-impose two separate drawings and have fine control over how overlapping regions have their properties combined (one overrides the other, or blend them, etc.)

The problem is in uniting the simple types with using do-notation, which requires the routines to be "wrapped up" in Monads. I got a prototype working by writing a custom Monad, but it required me to either modify the type of each of the low-level functions, or wrap each invocation of them inside the do-block. If you imagine the wrap function is called wadL, that means either

down = wadL down' where
    down' c = c { penstate = Down }

blah = do
    wadL $ down
    wadL $ straight 64
    wadL $ rotright

Both approaches are pretty ugly. The former gets ugly fast when the functions concerned are more complicated in the first place; the latter pretty much throws away the niceness of using do-notation at all.

An alternative solution is one that the Diagrams package uses: define a new infix operator which is just function composition (.) backwards:

down        &
straight 64 &
rotright

(Diagrams uses # but I prefer &)

By playing around with this I've achieved my original goal of refreshing my Haskell skills. Unfortunately I don't have the time or inclination to continue with this side-project right now so I haven't published a working Haskell-powered Doom map editor.

If I return to this again, I might explore writing a completely distinct language (like WadC is) with the compiler/interpreter likely written using Haskell.

Richard Maw

Useful, secure, finished. Pick two.

I've just spent a long time writing about how systemd is the solution to all your process management reliability woes.

As with everything though, as I've alluded to in the subtitle, there are trade-offs.

What's the catch?

It is arguable that increasing the responsibilities for init, historically a very simple daemon, is a dangerous thing to do.

I believe these changes have been warranted, since the traditional UNIX process model assumes processes are well-written and benign.

Security updates

To accommodate the changing world, init is now sufficiently complicated that it requires security updates.

This is a problem because you can only have one init process, so you can't just kill the old version and start a new one.

systemd has to work around this by re-executing /sbin/init when it, or any of its dependent libraries, have been updated.

This mechanism should not be relied upon, since it can fail and if it does fail recovery requires a reboot, so if you need to be prepared to reboot on update, why not just reboot the system when an update is required?

Rebooting woes

Rebooting is also further complicated by init being extended.

If a library that a process depends on is removed as part of an update then the running process may keep a copy of it open until the process re-executes or terminates.

This means file systems will refuse to be remounted as read-only until the process stops using certain files. This is hugely problematic if the filesystem is the root file system and the process is init, since init will want to remount the file system before terminating and the file system will want init to terminate before remounting.

Previously the approach would be to shut-down without remounting the file system read-only, but this doesn't cleanly unmount the file system so was a source of file system corruption.

The solution to this employed by systemd is for the init process to execute a systemd-shutdown binary.

So why not move the complicated bits out of PID 1?

PID 1 is complex, and this is a problem. Therefore either systemd's developers don't consider the problems important or there are good reasons why it can't be otherwise.

So, what responsibilities does PID 1 have, and why do they have to be in PID 1?

Process reaping

When a process terminates before reaping its child subprocesses, all those subprocesses are adopted by PID 1, which is then responsible for reaping them.

PR_SET_CHILD_SUBREAPER was added to prctl(2) which allows a different process subreaper in the process hierarchy, so that gets to reap orphaned subprocesses instead of PID 1.

However PID 1 still neads to be able to reap subreapers, so PID 1 needs the same reaping logic, and both implementations need to be either shared or maintained, at which point it's less difficult to just rely on PID 1 doing it.

Traditional init systems perform this function, so it is not controversial for systemd to perform this.

Spawning processes

There are no special requirements necessary to spawn subprocesses, so a separate process could be started to spawn subprocesses.

Unfortunately this has the same bootstrapping problem, where PID 1 needs the same logic for starting its helpers as needs to be used for arbitrary code in the rest of the system.

Traditional init systems perform this function, so it is not controversial for systemd to perform this.

Managing cgroups

Because processes can't be trusted to not escape, cgroups are required to contain them.

A single process is required to manage them.

If services started by init are to be contained by cgroups, then the cgroup management service must either be the init process or must be started by the init process and have special logic to contain itself first.

This is tractable, but if it's a separate process, then some form of IPC is required, which adds extra latency, complexity and points of failure.

A similar concern exists in the form of journald, which is a separate service that systemd needs to communicate with to get it to log the output of new services to a file.

This complexity already causes systemd trouble, as a crashing journald can bring the whole system to a halt, so similar complications should be avoided.

Communicating via DBus

The init process needs some form of IPC to instruct it to do things.

Historically this was just telinit writing to the /dev/initctl FIFO, so was a pretty trivial form of IPC.

However we've established that init now requires new responsibilities, so requires a much richer form of IPC.

Rather than inventing some bespoke IPC mechanism, systemd uses DBus.

systemd also participates in the system bus, once the DBus daemon has been started, which adds extra complexity since the DBus daemon is started by systemd.

This is handled by systemd also handling point-to-point DBus, though attempts have been made to move DBus into the kernel in the form of AF_BUS, kdbus and most recently bus1, and there has also been discussion of whether systemd should be a DBus daemon to break this circular dependency.

Summary

The traditional UNIX process model wasn't designed to support a complex init, because it assumed that programs would be benign and well written.

Because you can't trust processes to clean up after themselves properly you need to make init more complicated to cope with it.

Because init is complicated it needs to be able to be updated.

Because the UNIX process model doesn't have a way to safely replace init you have to allow for it failing and needing a reboot, so you can't safely perform live updates.

Alternative ways of structuring init would make it even more complex so more opportunity for things to go wrong.

←	May 2017					→
S	M	T	W	T	F	S
	1	2	3 All work and no play makes Daniel a dull boy…	4	5	6
7	8	9	10 Is your process running 4 - Linux-specific approaches	11	12	13
14	15	16	17 Whiteboarding your project	18	19	20
21	22	23	24 A WadC successor in Haskell?	25	26	27
28	29	30	31 Complications arising from having a complex init