So you're starting a new project. You have an idea for something, and a vision for how to implement it. What should you do first?
If you're like me, you burn to start with the interesting coding bit first. After a few hours of furious typing, you've done the fun bit, and you need to add all the boring bits: automated tests (unless you did TDD, and even then there may be missing tests), documentation, a manual page, a README, perhaps packaging (Debian, RPM, language-specific), a CI project, a home page, etc. There's a lot of it. That is usually when my enthusiasm fades. A lot of my projects end up as a single file in ~/bin.
I call the boring bits scaffolding, and I've learnt that it pays to do them first. Not only is that often the only time I will actually do them, but once the scaffolding is in place, the fun bits are also more fun to do.
If I have unit and integration test frameworks in place, adding another test is only a small incremental task. If I haven't got them in place (particularly for integration tests), I tend to postpone writing tests. No chance of TDD, whereas when I put in an integration test framework first, I often do TDD at the program level, not just at the individual class and method level.
Likewise, if I start by writing a manual page, adding another sentence or paragraph is easy. Also, writing a manual page, or any documentation, clarifies my thinking about what the program should actually do, and how it should be used. This makes writing the code more pleasant.
Having CI in place from the beginning means the tests I write get exercised from the start. This finds bugs earlier, even if I run my tests manually as well. CI never thinks "I can run the tests faster this one time by taking this clever shortcut".
Of course, putting up a lot of scaffolding takes effort, and it's all wasted if you don't actually want to write the program after all. Sometimes what sounds like a brilliant idea on the last train home from a party makes no sense at all in the morning. (Personal experience, ahem.)
Thus it may make sense to start with a simple proof of concept, or prototype implementation, to verify the soundness of your idea, then throw that away and set up scaffolding before you start writing production code.
My latest personal project has a manual page, unit and integration tests, Debian packaging, a CI project, and a home page. I can install it and run it. It doesn't yet do anything useful. Before all this, I wrote a prototype to prove the idea I had, and threw that away. Some time this year I will start dropping in modules to actually do useful stuff, and that'll be easy, since I won't need to worry about all the boring bits. As soon as it does anything useful, I can also point friends at a Debian package and ask them to try my latest masterpiece.
What's more, I have a better chance of getting them to help by writing a module they need themselves, since they just need to write the module and don't need to worry about the rest.
Previously we lamented that we had to read the cmdline for every process to work out whether the program we want to run is already running, and that we'd like to be able to look up a name to see if it's running. The traditional way to do this is to write the process identifier (or PID) to a file named after your program, such as /var/run/$PROGNAME.pid.
This is an obvious solution that you have probably seen before; until systemd and upstart became popular, it was how you handled running services on Linux.
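For concreteness, here's a minimal sketch of the traditional approach in C; the path and error handling are illustrative only:

    /* Minimal sketch: record our PID in a file at startup.
     * The path is illustrative. */
    #include <stdio.h>
    #include <unistd.h>

    int write_pidfile(const char *path)
    {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "%ld\n", (long)getpid());
        fclose(f);
        return 0;
    }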
How people do pid files wrong
On the surface, just writing the PID looks like a good idea: you can use the presence of the file to tell that the process is running, or read the contents of the file to see which PID it's using, so you can kill(2) the process.
This, however, is less reliable than parsing files in procfs: if you're reading from /proc you know it's always up to date, since it's provided by the kernel's process table. There are no such guarantees for the PID file. Your process could terminate abnormally and never clean it up.
Be extra cautious with PID files not stored in /run or /var/run, since there is no guarantee that they weren't left over from a previous boot.
Reliability can be improved by checking whether the PID from the file is in use, by calling kill(pid, 0) == 0 in C, or kill -0 "$pid" in shell, since this evaluates to true if a process with that PID is running.
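A sketch of that check in C, assuming the file contains a single decimal PID (note that kill(2) can also fail with EPERM when the process exists but belongs to another user, which a fuller check would account for):

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Returns 1 if some process has the PID named in the file. */
    int pid_in_use(const char *path)
    {
        FILE *f = fopen(path, "r");
        long pid;

        if (f == NULL)
            return 0;           /* no PID file: assume not running */
        if (fscanf(f, "%ld", &pid) != 1) {
            fclose(f);
            return 0;           /* unparseable contents */
        }
        fclose(f);
        return kill((pid_t)pid, 0) == 0;
    }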
This is still not reliable though: killing with signal 0 will tell you whether a process with that PID is running, but not whether it's the correct process, since PIDs get reused fairly frequently.
For PID files to be reliable you need some way to guarantee that the PID in the file is actually referring to the correct process.
You need some way of tying the lifetime of the process to the file.
When a process terminates all the file descriptors it had open are closed, so we can know that the contents are valid if a process is holding it open.
You may be tempted to use lsof -Fp -f -- /var/run/$PROGNAME.pid, or to parse procfs manually to determine whether a process is using the file, but this is awkward for the same reason we avoid parsing the output of the ps(1) command to tell whether it's running.
We need to be able to do something to the file descriptor that will have an observable effect through other file descriptors to that file.
The solution to this is to take a lock on the file.
You may see some programs use fcntl(2) with F_SETLK. This is tempting because F_GETLK will include the PID in the result, instead of requiring the service to serialise and write the PID to the file and the checking process to read and parse it.
F_SETLK should be avoided, though, because the lock is removed when any file descriptor for that file owned by that process is closed, so you would need to guarantee that neither your own code nor any other code you use via a library will ever open that PID file again, even if your process normally opens arbitrary files based on user input.
So rather than using fcntl(2) with F_SETLK, use flock(2) with LOCK_EX, or fcntl(2) with F_WRLCK, to take a write (exclusive) lock on the file, and test whether the contents are live by trying to take a read (shared) lock, with flock(2)'s LOCK_SH or fcntl(2)'s F_RDLCK.
If you succeed at taking a read lock then the contents of the file aren't valid and the service isn't running.
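As a sketch, the liveness check with flock(2) might look like this, assuming the service holds an exclusive lock on its PID file for its whole lifetime:

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Returns 1 if the service holds its lock (running), 0 otherwise. */
    int service_running(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return 0;                    /* no PID file: not running */
        if (flock(fd, LOCK_SH | LOCK_NB) == -1 && errno == EWOULDBLOCK) {
            close(fd);
            return 1;                    /* exclusive lock held elsewhere */
        }
        /* We got the shared lock, so the contents are stale. */
        close(fd);                       /* closing also drops our lock */
        return 0;
    }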
Implementing this logic can be awkward. Fortunately, if you're writing a C program, you can use the libbsd-pidfile functions.
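For illustration, usage looks roughly like this (the header is <bsd/libutil.h> on Linux, where you link with -lbsd; the path is illustrative):

    #include <sys/types.h>
    #include <bsd/libutil.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        pid_t otherpid;
        struct pidfh *pfh;

        pfh = pidfile_open("/var/run/example.pid", 0600, &otherpid);
        if (pfh == NULL) {
            if (errno == EEXIST)
                fprintf(stderr, "already running, pid %ld\n", (long)otherpid);
            else
                perror("pidfile_open");
            exit(1);
        }
        pidfile_write(pfh);     /* record our PID in the locked file */
        /* ... do the real work here ... */
        pidfile_remove(pfh);    /* unlink and unlock on orderly exit */
        return 0;
    }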
It's the wrong tool for the job
Setting it up correctly is tricky
The libbsd-pidfile functions will handle locks correctly, but, permissively licensed as it is, libbsd is not always available, particularly if you're not writing a C program.
A brief survey of the top results for Python libraries for handling PID files turned up pid, python-pidfile and pidfile.
python-pidfile does not keep the file open or take a lock. pidfile improves on this only by taking the lock and holding it. pid also checks the validity of the file and sets correct permissions on it. None of them have a mechanism to safely retrieve the PID from the file.
So if you have to do it yourself, you'll need to do the following (a sketch in C follows the list):
- Open the file with O_CREAT so the lock file is created if it doesn't exist.
- Try to take a shared lock on the file, non-blocking. If you fail, the service is running, and depending on whether you want to end it, replace it or leave it, you should either instruct it to terminate and continue, exit, or terminate it and then exit. If you succeed, it's not running.
- If you want to start a new service or replace it, upgrade your shared lock to an exclusive lock, with a timeout. If you take a blocking lock, your processes will deadlock waiting for their turn to start the service; if you take a non-blocking lock, and your process is pre-empted by another process between trying to lock and releasing the shared lock, you could end up with every process exiting.
- If you took the lock, replace the contents of the file with your PID; if you timed out, unlock the file or exit.
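Here's a sketch of those steps using flock(2); error handling is abbreviated and the timeout on the upgrade is left out (it could be done with an interval timer and EINTR), so treat this as an outline rather than a finished implementation:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Returns 0 if we became the service, 1 if it was already
     * running, -1 on error.  The fd must stay open for the life
     * of the process, since closing it would drop the lock. */
    int become_the_service(const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0)
            return -1;

        /* Step 1: non-blocking shared lock to test for liveness. */
        if (flock(fd, LOCK_SH | LOCK_NB) == -1 && errno == EWOULDBLOCK) {
            close(fd);
            return 1;           /* already running */
        }

        /* Step 2: upgrade to an exclusive lock.  A real version
         * should bound this wait with a timeout, for the reasons
         * given above. */
        if (flock(fd, LOCK_EX) == -1) {
            close(fd);
            return -1;
        }

        /* Step 3: we won; record our PID. */
        if (ftruncate(fd, 0) == 0)
            dprintf(fd, "%ld\n", (long)getpid());
        return 0;
    }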
The above is already complicated and still doesn't handle edge cases, such as another process trying to start between taking the lock and writing the PID, which libbsd-pidfile handles with a retry loop.
It also doesn't handle the file being unlinked while starting, which would cause you to have multiple services running.
libbsd-pidfile doesn't take the shared lock and then upgrade, so if many processes often want to know whether the service is running, they would be unnecessarily contending for the lock.
Views of PIDs are not universal
PID namespaces exist.
While a single process' view of its PID does not change, other processes can see it with different PIDs since processes in sub-namespaces are viewable in multiple namespaces and have a different PID in each namespace.
You can detect whether the process is running by whether locking fails, but you can't use the contents to terminate it: the PID the process knows itself by is different from the PID you can reach it with, unless you use a restricted system call to enter the namespace of the process in order to terminate it.
fcntl(2)'s F_GETLK would return the PID correctly, but is unreliable in other areas.
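A sketch of that query; with fcntl(2) locks, the kernel reports the conflicting locker's PID in l_pid, translated into the caller's PID namespace:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Returns the PID holding a write lock on fd, 0 if nobody
     * holds one, or -1 on error. */
    pid_t lock_holder(int fd)
    {
        struct flock fl = {
            .l_type = F_WRLCK,   /* the lock we would like to take */
            .l_whence = SEEK_SET,
            .l_start = 0,
            .l_len = 0,          /* 0 means "the whole file" */
        };

        if (fcntl(fd, F_GETLK, &fl) == -1)
            return -1;
        return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
    }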
Services can span more than one process
The PIDfile model works acceptably if your service only runs in one process, but some programs spawn helper processes and terminating the lead process may not tidy up all the helper processes.
There are ways to have the termination of one process cause the termination of others; however, this just moves the problem of "how do you know your process is running" to the supervising process without solving the general problem.
So, we know that PID files aren't necessarily the right tool for the job; in the next article we'll explore some alternatives.
A while ago I wrote about ensuring that you know why you're writing something in order that you keep focussed on that goal while you code. My focus at that point was on the specific project you were looking to undertake, but it may behoove us to look at the wider picture from time to time.
What motivates you?
It's important that you can answer this question, ideally without hesitation or backtracking. For each of us the answer will be different, and no one's answer is any less "right" than anyone else's. For myself, it took several years to be in a position to answer the question confidently, quickly, and consistently. That's not to say that my answer won't change in the future, but at least for now I know what motivates me and how that manifests in my day-to-day hacking.
I have had a word with Richard and he has explained his motivation to me, and so for your perusal and criticism, here's what motivates us both.
Daniel
For me, the primary motivation for writing free software is that I enjoy making it possible for other people to achieve things. I am, as it were, an "enabler" or "facilitator". This manifests itself in an interest in processes, meta-programming, and tooling. I find myself writing libraries, services, and test tooling; and I enjoy reading papers and architecture designs, thinking of new ways to solve old problems, and novel problems to solve. (And of course, I write articles on this here blog.)
Richard
My motivation in general is to learn something such that it can be applied to something which in some way may be construed as to the betterment of society. Or indeed those things which may improve society directly. In the free-software world, this has manifested in the topic of reliability and also freeing people from vendor lock-in.
(* note, I kinda paraphrased what Richard said)
Homework
You didn't think I'd let you get away with no homework this week did you? Hah! I'd like you to sit down, consider your motivation in the free software world and a few ways in which that manifests into projects you work on or with. If you're feeling super-enthusiastic about it, why not post a comment on this post and share your motivation with the rest of us?
We previously discussed issues with using PIDfiles.
One issue we encountered was that we need a way to handle multiple processes.
Process groups
If you've ever started a program in the background in a shell you might have noticed it gave you a "Job ID" to refer to it rather than a process ID.
This is not just to give you a memorable number for each task, but because jobs may contain multiple processes, which is how a pipeline of multiple processes may be a single job.
This is accomplished in Linux and traditional UNIXes with the setpgrp(2) system call, which assigns a new process group to a process; that group is then inherited by its subprocesses.
This entire process group may be killed by passing the negation of the process group ID to the kill(2) system call.
A process may only be part of one process group though, so if you have processes that may call setpgrp(2) themselves then it is not possible to use process groups to manage terminating a whole process tree of a service.
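A sketch of the pattern, using the related setpgid(2) call; the service name here is hypothetical:

    #include <signal.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            setpgid(0, 0);      /* child: leader of a new process group */
            execlp("some-service", "some-service", (char *)NULL);
            _exit(127);         /* exec failed */
        }
        setpgid(child, child);  /* parent sets it too, to avoid a race */

        /* ... later: signal the whole group with the negated pgid ... */
        kill(-child, SIGTERM);
        return 0;
    }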
UNIX Sessions
You may be wondering how anything can possibly work if you can't use process groups to track a user's processes.
The answer to this is that UNIX has a concept of sessions.
Every process is a part of a session, and each session has a "controlling TTY", which can be accessed via /dev/tty.
When a process creates a new session with setsid(2) it becomes the session leader.
If the session leader process is terminated, then the foreground process group of the session's controlling terminal receives the SIGHUP signal, which by default terminates the process.
The controlling TTY was traditionally a virtual terminal, which emulates the old teletype terminals on modern computers. Terminal windows in graphical interfaces use pseudo terminals, which make it possible to use sessions for grouping processes that don't belong to a physical device.
This is typically done by getty and login(1), a terminal emulator, or sshd, which also update utmp(5) to include the controlling TTY and session ID, to track the current active sessions.
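For illustration, a process can walk the current sessions with the standard getutxent(3) interface (on Linux, glibc's struct utmpx includes a ut_session field; POSIX does not guarantee it):

    #include <stdio.h>
    #include <utmpx.h>

    int main(void)
    {
        struct utmpx *ut;

        setutxent();                     /* rewind the utmp database */
        while ((ut = getutxent()) != NULL) {
            if (ut->ut_type == USER_PROCESS)
                printf("user %s on %s, session %ld\n",
                       ut->ut_user, ut->ut_line, (long)ut->ut_session);
        }
        endutxent();
        return 0;
    }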
There are a number of issues with using UNIX sessions for tracking processes.
utmp(5) is an awkward interface, requiring multiple processes to access a single file without trampling over each other, which requires file range locking, and that can't be done both portably and in a thread-safe manner.
I consider this to be analogous to /etc/mtab, an old, manually maintained file that had to be replaced with a more reliable, kernel-provided interface.
setsid(2) describes sessions and process groups as a strict two-level hierarchy.
The implication of this is that any process can escape with setsid(2), so inspecting the contents of the sessionid file in /proc won't work. Escaping session cleanup is by necessity a well-documented procedure, since traditional daemons are started by detaching from the current session rather than asking the init process to start the daemon.
See nohup(1) for details about how to escape session cleanup.
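For illustration, the classic detach looks something like this (nohup(1) itself takes the gentler route of simply ignoring SIGHUP, but the effect of surviving session cleanup is similar); error handling is abbreviated:

    #include <unistd.h>

    /* The traditional "double fork" by which a daemon escapes its
     * session. */
    void detach(void)
    {
        if (fork() > 0)
            _exit(0);   /* the original process returns to the shell */
        if (setsid() < 0)
            _exit(1);   /* become leader of a brand-new session */
        if (fork() > 0)
            _exit(0);   /* drop leadership so we can't regain a tty */
        /* We are now outside the old session; its cleanup can't
         * reach us. */
    }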
Conclusion
The traditional UNIX system calls came from a time when it was believed you could trust programs to be well written and benign.
We do not live in this world, so we need a better approach to track which processes we run on our computers, which we will discuss in a future article.