Sometimes it is necessary to leave a process running, performing some service in the background while doing something else.

It would be redundant, and possibly harmful, to start a new instance if one is already running.

Ideally all programs would safely shut themselves down if another instance is already running. Checking whether a program is running before starting it only guarantees that it was running when you checked, not that it is still running when you need it. For most purposes, though, it is reasonable to check first.

So how do we know if our service is running?

You may have run ps(1) before to see if a process is running, so you might naturally think this would be how to do it.

This would of course fall into the trap of parsing the output of shell commands. Why write fragile code when ps(1) uses a proper API to do the same job?

The way this is accomplished is via the procfs virtual file system, traditionally mounted at /proc. There is a subdirectory in this file system for each process, named after its process ID.

We can list all directories that are processes by running:

find /proc -mindepth 1 -maxdepth 1 -name '[0-9]*'

Inside each of these directories are files describing the process.

Check comm

When you look at the output of ps it shows the name of the process, which is normally the base name of the file path of the executable that the process was started with.

This is stored in the file in /proc called comm.

So if the name of your program is "myprogram", you can find out if your program is running with the following command:

find /proc -mindepth 1 -maxdepth 1 ! -name '*[^0-9]*' -type d -exec sh -c \
    '[ "$(cat "$1/comm")" = myprogram ] && echo Is running' - {} ';'

I would recommend against checking whether your program is running this way, though, as processes may call themselves whatever they want by writing a new name to comm.

$ cat /proc/$$/comm
$ printf dash >/proc/$$/comm
$ cat /proc/$$/comm

This is often used by services that fork off helper processes: they name the subprocesses after their role to make it easier for developers or sysadmins to know what they do.

Check exe

The procfs entry also includes the path of the executable the process was started from, as a symbolic link called exe.

Thus if your program is installed at /usr/bin/myprogram then we can check whether it is running with:

find /proc -mindepth 1 -maxdepth 1 ! -name '*[^0-9]*' -type d -exec sh -c \
    '[ "$(readlink "$1/exe")" = /usr/bin/myprogram ] && echo Is running' - {} ';'

This cannot be modified by the process after it has started, but as usual caveats apply:

  1. Not all processes have an initial executable; this symbolic link may be unreadable (readlink fails with ENOENT) in the case of kernel threads.

  2. It could be a program that has subcommands, one of which may be a long-running service (e.g. git-daemon), which you wouldn't want to fail to start just because a shorter operation with a different subcommand happened to be running at the same time.

  3. This is unhelpful in the case of interpreted languages, since exe always points to the interpreter rather than the script.

  4. The same program may be reachable by multiple file paths if the executable has been hard-linked.

  5. The program's executable may be removed while it is running, which changes exe to append " (deleted)" to the file path.

    If the file is then replaced, another process may end up with the same executable path but incompatible behaviour.

    This isn't even unusual if the name of the process is generic, like "sh" or "httpd".

So it's useless for interpreted programs and unreliable if the executable can be replaced.

Check cmdline

It could be perfectly safe to run the same program multiple times provided it is passed different configuration.

The cmdline file can be parsed to infer this configuration: it contains the process's arguments as a list of NUL-terminated strings.
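For example, since the arguments are NUL-separated, a quick way to inspect a process's command-line from the shell is to translate the NULs into newlines (shown here for the current shell; substitute any process ID):

```shell
# Each argument in /proc/PID/cmdline is terminated by a NUL byte,
# so translating NULs to newlines prints one argument per line.
tr '\0' '\n' < /proc/$$/cmdline
```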

A problem with this approach is the need to reimplement the command-line parsing logic, and to know, for every possible command-line, whether it is appropriate to start another instance.

This logic could be quite difficult to get right, but you could add a parameter whose only purpose is to determine whether an instance is the same.

This is far from ideal because:

  1. Lookup time gets worse as your system has more processes running.
  2. Processes can modify their command-line too, so another process could arrange to have the same command-line and make this check unreliable.
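To sketch the idea anyway (the --instance=foo parameter is purely hypothetical, and this inherits all the caveats above):

```shell
# Look through every process directory for a command-line that
# contains the exact argument "--instance=foo", our hypothetical
# identifying parameter.
for d in /proc/[0-9]*; do
    if tr '\0' '\n' < "$d/cmdline" 2>/dev/null |
            grep -qx -- '--instance=foo'; then
        echo "Instance foo is running (pid ${d#/proc/})"
    fi
done
```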

Next time we are going to look at a better use for that parameter.

Posted Wed Mar 22 12:00:08 2017 Tags:

Bit rot, specifically the phenomenon of software working less well even if it hasn't changed, is annoying, but a fact of life. There might be a thing or two you can do to make it happen less.

Examples from your author's personal experience from the past year:

  • Cloud provider changes the default username on base images from ec2-user to debian, requiring simple changes in many places.
  • Cloud provider upgrades their virtualisation platform, which introduces a new API version and breaks the old one. All automation using the API needs upgrading.
  • Configuration management software introduces a new feature (become), and deprecates the old corresponding feature (sudo). Simple changes, but in many places.
  • Configuration management software breaks the new feature (can no longer switch to an unprivileged user to run shell script snippets), requiring more complicated changes in several places (run shell as root, invoke sudo explicitly).
  • Author's software depends on enterprise-grade software for a specific service, which switches to requiring Oracle Java, instead of OpenJDK. Author's software isn't fully free software anymore.

Bit rot happens for various reasons. The most common reason is that the environment changes. For example, software that communicates over the network may cease to function satisfactorily if the other computers change. A common example is the web browser: even though your computer works just as well as before, in isolation, web sites use new features of HTML, CSS, and JavaScript, not to mention media formats, and web pages become bigger, and in general everything becomes heavier. Also, as your browser version ages, sites stop caring about testing with it, and start doing things that expose bugs in your version. Your web experience becomes worse every year. Your browser bit rots.

There is no way to prevent bit rot. It is a constant that everything is variable. However, you can reduce it by avoiding common pitfalls. For example, avoid dependencies that are likely to change, particularly in ways that will break your software. An HTML parsing library will necessarily change, but that shouldn't break your software if the library provides a stable API. If the library adds support for a new syntactic construction in HTML, your program should continue to work as before.

You should be as explicit as possible in what you expect from the environment. Aim to use standard protocols and interfaces. Use standard POSIX system calls, when possible, instead of experimental Linux-specific ones from out-of-tree development branches. Sometimes that isn't possible: document that clearly.

Have automated ways of testing that your software works, preferably tests that can be run against an installed instance. Run those tests from time to time. This will let you and your users notice earlier that something's broken.

Posted Wed Mar 15 12:00:07 2017 Tags:

I have the dubious honour of being one of the people, at my place of work, charged with interviewing technical applicants. Without giving the game away too much, I thought I might give a few hints for things I look for in a CV, and in the wider world, when considering and interviewing a candidate.

First a little context - I tend to interview candidates who are applying for higher-level technical roles in the company, and I have a particular focus on those who claim on their CV to have a lot of experience. I start by reading the cover letter and CV looking for hints of F/LOSS projects the applicant has worked with; either as a user or a developer. I like it when an applicant provides a bitbucket, github or gitlab URL for their personal work if they have any; but I really like it when they provide a URL for their own Git server (as you might imagine).

Once I have identified places on the Internet where I might find someone, I look to dig out their internet ghosts and find out what they are up to in the wider F/LOSS world. The best candidates show up in plenty of places, are easily found making nice commits which show their capability, and seem well spoken on mailing lists, fora, et al. Of course, if someone doesn't show up on Internet searches then that doesn't count against them because to have the privilege of being able to work on F/LOSS is not something afforded to all; but if you do show up and you look awful it will count against you.

Also remember, there's more ways to contribute than writing code. I love it when I find candidates have made positive contributions to projects outside of just coding for them. Help a project's documentation, or be part of mentoring or guide groups, and I'll likely be very pleased to talk with you.

Beyond the Internet Stalking, I like to get my candidates to demonstrate an ability to compare and contrast technologies; so a good way to get on my good side is to mention two similar but competing capabilities (such as Subversion and Git), be prepared to express a preference between them, and be able to defend that preference.

Finally a few basic tips -- don't lie, dissemble, or over-inflate in your CV or cover-letter (I will likely find out) and don't let your cover letter be more than a single side of A4, nor your CV more than 2 sides of A4.

If I ever interview you, and I find out you read this article, I will be most pleased indeed. (Assuming you take on my recommendations at least :-) )

Posted Wed Mar 8 12:00:07 2017

FOSS projects are mostly developed on a volunteer basis.

This makes free time and motivation the currencies by which they are developed.

Often you have the free time, but not the motivation. This is frequently not because you feel the work isn't worth doing, but because you feel inadequate to do it.

Don't be disheartened. There's plenty you can do that helps.

  1. Just be there, whether in-person or online.

    You can do whatever else you want while being there, but it's encouraging not to be alone in your endeavours.

    You may even find some motivation of your own.

  2. When others are talking about what they want to achieve, respond enthusiastically.

    It makes them more likely to follow through and do so, and at the very least makes them feel good.

    This does risk making them feel worse if they never get around to it, but sometimes that's sufficient to shame them into action later, and other times it's sufficient to say "these things happen".

  3. Engage in discussion about what others want to achieve.

    It's extremely valuable for refining ideas: they can implement what they want to do better, it stays fresh in their mind so motivation lasts longer, and a clearer idea of what to do means it may be completed before motivation runs out.

  4. Mention what other people are doing to people who might be interested.

    You could end up with anecdotes of other people thinking it's a cool idea, which, when relayed to the people doing the work, provide motivation of their own.

  5. Remind people of the successes they've had.

    It makes people feel good about what they've already done, and can put any issues they are currently struggling with into perspective.

    Lars pointed out that Yakking has published more than 180 articles at a rate of one per week! We've managed to get this far, we can continue for a good while yet.

Posted Wed Mar 1 12:00:07 2017 Tags:
Daniel Silverstone Please be careful when you test

We have spoken before about testing your software. In particular we have mentioned how, if your code isn't tested, you can't be confident that it works. We also spoke about how the technique of testing and the level at which you test your code will vary based on what you need to test.

What I'd like to talk about this time is understanding the environment in which your tests exist. Since "nothing exists in a vacuum" it is critical to understand that even if you write beautifully targeted tests, they still exist and execute within the wider context of the computer they are running on.

As you are no doubt aware by now, I have a tendency to indulge in the hoary old developer habit of teaching by anecdote, and today is no exception to that. I was recently developing some additional tests for Gitano and exposed some very odd issues with one test I wrote. Since I was engaged in the ever-satisfying process of adding tests for a previously untested portion of code I, quite reasonably, expected that the issue I was encountering was a bug in the code I was writing tests for. I dutifully turned up the logging levels, sprinkled extra debug information around the associated bits of code, and puzzled over the error reports and debug logs for a good hour or so.

Predictably, given the topic of this article, I discovered that the error in question made absolutely no sense given the code I was testing, and so I had to cast my net wider. Eventually I found a bug in a library which Gitano depends on, which gave me a somewhat hirsute yak to deal with. Once I had written the patch to the library, tested it, committed it, made an upstream release, packaged that, reported the bug in Debian, uploaded the new package to Debian, and got that new package installed onto my test machine - lo and behold, my test for Gitano ran perfectly.

This is, of course, a very particular kind of issue. You are not likely to encounter this type of scenario very often, unless you also have huge tottering stacks of projects which all interrelate. However you are likely to encounter issues where tests assume things about their environment without necessarily meaning to. Shell scripts which use bashisms, or test suites which assume they can bind test services to particular well known (and well-used) ports are all things I have encountered in the past.

Some test tools offer mechanisms for ensuring the test environment is "sane" for a value of sanity which applies only to the test suite in question. As such, your homework is to go back to one of your well-tested projects and consider if your tests assume anything about the environment which might need to be checked for (outside of things which you're already checking for in order to build the project in the first place). If you find any unverified assumptions then consider how you might ensure that, if the assumption fails, the user of your test suite is given a useful report on which to act.

Posted Wed Feb 22 12:00:06 2017

This year I was fortunate enough to attend FOSDEM.

A project I work on recently made its 1.0 release and is scheduled to be part of the next Debian stable release, which means it will plausibly attract more attention.

Being inexperienced in community engagement, I decided to attend various talks on this topic.

Building an accessible community (video)

This is mostly not relevant to Gitano because it is about how to run a conference and handle accessibility.

It was entertaining though, and offered a useful bit of advice: little things like using gender-neutral pronouns in text can help to foster an inclusive atmosphere.

As a result I will try to proofread my writing in case I have unintentionally included non-inclusive language.

Overcoming culture clash (video)

This was a talk about some theory on what kinds of cultural differences there are, and some specific cultural differences that often cause issues.

It started out with a metric by which attitudes to 6 supposed characteristics of culture can be quantified, so that the difference between cultural attitudes can be measured.

The speaker admitted that they weren't an expert on the topic, so I couldn't ask for clarification of how some characteristics differed; my impression was that at least three of them overlapped.

More practical advice included:

  1. Communities are built one person at a time. So we should try to foster good relations with existing and new users.
  2. Local support groups should be encouraged, but should be helped to interact with the wider community by inviting members to events and visiting them.
  3. Get to know the cultural differences of local groups.
  4. Avoid real-time (face to face or IRC) meetings. Text is better than video calls, and asynchronous is better than synchronous since it is a lot easier to translate.
  5. Plan events with awareness of religious and national holidays so you're not accidentally excluding someone. For example, FOSDEM is set during Chinese New Year, so can be problematic.
  6. Don't be afraid to ask if people have issues, but be aware that not wanting to impose is also a cultural value, so they may attempt to appease unnecessarily.

Like the ants (Growing communities)

There was a talk from a communities manager from Google talking about how to foster a community without driving it.

The gist is that it should emulate a hive-mind like ants, where rather than dictating direction you would provide feedback mechanisms to encourage what is wanted.

This is not relevant to Gitano since we will be part of any community that happens.

Open source is just about the source, isn't it? (video)

This was a talk from a community manager about a bunch of non-code parts of a project to worry about.


First was trademark handling.

For Gitano we've started well by picking a name that is not already used for a git server and creating a logo that resembles no other git server's.

We will need to ask anyone who names their git server Gitano to rename though.

Finding users

We need to go out and find potential users, rather than waiting for them to come to us.

Talking about it on social media may help, and getting users to talk about it would help.

As would submitting talks to relevant conferences.

Larger projects can submit an article to a relevant journal since journalists are lazy and will print articles to fill space.

I intend to speak about Gitano at more conferences as a result.

Supporting users

Supporting users is essential; you don't know who might be a valuable contributor, so be friendly to everyone.

You can't always expect users to come to you; you need to go where they are, which may mean subscribing to Stack Overflow to see if Gitano is mentioned.

Retaining contributors

Non-coder contributors can be hugely valuable, since they provide support for other users and may be able to provide support in languages you don't understand.

Retaining contributors depends heavily on how responsive you are. If you can provide automated feedback on code style etc. it helps.

If a useful contributor is approaching burn-out and you can arrange for them to be employed to do the work, that's handy; this is not an option for Gitano since we're not a big foundation.

There isn't much we can do about this before other contributors turn up or leave.

Managing infrastructure

If infrastructure is required for development then upstream must provide it, since contributors are even less likely to be able to provide their own.

If infrastructure is expensive then a tip jar helps.

Try not to spread infrastructure across too many systems, or at least provide a landing page to locate everything.

Have one place for recording canonical decisions, don't split them across mailing lists or wikis.

For Gitano we expect the canonical place for policy to be the wiki, so we're going to have a policy page and link to all the infrastructure from the wiki.

All Gitano's infrastructure is either free or cheap and paid out of the lead developer's pocket, so we don't need to think about a tip jar yet.

Expect to leave

Plan for your own exit from the project. Nothing lasts forever.

It is helpful if you can centralise contact details so they can be changed more easily.

For Gitano we plan to have a contact details page to centralise as much as we can, and avoid giving out the details of any particular person as a contact.

Deciding on communications channels

Given the discussion of various contact channels and the importance of using appropriate ones I asked the speaker if she had any tips on how to evaluate which to use.

I was recommended to decide based on:

  1. Which is easy for your developers to use.
  2. Which is easy for your contributors to use.

    Ideally by asking existing users what they would prefer, but otherwise an educated guess based on what the target users might want.

In Gitano we have opinionated developers who like mailing lists, IRC and RSS feeds, so to widen the support net we're going to add a link to webchat.

Posted Wed Feb 15 12:00:08 2017 Tags:

Any software that is used by more than a trivial number of people will at some point be used by people living in different countries, with different cultures. Different cultures have different conventions which might cover date format, decimal mark, currency, names, units of measurement, spelling and of course, language (that's natural language, not programming language).

The practice of building support for a variety of cultures into your software is called internationalisation, often referred to as I18N. This is the practice of writing software that is independent of any particular culture. There is a closely related topic, localisation (L10N), which can be thought of as taking an internationalised piece of software and localising it for one specific region.

You should be careful to separate your internationalisation from your business logic.

It is unlikely that you can cover every possible variation in this yourself. Luckily, there are several tools designed to deal with the problem of internationalisation. These will be covered in a future article so stay tuned!

Posted Wed Feb 8 12:00:07 2017
Daniel Silverstone Semantic $THING

There has been, of late, a significant gain in mindshare being enjoyed by a number of movements which call themselves Semantic $THING. These usually encode extra meaning into content which already exists. Commonly these are intended to make it easier for computers to read extra meaning into something intended for humans; but sometimes they're intended to allow humans to more easily deal with something meant for computers.

Semantics are, in a very basic sense, how meaning is overlaid onto the syntax of content. I like to think of it as: syntax is the 'how', but semantics are the 'what'.

Here are three Semantic $THINGs which I think you ought to know about:

  • The Semantic Linefeeds concept is intended to make it easier for humans to grok the delta between two versions of a text file intended to be processed (for example markdown).

  • The Semantic Versioning concept is intended to make it possible for humans, and software, to understand the relationship between different releases of a piece of software.

  • The Semantic Commits concept is intended to make it easier to produce changelogs for projects, and there are a number of tools built up around this.

If you know of any other useful Semantic $THINGs then why not comment on this article to let others know about them? For homework, I simply suggest you read the above linked articles, and then do a little of your own research around the topics and consider if you might need to take on any of the points in your own projects. I am considering semantic commits for my main projects at the time of writing this article.

Posted Wed Feb 1 12:00:06 2017

We previously spoke of the FHS directory standard.

This includes important rules for where a program should store its data. This is particularly important when writing your programs as you need to know where you should write your program state to.

While this answers where you should write your state to for programs that run system-wide, it does not help for per-user programs.

The traditional unix approach for this has been dotfiles so each program leaves hidden files all over the user's home directory.

This is sub-optimal since the files are hidden by default, which avoids cluttering the file listing in your home directory at the cost of merely hiding the clutter rather than structuring it.

To plug this gap the XDG directory standard was designed.

Rather than specifying exactly which directories should be used, it specifies, for each purpose, an environment variable which describes which directory to use, and a default for when the environment variable is not defined.

Variable         Default         FHS equivalent  Purpose
XDG_CONFIG_HOME  ~/.config       /etc            Program configuration files.
XDG_CACHE_HOME   ~/.cache        /var/cache      Non-essential cached data.
XDG_DATA_HOME    ~/.local/share  /usr/share      Data files.
XDG_RUNTIME_DIR  (no default)    /var/run        Program state for current boot.

The standard also defines XDG_DATA_DIRS and XDG_CONFIG_DIRS, but they are about where to read system-wide config files from.

The rules are meant to be simple enough that you can implement them yourself, but there are some helper libraries available.
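As a sketch of doing it yourself in shell (the application name "myprogram" is just a placeholder), fall back to the default whenever the variable is unset or empty:

```shell
# Resolve the XDG base directories for this application, using the
# specified defaults when the environment variable is unset or empty.
config_dir="${XDG_CONFIG_HOME:-$HOME/.config}/myprogram"
cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/myprogram"
data_dir="${XDG_DATA_HOME:-$HOME/.local/share}/myprogram"

printf '%s\n' "$config_dir" "$cache_dir" "$data_dir"
```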

If you're writing Python there is PyXDG. If you're writing C there are implementations in GLib, or a 3-clause BSD licensed implementation in chck.

If you're interested in more standards for where to put files, a relevant XDG user directory standard exists, which lists more variables in ${XDG_CONFIG_HOME}/user-dirs.dirs for directories like the default place to download files to.

Posted Wed Jan 25 12:00:07 2017 Tags:
Daniel Silverstone Give credit where credit is due

If you've been lucky enough to write some software worth publishing, and then super-lucky enough to have others like it enough to use it; and then ultra-mega lucky enough to have someone like it enough to send you a patch for it; then you may be lucky enough to need to have a credits file.

When people contribute to your projects, it's only polite to make a little note so that they can see that their contribution was appreciated. Some of my projects even credit my employer when I've been fortunate enough to be allowed to write some F/LOSS at work.

The same applies if you "borrow" code from another project. Be sure to comply with their licensing terms, and even if they don't require it, it's only polite to thank them for their work. So for your homework, I'd like you to go back to any software you've written which has received patches from others and be sure to credit your contributors clearly and fully.

Posted Wed Jan 18 12:00:06 2017