You've started a new project. When should you start using it "in anger", "for real", "in production"? My advice is to do so as early as you can.

I did this for my newest project. I've been developing it slowly, and things had matured enough that it could now actually do things. I set up two instances doing things that I, and my company, now rely on. If the software breaks, I will need to take action (disable the broken instances, do things in another way until I fix the software).

The point of doing this early is that it gives me quick feedback on whether the software works at all, makes it easy to add new features, and makes it easier for others to try out the software. (When I announce it publicly, which I'm not yet doing.)

  • I see quickly if something doesn't work. (For now, it does.)
  • I see at once if there's a missing feature that I urgently need. (I found two bugs yesterday.)
  • I see what is awkward and cumbersome about configuring and deploying the software. (So many interacting components.)
  • I see if it's nice to actually use. (I need to find someone to write a web interface, and need to improve the command line tool.)
  • I see if the performance is adequate and get an idea what the actual resource requirements are. (Not many resources needed for now. Even the cheapest VM I could choose is adequate.)
  • I get more confident in the software the more I actually use it. Writing a test suite is good, but real use is better. Real use always comes up with things you didn't think about writing tests for.

In order to set up not just one but two instances, I had to make the deployment automated. (I'm that lazy, and I don't apologise for that.)

Thanks to an automated setup, when I add features or fix bugs, they're easy to roll out. It's now almost as easy as just running the program from the source tree.

My development process is now: I write tests; I write code; when the tests pass, I tag a release, let CI build it, and run Ansible to upgrade everywhere. That's about 15 seconds of work once the tests pass, though it takes a couple of minutes of wall-clock time, since I run CI on my laptop.

Apart from the benefits that come from the features of the software itself, getting to this stage is emotionally very rewarding. In one day, my little pet project went from "I like this idea" to "it's a thing".

I recommend you start using your stuff in production earlier rather than later. Do it now, do it every day.

Posted Wed Nov 22 12:00:08 2017

This post intentionally left blank.

We ran out of Yakking articles and energy to write new ones. There's plenty of things to write about, but the usual Yakking writers have been feeling a bit under the weather lately. If you'd like to help, leave a comment on Yakking, or join the IRC channel #yakking (on the irc.oftc.net network), and offer an article.

The previous break in the weekly schedule was December 25, 2013. That was due to a typo in the intended publication date: it got scheduled for 2103-12-25, not 2013-12-25.

We hope to return to the normal schedule. Your patience is appreciated. Resistance less than 4.2 Ohm is futile.

Posted Wed Nov 15 12:33:18 2017

Before everyone had a multitude of computers of their own, computers were rare, and if you wanted to use one you had to share it.

Given that the demand for computers exceeded the supply, people had to share time on them.

Initially you could do this with a stopwatch, but as computers became more complicated it became better for the computer itself to measure this time:

  1. Pre-emption, where one process can be interrupted to run another, means the time taken up by a program isn't just the difference between when the program started and ended.

  2. Multi-threading, where a program can execute multiple streams of instructions simultaneously, means you can use CPU time at a rate of more than one CPU second per second.

Computers became so pervasive that most computer users no longer need to share them, but virtual server providers still need to account for time used. CPU time can also measure how long it takes to perform an operation, for profiling purposes: when a program is slow, it tells you which part is the most worth your time to optimise.

Getting CPU time

The CPU time is read in the same way as other clocks, with different clock IDs for each process or thread.

  1. Current process with CLOCK_PROCESS_CPUTIME_ID.

    struct timespec time;
    int ret = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
    
  2. Current thread with CLOCK_THREAD_CPUTIME_ID.

    struct timespec time;
    int ret = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &time);
    
  3. Another process with clock_getcpuclockid(3).

    /* Read the CPU time consumed by another process. */
    int pid_gettime(pid_t pid, struct timespec *tp) {
        int ret;
        clockid_t clockid;
        ret = clock_getcpuclockid(pid, &clockid);
        if (ret != 0) {
            return ret;
        }
        ret = clock_gettime(clockid, tp);
        return ret;
    }
    
  4. Another thread with pthread_getcpuclockid(3).

    /* Read the CPU time consumed by another thread. */
    int thread_gettime(pthread_t thread, struct timespec *tp) {
        int ret;
        clockid_t clockid;
        ret = pthread_getcpuclockid(thread, &clockid);
        if (ret != 0) {
            return ret;
        }
        ret = clock_gettime(clockid, tp);
        return ret;
    }
    

See gettime.c for an example program for reading the times, and Makefile for build instructions.
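
A minimal sketch of such a program (an illustration, not necessarily the gettime.c referred to above) might look like this:

    #include <stdio.h>
    #include <time.h>

    static void print_time(const char *label, struct timespec t) {
        printf("%s: %lld.%09ld\n", label, (long long)t.tv_sec, (long)t.tv_nsec);
    }

    int main(void) {
        struct timespec process, thread;

        if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process) != 0 ||
            clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread) != 0) {
            perror("clock_gettime");
            return 1;
        }
        print_time("process", process);
        print_time("thread ", thread);
        return 0;
    }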

Profiling

We can instrument code (see profile-unthreaded.c) to see how much time a section takes to run.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

static void print_time(FILE *f, struct timespec time) {
    fprintf(f, "%lld.%09lld\n", (long long)time.tv_sec, (long long)time.tv_nsec);
}

int main(int argc, char **argv) {
    enum {
        ITERATIONS = 1000000,
    };
    int ret, exit = 0;
    struct timespec start, end;

    ret = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    if (ret != 0) {
        perror("clock_gettime");
        exit = 1;
        goto exit;
    }

    for (int i = 0; i < ITERATIONS; i++) {
        fprintf(stdout, "% 7d\n", i);
    }

    ret = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    if (ret != 0) {
        perror("clock_gettime");
        exit = 1;
        goto exit;
    }

    /* Compute elapsed = end - start, borrowing from tv_sec if the
     * nanoseconds went negative. */
    end.tv_sec -= start.tv_sec;
    end.tv_nsec -= start.tv_nsec;
    if (end.tv_nsec < 0) {
        end.tv_sec--;
        end.tv_nsec += 1000000000l;
    }

    print_time(stderr, end);

exit:
    return exit;
}
$ make profile-unthreaded
$ ./profile-unthreaded >/tmp/f
0.073965395

We can make use of threads to try to speed this up (see profile-threaded.c).

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*x))

static void print_time(FILE *f, struct timespec time) {
    fprintf(f, "%lld.%09lld\n", (long long)time.tv_sec, (long long)time.tv_nsec);
}

struct thread_args {
    int fd;
    int start;
    unsigned len;
};

void *thread_run(void *_thread_args) {
    struct thread_args *thread_args = _thread_args;
    char buf[9];
    for (int i = thread_args->start;
         i < thread_args->start + thread_args->len; i++) {
        ssize_t len = snprintf(buf, ARRAY_SIZE(buf), "% 7d\n", i);
        pwrite(thread_args->fd, buf, len, i * len);
    }
    return NULL;
}

int main(int argc, char **argv) {
    enum {
        ITERATIONS = 1000000,
        THREADS = 4,
    };
    int i, ret, exit = 0;
    struct timespec start, end;
    pthread_t threads[THREADS];
    struct thread_args thread_args[THREADS];

    ret = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    if (ret != 0) {
        perror("clock_gettime");
        exit = 1;
        goto exit;
    }

    for (i = 0; i < ARRAY_SIZE(threads); i++) {
        thread_args[i].fd = 1;
        thread_args[i].start = ITERATIONS / THREADS * i;
        thread_args[i].len = ITERATIONS / THREADS;
        ret = pthread_create(&threads[i], NULL, thread_run,
                             &thread_args[i]);
        if (ret != 0) {
            /* pthread functions return the error rather than setting
             * errno, so perror() would print the wrong message. */
            fprintf(stderr, "pthread_create: %s\n", strerror(ret));
            exit = 1;
            break;
        }
    }
    if (exit != 0) {
        /* Cancel only the threads that were successfully created. */
        while (i-- > 0) {
            (void) pthread_cancel(threads[i]);
        }
        goto exit;
    }
    
    for (i = 0; i < ARRAY_SIZE(threads); i++) {
        ret = pthread_join(threads[i], NULL);
        if (ret != 0) {
            perror("pthread_join");
            exit = 1;
        }
    }
    if (exit != 0) {
        goto exit;
    }

    ret = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    if (ret != 0) {
        perror("clock_gettime");
        exit = 1;
        goto exit;
    }

    /* Compute elapsed = end - start, borrowing from tv_sec if the
     * nanoseconds went negative. */
    end.tv_sec -= start.tv_sec;
    end.tv_nsec -= start.tv_nsec;
    if (end.tv_nsec < 0) {
        end.tv_sec--;
        end.tv_nsec += 1000000000l;
    }

    print_time(stderr, end);

exit:
    return exit;
}
$ make profile-threaded
$ ./profile-threaded >/tmp/f
3.185380729

By instrumenting, we can tell that this change actually made this section a lot slower.

Don't do this

Manually instrumenting things is a lot of work, which means you'll only do it for the bits you already suspect are slow.

GCC's -pg adds instrumentation to dump times in a format readable by gprof.
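
For example, a typical workflow looks roughly like this (a sketch, reusing the profile-unthreaded example; gprof reads the gmon.out file that the instrumented run leaves behind):

$ gcc -pg -o profile-unthreaded profile-unthreaded.c
$ ./profile-unthreaded >/tmp/f
$ gprof ./profile-unthreaded gmon.out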

valgrind, when invoked as valgrind --tool=callgrind prog, records a profile which you can view with kcachegrind. It runs your program on an emulated CPU, so it can use its own model of how long each operation takes when accounting time, and is therefore unaffected by the overhead of profiling.

perf makes use of CPU features to measure with minimum overhead.

make CFLAGS=-ggdb command
perf record --call-graph=dwarf ./command
perf report
Posted Wed Nov 8 14:30:19 2017
Lars Wirzenius Communicating

Most of software development is, in fact, communication. Even more so for free software projects that involve more than one person. Communication is an overhead, something you need to do on top of coding, so many hackers don't pay much attention to it. This is a mistake. A project whose members communicate effectively gets more done in less time than one whose members don't. Here are some hints that may be useful:

  • use the appropriate medium: if you need an answer soon, ask on a chat system such as IRC; if you can wait a while, use email or a blog post or a web forum

  • be short and clear: a long, rambling question takes more time and effort to read, never mind respond to, and is often less clear and thus results in a less helpful answer

  • take responsibility for getting the problem solved: develop a way to reproduce the problem, and if it's code, reduce it to the shortest, simplest piece of self-standing code you can (you'll often find the answer yourself in the process)

  • make it easy to help you: explain what you really want to achieve (not something else, even if you think it's easier), and what you've done, and what the exact result is (use copy-paste or take a screenshot)

  • don't be insulting, arrogant, dismissive, or aggressive: you need help, don't make those you want it from not like you, or they might not even try

  • say thank you: you'll be remembered well, and those who helped you (or tried to) will have more fun and are more motivated to work for free for others in the future

Posted Wed Nov 1 12:00:10 2017

Free software development always has an ethical dimension. We develop free software instead of proprietary software to allow people, the users of our software, to not be as controlled by vendors of proprietary software as they might otherwise be.

Software freedom is, however, only one ethical dimension. Free software can also be unethical, for example by doing things that hurt people. As an extreme example, a free software implementation of ransomware would be quite problematic, under any licence. It doesn't matter if the program is, say, under the GPL licence, if it attacks people's computers, encrypts their data, and refuses to decrypt it until a ransom has been paid to the program's author. It would not be considered ethical even if the program installed its own source code on the computer when it encrypted all the other data.

When you write software you should consider the ethics, the morality, and the impact on everyone. For example:

  • Does it promote racism, sexism, or violence, directly or indirectly? For example, if it's an "AI" that tries to guess whether someone will commit a crime in the future, is its guess effectively based only on their race?

  • Does it use bandwidth unnecessarily? Bandwidth is an expensive luxury in some parts of the world, so wasting it discriminates against people in those parts of the world.

  • Does it "call home", such as report usage to the developers? This violates user privacy. Gathering usage statistics can be very useful to the developers in more ways than one, but to do so without requesting permission remains a violation of privacy.

Have you encountered ethically problematic software? Or perhaps have you found some exemplar of ethical software development? Why not give an example in the comments below...

Posted Wed Oct 25 12:00:12 2017
Daniel Silverstone Keeping your passwords safe

There are a number of ways of keeping your passwords as safe as can be. One very old-school way is to write each password down in a book, and keep that book physically secure. On the assumption that you can't remember the passwords without the book, this is a reasonable way to improve your security. Sadly it doesn't scale well and can make it quite hard to keep things up-to-date.

More usefully, in today's multi-computer and multi-device world, there are programs called 'password managers' and as with anything even vaguely interesting there are a number of them to choose from. Some names you may already be familiar with include Keepassx, 1Password, and LastPass.

Password managers offer you a way to effectively protect all your passwords with a single token, often allowing you to sync and manage your passwords without needing lots of knowledge of how things work. They're usually also integrated with your web browser, your phone, etc, to allow you a seamless experience.

If you're a little more paranoid than the normal geek though, and you're prepared to sacrifice a bit of simplicity for a bit more peace of mind, then you could try Password Store (pass), which is written in Bash and uses GnuPG. I personally use pass and have my GnuPG key stored on a Yubikey which I keep around my neck. (There's also the Gnuk which can, I believe, do a similar job.) Needing both the physical token and the PIN to unlock it makes this a multifactor authentication system, which I can then use to secure my passwords etc. I back the store up to my own Git server, where I can keep an eye on the content safely.
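
For instance, everyday use of pass looks roughly like this (a sketch; the key ID 0xDEADBEEF and the entry name Email/example.com are made up for illustration):

$ pass init 0xDEADBEEF
$ pass generate Email/example.com 20
$ pass show Email/example.com
$ pass git push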

I strongly suggest that if you're not using a password safe of some kind, you get one set up and start using it. In fact, if you've not got one, go and do it now and I'll see you next time...

(Oh yeah, and if you look at multifactor authentication, be aware that your intrinsic factor today is simply your adversary's possession factor tomorrow)

Posted Wed Oct 18 12:00:15 2017

It's perfectly OK to have a personal project that only you yourself work on. In fact, most free software projects are like that. A project with only one contributor, or only a couple, can be quite limited, however. A larger group tends to get more done and, more importantly, its members do different things. A more diverse group brings in more points of view, which tends to make the project better suited to a larger group of users.

Attracting contributors to a project you've started can be tricky. Your humble author asked on Twitter and Mastodon for advice on this very topic, and wrote up a summary on his own blog.

The summary, condensed, is:

Get people over the hump of making their first contribution, by making it easy and removing all unnecessary obstacles. Make contributing into a rewarding experience.

Obstacles can be things like making it difficult to install, run, or use the software, making it difficult to find, retrieve, or understand the source code, not having public discussion forums (mailing lists, IRC channels, etc), or not having a public ticketing system.

Posted Wed Oct 11 12:00:07 2017
Daniel Silverstone H0w s3cUre aR3 ur p455w0rdz‽

There are many schools of thought around how to create 'secure' passwords. While they differ in the various ways to assess if a password is secure or not, they are all united in their goal of making it harder for both pesky humans and super-powerful computers to guess your passwords. In addition, the ways of storing passwords vary depending on desired security levels.

Before we discuss ways to make secure passwords, let's take a moment to consider something called entropy. To properly understand entropy can take years, so here's a brief précis… In essence, and for our purposes, entropy is a measure of how "random" your password is. Entropy is a measure of information and, for passwords, we want as much entropy as possible since that makes it harder for an adversary to guess. Sadly there's no trivial way to estimate how much entropy is present in a password because a computer cannot know all possible context around the person setting or using the password. This is the crux of the arguments around password policies, qualities, etc.

Bruce Schneier, who is a well respected security expert, wrote a nice article on passwords.

The hard-for-humans password

"A good password consists of between ten and forty characters, with a mix of upper- and lower-case letters, numbers, and symbols."

The "alphabet" of characters which may be part of a password can be as large, or as small, as you like. One school of thought says that (a) the alphabet should be as large as possible and (b) that passwords should be mandated to have at least one of each class of characters in the alphabet.

These passwords are often very hard for humans to guess if constructed entirely randomly. Sadly humans are very bad at remembering well constructed passwords of this kind and as such they tend not to be well constructed. For example, on the face of it, 94Pr!LOR;Fq. might be an excellent looking password. Sadly if you knew that my birthday is the 9th April, you might guess the first half, and the second half is an inversion of shift state combined with a down/right migration on a UK qwerty keyboard. The first half is context which a human might guess and the second is the kind of translation which a computer will likely try quickly and easily.

However, let's suppose for a moment that it were a good password, and estimate the entropy in it. We'll be naïve and generous in our estimation... The 'alphabet' has somewhere around 100 elements; let's assume it has 128, so that each character is, generously, seven bits of entropy. Our ten-character password is thus 70 bits of entropy, but we might halve that because of the repetition, giving 35 bits of useful entropy. By comparison, the smallest keys computers might use these days are 256 bits, so we puny humans are nowhere near, and we're finding it hard even to get that far.
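
As a back-of-the-envelope check of that arithmetic, here's a small sketch in C (the numbers are the generous estimates from above, not measurements; build with -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double alphabet = 128;  /* generous guess at the alphabet size */
        double length = 10;     /* characters in the password */
        double bits = length * log2(alphabet);

        printf("naive estimate: %.0f bits\n", bits);            /* 70 */
        printf("halved for repetition: %.0f bits\n", bits / 2); /* 35 */
        return 0;
    }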

Correct Horse Battery Staple

Another stable of thought (yes, pun intended) is that a longer but more easily memorised password can be more secure. There are around 100,000 words in the standard word list on my laptop (/usr/share/dict/words), so picking one of those is, in theory, around 16 bits of entropy, but let's be conservative and call it 11 bits. Four words, chosen at random, therefore have 44 bits of entropy. Add in some capitalisation tweaking to make that estimate a little more reasonable, then add two or three more words, and the entropy estimate rises way above anything you might manage to memorise of a random password of the kind above.
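
As an illustration of picking words at random, here's a minimal sketch (note that rand() seeded from the clock is NOT a secure source of randomness; a real implementation should read from /dev/urandom or similar):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void) {
        enum { MAX_WORDS = 200000, PICKS = 4 };
        static char *words[MAX_WORDS];
        char buf[128];
        size_t n = 0;
        FILE *f = fopen("/usr/share/dict/words", "r");

        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        /* Load the word list into memory, stripping newlines. */
        while (n < MAX_WORDS && fgets(buf, sizeof(buf), f) != NULL) {
            buf[strcspn(buf, "\n")] = '\0';
            words[n++] = strdup(buf);
        }
        fclose(f);
        if (n == 0) {
            return 1;
        }

        srand(time(NULL));  /* illustration only, not secure */
        for (int i = 0; i < PICKS; i++) {
            printf("%s%c", words[rand() % n], i == PICKS - 1 ? '\n' : ' ');
        }
        return 0;
    }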

Keeping passwords securely

Sadly, two parties need to keep passwords if they are to be a mechanism of authentication. The person who is being authenticated (you) and the entity who is doing the authentication (some website for example). In order to reduce the impact of a data breach, passwords will be stored hashed by sites which care. Algorithms to do this are designed to make it mathematically improbable that you can find a password purely by knowing the hash of it. In addition they are often designed to be computationally expensive to calculate in order to reduce the ease by which computers might test guesses. There are a number of algorithms which are considered good for this, such as scrypt or bcrypt which require a reasonable chunk of non-parallelisable CPU time and a non-trivial amount of memory to compute.
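
As a sketch of the shape of that in code, here is roughly what storing and checking a password looks like with libsodium (whose crypto_pwhash functions use Argon2 rather than the scrypt or bcrypt named above, but with the same design goals; build with -lsodium):

    #include <stdio.h>
    #include <string.h>
    #include <sodium.h>

    int main(void) {
        char hash[crypto_pwhash_STRBYTES];
        const char *password = "correct horse battery staple";

        if (sodium_init() < 0) {
            return 1;
        }

        /* Deliberately expensive: the CPU and memory costs are tunable. */
        if (crypto_pwhash_str(hash, password, strlen(password),
                              crypto_pwhash_OPSLIMIT_INTERACTIVE,
                              crypto_pwhash_MEMLIMIT_INTERACTIVE) != 0) {
            fprintf(stderr, "out of memory\n");
            return 1;
        }
        printf("stored hash: %s\n", hash);

        /* The site stores only the hash; verifying recomputes it. */
        if (crypto_pwhash_str_verify(hash, password, strlen(password)) == 0) {
            printf("password accepted\n");
        }
        return 0;
    }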

Sadly you can't use the same algorithms to store your passwords safely because you won't be able to recover them. We'll consider ways you can do that in a future article.

Posted Wed Oct 4 12:00:12 2017

It's common knowledge that the number of days in a year is not constant.

Since the Earth doesn't orbit the Sun in exactly 365 days, and it is impractical not to have a whole number of days in the year, we allow our calendar seasons to become marginally desynchronised from the astronomical seasons, and every four years (more or less) have a year with an extra day in it.

This is called a leap year, and contains a leap day.
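
The "more or less" is because the calendar skips some leap years to stay in sync; the full Gregorian rule is short enough to show:

    /* A year is a leap year if divisible by 4, except that centuries
     * are leap years only if divisible by 400. */
    int is_leap_year(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }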

There's an analogous concept of a leap second to correct the discrepancy between the Earth's rotation about its axis and the number of seconds as measured by atomic clocks.

Leap seconds

On days with a positive leap second there is an extra second in the day. To minimise disruption this second officially happens at midnight.

As a result 23:59:60 is a valid time of day.

The average person is not likely to notice that, though computers will, and since computers run software written by humans they may not handle this very well.

The default handling of this in Linux when using CLOCK_REALTIME is that 23:59:59 lasts for two seconds.

This could cause issues for software that controls things like robot arms, since an instruction to rotate for 1 second would become rotate for 2 seconds.

An alternative, handled by using Google's time servers as your NTP server, is to smear the leap second across the whole day.

Since your computer typically has to deal with its clock drifting, small corrections are regularly made anyway, so this is a neat solution.

This doesn't help if your computer normally has a very reliable clock, as any software written specifically to run on it will depend on that property, which is then not correct for a whole day every so often.

This could be a problem for specialist control software, but it's a more serious problem for authoritative NTP servers, which can't use an upstream NTP server to do the time smearing for them.

To handle this there's CLOCK_TAI, which is not subject to leap second adjustments, and so can represent 23:59:60.

The leap second adjustments are handled by adjtimex(2), applying a time smear if using Google's time servers, or inserting a leap second into the day.
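
A small sketch to see the two clocks side by side on Linux (this assumes the kernel's TAI offset has been set, typically by an NTP daemon; otherwise the printed offset is zero):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec realtime, tai;

        if (clock_gettime(CLOCK_REALTIME, &realtime) != 0 ||
            clock_gettime(CLOCK_TAI, &tai) != 0) {  /* Linux-specific */
            perror("clock_gettime");
            return 1;
        }
        printf("CLOCK_REALTIME: %lld\n", (long long)realtime.tv_sec);
        printf("CLOCK_TAI:      %lld\n", (long long)tai.tv_sec);
        printf("TAI-UTC offset: %lld seconds\n",
               (long long)(tai.tv_sec - realtime.tv_sec));
        return 0;
    }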


With CLOCK_TAI covered, we have now discussed all the clocks that measure the passage of time.

Another type of clock for measuring a different notion of time is the CPU clock, for accounting how much CPU a process or thread uses, which will be discussed at another time.

Posted Wed Sep 27 12:00:13 2017
Daniel Silverstone Psst, can you keep a secret?

Human society is based, in part, on keeping things secret. Our society (as it is) would fail horribly if everything were publicly known. We rely on keeping some information secret to protect our private content. For example, we often protect access to services with secrets we call passwords (though if they are simple words then it's unlikely they're very secure). We also use things called cryptographic keys, which are large, complicated-to-work-with numbers that computers can use to secure information.

If you've been following since we started Yakking then you've probably got some passwords and some keys of your own. Your keys might be things like SSH identities or GnuPG keys. Your passwords will protect things like your computer login, your social media accounts, etc.

As computer users, we have so many of these secrets to look after that you're unlikely to be relying on your own fallible memory. As such you're likely already getting your computer to remember them for you. If you're doing this semi-well then you're protecting all the remembered credentials with some password(s) and/or key(s).

There are many ways of looking after credentials, generating passwords, measuring the quality of passwords, handling keys, policies for retaining and changing credentials, etc. Over the next few articles we'll discuss a number of these points and hopefully you'll all feel a little more secure as a result.

Posted Wed Sep 20 12:00:11 2017