Daniel Silverstone Keep notes

A while ago, when I offered you all the wisdom of the Open Source Truisms, I spoke about how you shouldn't do things without knowing the reason why. It has been a while since then, but I wanted to say something else along these lines, and it is this:

Keep notes

Day after day, almost all of us have conversations along the lines of "Oh yes, I'll get to that when I have a moment". Some of us manage to avoid the common follow-up conversation of "Oh no! I forgot, I'll sort it out soon I promise". I'd hazard a guess that none of us remember everything we agree to do all the time, and even when we do remember we often end up putting it off. I am as guilty of this as the next person.

One way I have learned to get better, though, is to keep a daily journal at work (and do something similar for home). Any time I agree to do something, I make a note of it with a particular greppable pattern (specifically ACTION:). When something has a particular deadline, I include the deadline with the note, or put something into my calendar to remind me. Once I've done the thing, I mark it with another distinct pattern (namely **DONE**), and every morning I read the previous day's journal and carry any incomplete things forward in a special section called "Carried actions". This is a poor man's approach to a self-organisation technique known as 'Getting Things Done'; Lars wrote a book along the same lines, called Getting Things Done for Hackers.

While I've got the hang of this approach for work, for my personal hacking I tend to use Trello and while it's not free software (so I know I'll get panned) I do find it works very well to help me manage lists of tasks.

In the end, it doesn't matter how you keep your notes and record your actions to take; what matters is that you do keep notes, record actions, and get things done. Your homework is simple this week. For those of you who already keep notes and record actions, simply go to your notes and ensure that you've cleared as many actions as you can as quickly as you can, and that your notes are complete and likely to be useful to future-you. For those of you who are not yet making notes and getting things done, your homework is to go to the wider web and look for options, try a few out, and be sure to mark your homework as **DONE** when you've found a way to do it which works for you.

Posted Wed Jul 6 11:00:06 2016

You should be familiar with the mv(1) command by now, which moves a file from one place to another.

If you're writing shell scripts, or if you're using a library which lets you move files then you don't need to worry about how it works, but if you don't have a library available you might be surprised by the amount of effort required.

It's just a rename(2), right?

If you can guarantee that the destination is on the same filesystem and that you don't care if it replaces some other file, then yes.

Not clobbering

mv(1) has the -n or --no-clobber option to prevent accidentally overwriting a file.

The naive way to do this would be to check whether the file exists before calling rename(2), but this is a TOCTTOU (time-of-check to time-of-use) bug: if another process creates a file there between the check and the rename, it will be overwritten.

To do this safely use renameat2(2) with the RENAME_NOREPLACE flag, which will make it fail if the destination already exists.

The destination is on another filesystem

On a modern Linux distribution your files are usually spread across multiple file systems, so your persistent files are on a filesystem mounted from local storage, but your operating system puts temporary files on a different file system so they get removed when your computer shuts down.

Unfortunately, the rename(2) system call does not work if the destination is on a different file system.

Checking ahead of time whether a path is on a different file system is traditionally handled by calling stat(2) on both locations and checking whether the st_dev fields differ, but this is another TOCTTOU bug waiting to happen. The check is also unnecessary: rename(2) fails and sets errno(3) to EXDEV, which tells you the destination is on another file system in the same system call you would have made anyway.

If you care about still being able to move the file when its destination is on a different file system then you need a fallback when this happens.
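As a minimal sketch of letting rename(2) report the cross-filesystem case itself, the helper below classifies the outcome of a single call. The names `try_rename` and `enum move_result` are invented for this illustration and are not part of my-mv.c.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical result type for this sketch. */
enum move_result {
    MOVE_DONE,        /* rename(2) succeeded */
    MOVE_NEEDS_COPY,  /* EXDEV: the target is on another file system */
    MOVE_FAILED,      /* any other error, errno is left set */
};

/* Attempt the rename and let the kernel tell us whether a copy is needed,
   rather than comparing st_dev values beforehand. */
enum move_result try_rename(const char *source, const char *target) {
    if (rename(source, target) == 0)
        return MOVE_DONE;
    if (errno == EXDEV)
        return MOVE_NEEDS_COPY;
    return MOVE_FAILED;
}
```

A caller would only start the copy-and-unlink fallback when it gets MOVE_NEEDS_COPY, and report errno for MOVE_FAILED.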

So we fall back to copying the file and removing the old one?

In principle, yes, though actually implementing this is surprisingly difficult.

Handling the fallback logic itself is not straightforward, so we'll get that out of the way first.

We can fall back to rename(2) if renameat2(2) is not implemented, but only if we don't need to handle not clobbering the target.

If we do need to avoid clobbering, we have to fall back to the copy instead, which can use O_EXCL with O_CREAT to only write to the file if it didn't already exist.

If unlinking the source file fails because the file doesn't exist, then that means that we were able to copy its contents while something else removed it.

Given the file was written to its destination and it no longer exists where it used to it can be argued that the operation as a whole was successful.

/* my-mv.c */
#include <stdbool.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>

#if !HAVE_DECL_RENAMEAT2
static inline int renameat2(int oldfd, const char *oldname, int newfd, const char *newname, unsigned flags) {
    return syscall(__NR_renameat2, oldfd, oldname, newfd, newname, flags);
}
#endif

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1<<0)
#endif

int copy_file(const char *source, const char *target, bool no_clobber);

int move_file(const char *source, const char *target, bool no_clobber) {
    int ret;
    ret = renameat2(AT_FDCWD, source, AT_FDCWD, target, no_clobber ? RENAME_NOREPLACE : 0);
    if (ret == 0)
        return ret;
    if (errno == EXDEV)
        goto xdev;
    if (errno != ENOSYS) {
        perror("renaming file");
        return ret;
    }
    /* Have to skip to copy if unimplemented since rename can't detect EEXIST */
    if (no_clobber)
        goto xdev;
rename:
    ret = rename(source, target);
    if (ret == 0)
        return ret;
    if (errno == EXDEV)
        goto xdev;
    perror("renaming file");
    return ret;
xdev:
    ret = copy_file(source, target, no_clobber);
    if (ret < 0)
        return ret;
    if (unlink(source) < 0 && errno != ENOENT) {
        perror("unlinking source file");
        return -1;
    }
    return ret;
}

So we open both files, and loop reading data then writing it?

This will produce a file that when read, will produce the same stream of bytes as the original.

You could use stdio(3) to copy the contents, but that will have to be left as an exercise for the reader, since I don't like its record-based interface, I prefer to deal with file descriptors over FILE* handles, and the buffering makes error handling more… interesting.

So, broadly, the idea is to read into a buffer, then write from the buffer to the target file.

However, EINTR is a problem: many system calls can be interrupted before they do anything, and read(2) and write(2) may return less than you asked for.

Glibc has a handy TEMP_FAILURE_RETRY macro for handling EINTR, but to handle the short reads and writes, you need to always work in a loop.

int naive_contents_copy(int srcfd, int tgtfd) {
    /* 1MB buffer, too small makes it slow,
       shrink this if you feel memory pressure on an embedded device */
    char buf[1 * 1024 * 1024];
    ssize_t ret;
    for (;;) {
        ssize_t n_read;
        ssize_t n_written = 0;
        ret = TEMP_FAILURE_RETRY(read(srcfd, buf, sizeof(buf)));
        if (ret < 0) {
            perror("Reading from source");
            return -1;
        }
        n_read = ret;

        /* Reached the end of the file */
        if (n_read == 0)
            return 0;

        /* write(2) may write less than asked, so carry on from where
           the previous write stopped until the whole chunk is written */
        while (n_written < n_read) {
            ret = TEMP_FAILURE_RETRY(write(tgtfd, buf + n_written, n_read - n_written));
            if (ret < 0) {
                perror("Writing to target");
                return -1;
            }

            n_written += ret;
        }
    }
}

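int copy_file(const char *source, const char *target, bool no_clobber) {
    int srcfd = -1;
    int tgtfd = -1;
    int ret = -1;
    srcfd = open(source, O_RDONLY);
    if (srcfd == -1) {
        perror("Opening source file");
        goto cleanup;
    }
    /* O_EXCL refuses to touch an existing target,
       otherwise O_TRUNC discards its old contents */
    tgtfd = open(target, O_WRONLY|O_CREAT|(no_clobber ? O_EXCL : O_TRUNC), 0600);
    if (tgtfd == -1) {
        perror("Opening target file");
        goto cleanup;
    }
    ret = naive_contents_copy(srcfd, tgtfd);
cleanup:
    /* close(2) can report a delayed write error, so check it for the target */
    if (tgtfd != -1 && close(tgtfd) < 0 && ret == 0) {
        perror("Closing target file");
        ret = -1;
    }
    if (srcfd != -1)
        close(srcfd);
    return ret;
}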
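The short-write and EINTR handling can also be factored into a small helper that does not depend on glibc's TEMP_FAILURE_RETRY. What follows is a minimal sketch rather than part of my-mv.c; `write_all` is a hypothetical name chosen for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from data to fd.
   Returns 0 on success, or -1 with errno set on failure. */
int write_all(int fd, const void *data, size_t len) {
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;             /* short write: skip what was already written */
        len -= (size_t)n;
    }
    return 0;
}
```

With a helper like this, the inner loop of a copy routine reduces to one call per chunk read, and the retry logic lives in a single place.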

Making use of our new function

So now we have a nice move_file function that will fall back to copying it if renaming does not work.

But code is of no use in isolation; we need a program for it to live in, and the simplest way to use it is a command-line program.

#include <getopt.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char *source;
    char *target;
    bool no_clobber = false;

    enum opt {
        OPT_NO_CLOBBER = 'n',
        OPT_CLOBBER = 'N',
    };
    static const struct option opts[] = {
        { .name = "no-clobber", .has_arg = no_argument, .val = OPT_NO_CLOBBER, },
        { .name = "clobber",    .has_arg = no_argument, .val = OPT_CLOBBER, },
        {},
    };

    for (;;) {
        int ret = getopt_long(argc, argv, "nN", opts, NULL);
        if (ret == -1)
            break;
        switch (ret) {
        case '?':
            return 1;
        case OPT_NO_CLOBBER:
        case OPT_CLOBBER:
            no_clobber = (ret == OPT_NO_CLOBBER);
            break;
        }
    }
    if (optind == argc || argc > optind + 2) {
        fprintf(stderr, "1 or 2 positional arguments required\n");
        return 2;
    }
    source = argv[optind];
    if (argc == optind + 2)
        target = argv[optind + 1];
    else
        /* Move into the current directory with the same name */
        target = basename(source);

    if (move_file(source, target, no_clobber) >= 0)
        return 0;
    return 1;
}

$ if echo 'int main(){(void)renameat2;}' | gcc -include stdio.h -xc - -o/dev/null 2>/dev/null; then
>     HAVE_DECL_RENAMEAT2=1
> else
>     HAVE_DECL_RENAMEAT2=0
> fi
$ make CFLAGS="-D_GNU_SOURCE -DHAVE_DECL_RENAMEAT2=$HAVE_DECL_RENAMEAT2" my-mv
$ ./my-mv
1 or 2 positional arguments required
$ touch test-file
$ ./my-mv test-file clobber-file
$ ls test-file clobber-file
ls: cannot access test-file: No such file or directory
clobber-file
$ ./my-mv -n test-file clobber-file
renaming file: No such file or directory
$ touch test-file
$ ./my-mv --no-clobber test-file clobber-file
renaming file: File exists
$ ./my-mv test-file clobber-file
$ ls test-file clobber-file
ls: cannot access test-file: No such file or directory
clobber-file

So we've got a complete fallback for rename(2) now?

Not quite.

For most purposes this is likely to be sufficient, but there's a lot more to a file than the data you can read out of it, far more than I can cover in this article, so there will be follow-up articles to cover copying other aspects of files including:

  1. Sparseness
  2. Speed
  3. Metadata
  4. Atomicity
  5. Other types of file

Posted Wed Jul 13 11:00:06 2016

Daniel Silverstone Hacking alone, hacking together

Of late, I have been involved in a number of F/LOSS hack days. Most of them have been based around my own F/LOSS project (Gitano) and over the years several have been related to NetSurf. One thing which characterises all these hack days is that they have been small (the largest was around six people).

Another thing which characterises them is that they were done on a shoestring budget, with donated venues, donated tea and biscuits, and most attendees paying their own way there and back; and yet, without exception, they helped us get stuff done; often stuff which had been floundering for a while.

Many hackers find it much easier to hack on their own. From time to time I find that I do my best work when I can focus and not be distracted by anyone else; but most of the time I find that I do my best F/LOSS work when there are others on the project nearby to chat to about the problems we face. Often the hack day is perhaps better named a 'design day' where we get a lot of useful discussion and design done (e.g. the day when Richard and I did the Gitano I18n design) and that's fine too. There's no hard and fast rule about what you get up to on a hack day.

If you're involved in a smaller F/LOSS project then often there are no resources for big flashy conferences, or perhaps not enough clout to swing a dev-room somewhere like FOSDEM; but that doesn't mean there are no options available to you. Naturally the smaller hackdays work best when the potential attendees are physically colocated, but that need not be a showstopper if it cannot be met. Simply arranging a time when everyone involved in a project will agree to be on IRC, TeamSpeak, or any other communication tool which might be appropriate can be an effective way to improve the state of a project.

If you're involved in a somewhat larger project, then a resource such as the Hackday manifesto may be of use to you; and if you happen to be part of a huge project then perhaps there'll be dedicated conferences for you to attend, such as Debconf.

The moment there's more than you on a project, even if all you have for contributors are people who will try new versions and let you know how they work for them, you can make use of some level of hackday from time to time.

Your homework is to go over your inventory of F/LOSS projects which you count yourself as at least somewhat involved with, look at where they might have had hackdays in the past, and where they might be planning them in the future, and then either get yourself along to a hackday, attend one on IRC, or if you're feeling super-enthusiastic, then propose to organise one yourself.

Posted Wed Jul 20 11:00:08 2016

Free software licences can be roughly grouped into permissive and copyleft ones. Examples are the BSD and MIT licences for permissive and the GNU GPL for copyleft. A common argument is about which one is more free.

Permissive licences typically allow using a different licence for any derived works. In other words, if you release some software under a permissive licence, I can make changes to it and release the result under almost any other licence, including a proprietary, non-free one.

A copyleft licence requires a derived work to be released under the same licence as the original. I can't make my own version and make it not be free (unless I get all copyright holders to agree to a licence change).

This is the crux of the dispute of relative freeness. A permissive licence lets you do things that a copyleft one forbids, so clearly the permissive licence is more free. A copyleft licence means software using it won't ever become non-free against the wills of the copyright holders, so clearly a copyleft licence is more free than a permissive one.

Both sides are both right and wrong, of course, which is why this argument will continue forever. Because there's no clear objective winner, the arguments easily get very, very hot.

A related argument is which type of licence is better for doing business with. Many in favour of permissive licences claim that they are better for business, because you can do things in more traditional ways: sell non-free licences to customers, for example.

However, making a living or running a business using a copyleft licence is certainly possible, and not even unusual. It may require coming up with a different business model. In fact, because almost all software development is really a service business anyway, it doesn't necessarily matter a whole lot what the licence is, as long as your customers are happy to pay you for support and feature development.

A word of warning. Any discussions about these topics should be treated carefully. It's like using nitroglycerin: it's an important compound with several practical, industrial uses, but do not, repeat, DO NOT make juggling balls filled with it. If a discussion about the relative freedom of licence types becomes heated, step away. It's not worth participating anymore. The best case scenario is that several people get their feelings hurt and stop talking to each other, possibly forever.

If you want to make a choice between permissive and copyleft, you need to base it on something other than their relative freeness. You might prefer permissive because it makes it easier to combine with other licences. You might choose copyleft because you don't want your software to ever become non-free. Or you might choose based on the most common licence type in the community you participate in. Or based on the length of the licence, or which one has a prettier SHA1 of the contents of the licence. And if anyone questions your choice, avoid getting into a fight.

Disclaimer: This was written by someone who has chosen copyleft and has earned much of their adult salary writing copyleft software.

Posted Wed Jul 27 11:00:07 2016