Daniel Silverstone Be gracious in how you accept

We have previously spoken about upstreaming changes and the etiquette associated with doing so. Perhaps now is a good time to mention that the effort we recommend you put into contributing to a project is also needed when you are the upstream.

If you are fortunate enough to be upstream in a popular F/LOSS project then you may find yourself garnering contributions from the community you have fostered. This is a wonderful situation to be in, but it can also be a great burden on you. We have mentioned that when you offer code to an upstream you are effectively offering a burden not a gift. This means that as an upstream you need to recognise that the offer of a change is not something you must feel beholden to accept. By the same token, if the person making the offer has managed to demonstrate their dedication to the community then you can usually trust that they'll help look after their contribution on an ongoing basis.

Looking after a project as an upstream is often a thankless task, and it can become quite a grind to maintain a pleasant outlook in the face of the less experienced who may flock to you for assistance with your work. Learning to be gracious in accepting any and all contributions which are not negative in nature can help you to foster a community of people who will then police themselves and assist you with the greater effort of looking after everything.

Of course, you could just be a curmudgeonly upstream who acts as though the thought of anyone using their software is abhorrent. That won't stop some intrepid users so be aware they might send you patches or suggestions anyway. Do try to not scare them too much. If you do then they might fork your code and change it in unexpected ways; but with your name all over the codebase you might be inundated with questions anyway.

Posted Wed Sep 7 11:00:07 2016

Many free software developers would like to earn a living doing what they love. This is often thought to be tricky, since you can't just sell licences. Here's a list of business models for software freedom, with short commentaries.

  • Support contracts mean that users of the software, typically business users, pay for support: for help solving any problems they have with the software, such as fixing bugs, or helping with deployment or problems during use. This is a very common and ideologically perfect business model.

  • New feature development is often combined with support contracts. Some support issues are due to missing features, and some users are happy to pay for their development. Also ideologically pure.

  • Consulting is a generalisation of support and feature development, and can also include giving advice on which software is the best solution for the customer, and help integrating the software with the customer's existing systems. Still ideologically pure.

    Example for this and the models above: a myriad of companies providing bespoke Linux kernel development.

  • Training is also a popular business model. New software is frequently a bit scary to users, and it takes them a while to properly get up to speed. Training can make the transition easier, so it's worth it to some users. Ideologically pure.

  • Hosting is an option for some software, particularly services accessed over the network, where the customer doesn't want to do the hosting themselves. This might be hosting for a specific customer, or for the general public. Ideologically pure as the driven snow. Example: Branchable.

  • Sponsorship is a possible way to be paid, but is probably harder to achieve than an actual business. Sponsorship would probably come from organisations who get a publicity boost from supporting the project they sponsor. Example: OpenSSL via the Linux Foundation infrastructure project.

  • Donations and crowd funding are fashionable these days, but somewhat difficult to succeed with. They probably mostly only work for projects with a large number of users. For projects that aren't ready to use yet, they'll probably only succeed if they provide something that a lot of people want. Example: git-annex.

  • Open core is a name for the model where a project is free software, but there's a proprietary version with extra features (often called community vs enterprise versions). This is somewhat iffy from a software freedom point of view: it is effectively a proprietary software model with a limited open source version for marketing purposes.

  • Double licensing is like open core, except there's no real difference between the two versions. Paying customers just get the software under a non-free licence, which corporations sometimes prefer to a free software licence. Still iffy.

  • Merchandise can sometimes be a way of making a bit of extra on the side, but rarely enough to live on. Examples are t-shirts, stickers, and books. This is ideologically pure again.

As with everything else, making money out of free software is not automatic. If you want to run a business, software freedom is not a hindrance, but you'll still have to work hard to be successful.

Posted Wed Sep 14 11:00:07 2016

There is a concept in computing which is known by various names, but perhaps the most common is the Robustness Principle. It can be summarised as:

Be liberal in what you accept, and strict in what you produce.

This originally related to TCP implementations but has since been applied to almost all higher-level protocols. The general concept is that when implementing a protocol you should be as careful as you can in what you generate, so that even the simplest of implementations can easily interpret your output; and that when interpreting what another implementation has sent to you, you should do your best to do something useful with what you receive, even if it is not the most correct of messages.

This is a good principle to work to, but like all good things, it can be taken to extremes; and in extremis it is a gun loaded and pointed at your head. In general, being strict in what you produce can't really result in damage, but taking liberal acceptance to extremes is what makes confusing and unusual attacks possible. An excellent example of just how bad things can get when you're a little too liberal can be found at Bouke van der Bijl's Blog related to attacking Redis, Memcached, and Elasticsearch bound to localhost.

A useful corollary to the robustness principle is: do not trust user input.

For your homework this week, go and have a look at something you have which processes input, generates output, and/or speaks any protocols. (This ought to be almost any software you have written.) Look over that software's input data sanitisation or protocol implementations and decide if there's anything you could do to improve matters. Then improve them and feel better about yourself for a little bit.
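As one small illustration of not trusting input, here is a sketch of strictly parsing a TCP port number from a string. The function name parse_port and its error conventions are made up for this example; the point is only that anything which isn't entirely a number in the valid range gets refused rather than guessed at.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal TCP port, rejecting anything
 * that isn't entirely a number in the range 1-65535.
 * Returns 0 on success and stores the port in *port_out,
 * or returns -1 with errno set to EINVAL or ERANGE. */
int parse_port(const char *str, unsigned short *port_out) {
    char *end;
    long val;

    /* strtol skips leading whitespace and accepts a sign;
     * a strict parser wants neither, so insist on a digit first. */
    if (str == NULL || str[0] < '0' || str[0] > '9') {
        errno = EINVAL;
        return -1;
    }

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0)
        return -1;          /* too big even for a long: errno is ERANGE */
    if (*end != '\0') {     /* trailing junk such as "80abc" */
        errno = EINVAL;
        return -1;
    }
    if (val < 1 || val > 65535) {
        errno = ERANGE;
        return -1;
    }

    *port_out = (unsigned short)val;
    return 0;
}
```

A more liberal parser might accept " 80" or "80abc" as port 80; whether that is helpful or dangerous depends on where the string came from.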

Posted Wed Sep 21 11:00:08 2016

Previously we spoke about the common POSIX file metadata.

This is not the only metadata that a program that handles copying files has to worry about on Linux.

Files also have some additional flags for changing their behaviour, or possibly read-only flags for providing extra information.

Accessing flags

Flags were originally a feature of the ext2 filesystem, which means they don't have a dedicated system call, since filesystem specific features are often implemented as ioctls.

This also explains why you might see the ioctls called EXT2_IOC_GETFLAGS or EXT2_IOC_SETFLAGS.

When using ioctls, it's good to be paranoid, since the same ioctl number can be used for different devices, and you wouldn't want to accidentally do something unintended.

It's possible to check whether it's an appropriate file by using stat and checking the file mode.

We previously used this pattern for the file clone ioctl on btrfs, but included a check that it was a btrfs filesystem.

Since file flags are applicable to multiple filesystems, checking the filesystem type should not be necessary.

#include <sys/stat.h>
#include <sys/vfs.h>      /* fstatfs */
#include <errno.h>
#include <string.h>       /* strchr */
#include <strings.h>      /* ffs */
#include <linux/fs.h>
#include <sys/ioctl.h>

int get_flags(int fd, int *flags_out) {
    struct stat st;
    int ret = 0;
    ret = fstat(fd, &st);
    if (ret < 0)
        return ret;
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)
        && !S_ISLNK(st.st_mode)) {
        errno = ENOTTY;
        return -1;
    }
    return ioctl(fd, FS_IOC_GETFLAGS, flags_out);
}

int set_flags(int fd, const int *flags) {
    struct stat st;
    int ret = 0;
    ret = fstat(fd, &st);
    if (ret < 0)
        return ret;
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)
        && !S_ISLNK(st.st_mode)) {
        errno = ENOTTY;
        return -1;
    }
    return ioctl(fd, FS_IOC_SETFLAGS, flags);
}

As an aside, I find it odd that the set flags ioctl takes a const int* rather than an int since I know of no CPU that has shorter pointers than integers.
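To see the paranoia pay off, here is a sketch of a path-based wrapper around the same pattern. The name get_path_flags is hypothetical and not part of my-mv.c. It assumes the path is opened normally, so symlinks are followed and only regular files and directories can pass the mode check.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper: fetch the flags of the file at path.
 * Returns 0 on success, -1 with errno set on failure. */
int get_path_flags(const char *path, int *flags_out) {
    struct stat st;
    int fd, ret, saved_errno;

    /* O_NONBLOCK so that opening a FIFO can't hang before we check it */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    ret = fstat(fd, &st);
    /* open() followed any symlink, so only files and directories qualify */
    if (ret == 0 && !S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        /* Refuse before the ioctl is ever sent to a device */
        errno = ENOTTY;
        ret = -1;
    }
    if (ret == 0)
        ret = ioctl(fd, FS_IOC_GETFLAGS, flags_out);

    /* close() may change errno, so preserve the interesting value */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

Calling this on a character device such as /dev/null fails with ENOTTY from the mode check, so the device driver never sees an ioctl number it might interpret differently.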

Copying flags

Since filesystems have different capabilities, they unfortunately accept different sets of flags.

include/linux/fs.h has definitions for all the flags which are agreed on by every filesystem, though a given filesystem may not support all of them.

Since you can't trust flags on two filesystems to mean the same thing, if the source and target are on different filesystems then you must only attempt to set flags they both agree on.

include/linux/fs.h defines FS_FL_USER_MODIFIABLE for this.

Because filesystems may not implement every commonly defined flag and will refuse to set flags if any provided aren't recognised, you can either define logic for looking up the flags supported and setting those all at once, or try setting each flag in turn so you can determine whether failing to copy that flag is a problem.

Since the kernel doesn't expose which flags a filesystem supports at runtime, the set of flags your program thinks a filesystem supports can get out of date, so setting the flags one at a time is the most flexible option.

The code below uses ffs(3) to iterate through the bits set in the integer since C doesn't have an operator to do it, but ffs(3) may be a compiler builtin which uses special instructions.

int copy_flags(int srcfd, int tgtfd, int required_flags) {
    int ret;
    int srcflags;
    int tgtflags;
    int newflags;
    struct statfs srcfs, tgtfs;

    ret = get_flags(srcfd, &srcflags);
    if (ret != 0) {
        /* If we don't support flags we have none to update. */
        if (errno == EINVAL || errno == ENOTTY)
            return 0;
        return ret;
    }

    ret = get_flags(tgtfd, &tgtflags);
    if (ret != 0) {
        if (required_flags == 0 && (errno == EINVAL || errno == ENOTTY))
            return 0;
        return ret;
    }

    ret = fstatfs(srcfd, &srcfs);
    if (ret != 0)
        return ret;

    ret = fstatfs(tgtfd, &tgtfs);
    if (ret != 0)
        return ret;

    /* If on different fs need to mask to commonly agreed flags */
    if (srcfs.f_type != tgtfs.f_type) {
        srcflags &= FS_FL_USER_MODIFIABLE;
        tgtflags &= FS_FL_USER_MODIFIABLE;
        if ((srcflags & required_flags) != required_flags) {
            errno = EINVAL;
            return -1;
        }
    }

    /* Skip setting flags if they are the same */
    if (srcflags == tgtflags)
        return 0;

    /* Clear any flags that are set which we want to remove */
    newflags = tgtflags & srcflags;
    ret = set_flags(tgtfd, &newflags);
    if (ret != 0) {
        /* Can't set flags on the target, but we didn't require any. */
        if (required_flags == 0 && errno == EINVAL)
            return 0;
        return ret;
    }
    tgtflags = newflags;

    /* Use srcflags for flags we want to set,
       which are everything not already set. */
    srcflags &= ~tgtflags;
    while (srcflags) {
        /* Shift an unsigned 1 so that bit 31 isn't undefined behaviour */
        int flag = (int)(1u << (ffs(srcflags) - 1));

        newflags = tgtflags | flag;
        ret = set_flags(tgtfd, &newflags);
        /* Fail if this flag is required and unsettable */
        if (ret != 0 && (flag & required_flags))
            return ret;
        if (ret == 0)
            tgtflags = newflags;

        srcflags &= ~flag;
    }

    return 0;
}
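The bit-iteration idiom in copy_flags can be seen in isolation in the sketch below. The helper name split_bits is invented for illustration, and it assumes a 32-bit int, as on the Linux platforms this article targets. It works on an unsigned value so that shifting into the top bit is well defined.

```c
#include <strings.h>

/* Hypothetical helper: store each set bit of mask into bits[],
 * lowest bit first, and return how many were stored.
 * Assumes int is 32 bits wide, so bits[] needs room for 32 entries. */
int split_bits(unsigned int mask, unsigned int bits[32]) {
    int count = 0;

    while (mask) {
        /* ffs() returns the 1-based position of the lowest set bit */
        unsigned int bit = 1u << (ffs((int)mask) - 1);

        bits[count++] = bit;
        mask &= ~bit;   /* clear it so the next ffs() finds the next one */
    }
    return count;
}
```

For a mask of 0x15 this stores 0x1, 0x4 and 0x10 and returns 3, which is the same order in which copy_flags would attempt to set those flags.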

Driving it

Since failing to copy a flag may or may not be a problem, you need a way to decide, and since that may depend on the context it's being called in, feedback may be more useful than a heuristic.

A command-line application could ask for confirmation of whether it's acceptable to not set a flag, but this is awkward for programs used in batch scripts. You may have already noticed that the above code takes required_flags, so the user can declare which flags they consider essential.

Bitwise or-ing flags together is an acceptable C API, but expecting users to do that on a command line is not manageable.

We could do our own thing and name each flag, but chattr(1) exists for modifying a file's flags, and as awkward as it can be to remember the character to flag association, it's better to imitate something that users would be familiar with.

/* Convert a chattr style flags string into flags */
static int parse_flags(const char *flagstr) {
    static struct flags_char {
        int flag;
        char flagchar;
    } flags_chars[] = {
        { FS_SECRM_FL, 's' },
        { FS_UNRM_FL, 'u' },
        { FS_COMPR_FL, 'c' },
        { FS_SYNC_FL, 'S' },
        { FS_IMMUTABLE_FL, 'i' },
        { FS_APPEND_FL, 'a' },
        { FS_NODUMP_FL, 'd' },
        { FS_NOATIME_FL, 'A' },
        { FS_JOURNAL_DATA_FL, 'j' },
        { FS_NOTAIL_FL, 't' },
        { FS_DIRSYNC_FL, 'D' },
        { FS_TOPDIR_FL, 'T' },
#ifdef FS_EXTENTS_FL
        { FS_EXTENTS_FL, 'e'},
#endif
        { FS_NOCOW_FL, 'C' },
#ifdef FS_PROJINHERIT_FL
        { FS_PROJINHERIT_FL, 'P' },
#endif
    };
    int flags = 0;

    for (int i = 0; i < (sizeof flags_chars / sizeof *flags_chars); i++) {
        if (strchr(flagstr, flags_chars[i].flagchar))
            flags |= flags_chars[i].flag;
    }

    return flags;
}
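Note that parse_flags is liberal in what it accepts: characters it doesn't recognise are silently ignored. If you would rather be strict, a variant along the following lines could reject them. This is only a sketch; the name parse_flags_strict and its return convention are assumptions, and the table is cut down to four entries to keep it short.

```c
#include <errno.h>
#include <linux/fs.h>
#include <stddef.h>

/* Hypothetical strict variant of parse_flags: returns 0 and stores the
 * flags in *flags_out, or -1 with errno = EINVAL on an unknown character.
 * Only a few table entries are shown here. */
int parse_flags_strict(const char *flagstr, int *flags_out) {
    static const struct {
        int flag;
        char flagchar;
    } flags_chars[] = {
        { FS_IMMUTABLE_FL, 'i' },
        { FS_APPEND_FL, 'a' },
        { FS_NODUMP_FL, 'd' },
        { FS_NOATIME_FL, 'A' },
    };
    const size_t n = sizeof flags_chars / sizeof *flags_chars;
    int flags = 0;

    /* Walk the input rather than the table, so every character is checked */
    for (const char *p = flagstr; *p != '\0'; p++) {
        size_t i;

        for (i = 0; i < n; i++) {
            if (flags_chars[i].flagchar == *p)
                break;
        }
        if (i == n) {
            errno = EINVAL;
            return -1;
        }
        flags |= flags_chars[i].flag;
    }

    *flags_out = flags;
    return 0;
}
```

With this, a typo such as "iz" is reported to the user instead of quietly becoming a requirement for the immutable flag alone.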

As with previous articles, the full version of the my-mv.c source file and the Makefile may be downloaded.

Conclusion

Well, that was more work than I expected, but we've copied all the metadata now, right?

Well, no. It turns out 32 possible flags isn't enough.

Flags are compact and relatively easy to set, but 32 booleans is just not enough for everything you need, especially when some get reserved for other filesystems or are read-only.

A solution to this would be to just add more ioctls, but that would lead to the same problems with needing to know how to translate them between different filesystems.

What's needed is a unified API for setting this extra information, these… extended attributes. We'll cover these next time.

Posted Wed Sep 28 11:00:06 2016