Daniel Silverstone Enforcing your rights

We have spoken before about being a mensch, and about getting started with your project. In the latter we covered how you need to ensure that your project has a licence along with (ideally) a code of conduct or contributor covenant of some kind. Having these documents is, however, only the first step along the way of ensuring your users and contributors are protected.

Sadly if you're not on top of things then these documents are not worth the bits and bytes they're made up of. You must enforce such things in order for them to be useful. Fortunately if you pick a licence such as the GNU GPL then the Free Software Foundation carries some articles about enforcing the GPL. In addition, the Software Freedom Conservancy may be interested in helping you.

In addition, social contracts also need to be backed up with positive action in order to ensure that current and potential contributors can feel safe engaging with your project.

Your homework for today is to check over your projects again and ensure that they're all carrying appropriate and enforceable licences, and where appropriate social contracts, codes of conduct, or contributor covenants. Then once you've done that, take some time to check any community you might have around your projects for people failing to honour the codes of behaviour the project expects and deal with them politely but firmly. Finally take a bit of time to look around and see if you can see anyone violating your copyright by using your project in a way not covered by your licence. (For example, incorporating GPL code into something without offering the source).

Posted Wed Jun 1 11:00:07 2016

We previously discussed common command-line formats.

If your program only operates on string values then this should be sufficient, but programs often need to operate with other data types.

Parsing numbers

Firstly, you might want a value which is a number rather than a string.

Historically the function for this was atoi(3) (ASCII to integer), which has the disadvantage that it cannot distinguish between the string being "0" and the string not being a number at all, since it returns 0 in both cases.

A more useful, but less simple, option is strtol(3), which tells you where the number ended via the endptr argument.

If *endptr == nptr then there was no number, and if **endptr != '\0' then there were extra characters after the number.
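Taken on their own, those two checks might be wrapped up as follows. This is only a sketch: parse_long and its result codes are hypothetical names rather than part of any library, and it assumes base 0 so that 0x and leading-0 prefixes are accepted.

```c
#include <stdlib.h> /* strtol */

/* Hypothetical result codes, for illustration only */
enum parse_result {
    PARSE_OK = 0,
    PARSE_NO_NUMBER,     /* nothing at the start of the string was a number */
    PARSE_TRAILING_JUNK, /* a number was found, but more characters followed */
};

/* Hypothetical helper: parse the whole of str as a long */
static enum parse_result parse_long(const char *str, long *out) {
    char *endptr;
    long value = strtol(str, &endptr, 0);

    if (endptr == str)
        return PARSE_NO_NUMBER;
    if (*endptr != '\0')
        return PARSE_TRAILING_JUNK;

    *out = value;
    return PARSE_OK;
}
```

With this, "0" gives PARSE_OK with a value of 0 while "asdf" gives PARSE_NO_NUMBER, which is the distinction atoi(3) cannot make.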

/* test.c */
#include <stdbool.h> /* bool */
#include <stdlib.h> /* strtol */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */

int parse_options(int argc, char *argv[], long *foo_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {},
    };

    int ret = 0;
    long foo;
    bool parsed_foo = false;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *endptr;
                long foo;
                if (parsed_foo) {
                    fprintf(stderr, "%s: Only one --foo is permitted\n", argv[0]);
                    return 1;
                }
                foo = strtol(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --foo requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *foo_out = foo;
                parsed_foo = true;
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            return 1;
        }
    }
parsing_end:
    if (!parsed_foo) {
        fprintf(stderr, "%s: --foo is required\n", argv[0]);
        ret = 1;
    } 
    return ret;
}

int main(int argc, char *argv[]){
    long foo;
    int ret = parse_options(argc, argv, &foo);
    if (ret == 0)
        fprintf(stdout, "Foo is %ld\n", foo);
    return ret;
}
$ make test
cc    test.c   -o test
$ ./test 
./test: --foo is required
$ ./test --foo
./test: option '--foo' requires an argument
$ ./test --foo=asdf
./test: --foo requires a number, got asdf
$ ./test --foo=12
Foo is 12

Similarly there's strtoul(3), strtoll(3) and strtoull(3) for unsigned long, long long and unsigned long long integer types.

/* test.c */
#include <stdbool.h> /* bool */
#include <stdlib.h> /* strtol, strtoul, strtoll, strtoull */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */

int parse_options(int argc, char *argv[], long *foo_out,
                  unsigned long *bar_out, long long *baz_out,
                  unsigned long long *qux_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_BAR = 'b',
        OPT_BAZ = 'B',
        OPT_QUX = 'q',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {.name = "bar", .has_arg = required_argument, .val = OPT_BAR},
        {.name = "baz", .has_arg = required_argument, .val = OPT_BAZ},
        {.name = "qux", .has_arg = required_argument, .val = OPT_QUX},
        {},
    };

    int ret = 0;
    long foo;
    bool parsed_foo = false;
    unsigned long bar;
    bool parsed_bar = false;
    long long baz;
    bool parsed_baz = false;
    unsigned long long qux;
    bool parsed_qux = false;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:b:B:q:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *endptr;
                long foo;
                if (parsed_foo) {
                    fprintf(stderr, "%s: Only one --foo is permitted\n", argv[0]);
                    return 1;
                }
                foo = strtol(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --foo requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *foo_out = foo;
                parsed_foo = true;
            }
            break;
        case OPT_BAR:
            {
                char *endptr;
                unsigned long bar;
                if (parsed_bar) {
                    fprintf(stderr, "%s: Only one --bar is permitted\n", argv[0]);
                    return 1;
                }
                bar = strtoul(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --bar requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *bar_out = bar;
                parsed_bar = true;
            }
            break;
        case OPT_BAZ:
            {
                char *endptr;
                long long baz;
                if (parsed_baz) {
                    fprintf(stderr, "%s: Only one --baz is permitted\n", argv[0]);
                    return 1;
                }
                baz = strtoll(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --baz requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *baz_out = baz;
                parsed_baz = true;
            }
            break;
        case OPT_QUX:
            {
                char *endptr;
                unsigned long long qux;
                if (parsed_qux) {
                    fprintf(stderr, "%s: Only one --qux is permitted\n", argv[0]);
                    return 1;
                }
                qux = strtoull(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --qux requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *qux_out = qux;
                parsed_qux = true;
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            return 1;
        }
    }
parsing_end:
    if (!parsed_foo) {
        fprintf(stderr, "%s: --foo is required\n", argv[0]);
        ret = 1;
    } 
    if (!parsed_bar) {
        fprintf(stderr, "%s: --bar is required\n", argv[0]);
        ret = 1;
    } 
    if (!parsed_baz) {
        fprintf(stderr, "%s: --baz is required\n", argv[0]);
        ret = 1;
    } 
    if (!parsed_qux) {
        fprintf(stderr, "%s: --qux is required\n", argv[0]);
        ret = 1;
    } 
    return ret;
}

int main(int argc, char *argv[]){
    long foo;
    unsigned long bar;
    long long baz;
    unsigned long long qux;
    int ret = parse_options(argc, argv, &foo, &bar, &baz, &qux);
    if (ret == 0)
        fprintf(stdout,
                "Foo is %ld\nBar is %lu\nBaz is %lld\nQux is %llu\n",
                foo, bar, baz, qux);
    return ret;
}
$ make test
cc    test.c   -o test
$ ./test 
./test: --foo is required
./test: --bar is required
./test: --baz is required
./test: --qux is required
$ ./test --foo=12 --bar=23 --baz=34 --qux=45
Foo is 12
Bar is 23
Baz is 34
Qux is 45
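
The listings above do not check whether the number fits in the destination type. The strtol(3) family reports overflow by returning the largest (or smallest) representable value and setting errno to ERANGE, so a range check might be added along the lines sketched below. parse_ulong is a hypothetical helper name, and collapsing every failure into -1 is just an assumption to keep the sketch short.

```c
#include <errno.h>  /* errno, ERANGE */
#include <stdlib.h> /* strtoul */

/* Hypothetical helper: returns 0 on success, -1 if str is not a number,
 * has trailing characters, or is too large for an unsigned long */
static int parse_ulong(const char *str, unsigned long *out) {
    char *endptr;
    unsigned long value;

    errno = 0; /* strtoul only sets errno on failure, so clear it first */
    value = strtoul(str, &endptr, 0);

    if (endptr == str || *endptr != '\0')
        return -1;
    if (errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

Be aware that strtoul(3) also accepts a leading minus sign and returns the negated value converted to unsigned, so "-1" is not rejected by this check.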

Finally, there's strtof(3), strtod(3) and strtold(3) for parsing float, double and long double.

/* test.c */
#include <stdbool.h> /* bool */
#include <stdlib.h> /* strtof, strtod, strtold */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */

int parse_options(int argc, char *argv[], float *foo_out,
                  double *bar_out, long double *baz_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_BAR = 'b',
        OPT_BAZ = 'B',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {.name = "bar", .has_arg = required_argument, .val = OPT_BAR},
        {.name = "baz", .has_arg = required_argument, .val = OPT_BAZ},
        {},
    };

    int ret = 0;
    float foo;
    bool parsed_foo = false;
    double bar;
    bool parsed_bar = false;
    long double baz;
    bool parsed_baz = false;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:b:B:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *endptr;
                float foo;
                if (parsed_foo) {
                    fprintf(stderr, "%s: Only one --foo is permitted\n", argv[0]);
                    return 1;
                }
                foo = strtof(optarg, &endptr);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --foo requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *foo_out = foo;
                parsed_foo = true;
            }
            break;
        case OPT_BAR:
            {
                char *endptr;
                double bar;
                if (parsed_bar) {
                    fprintf(stderr, "%s: Only one --bar is permitted\n", argv[0]);
                    return 1;
                }
                bar = strtod(optarg, &endptr);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --bar requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *bar_out = bar;
                parsed_bar = true;
            }
            break;
        case OPT_BAZ:
            {
                char *endptr;
                long double baz;
                if (parsed_baz) {
                    fprintf(stderr, "%s: Only one --baz is permitted\n", argv[0]);
                    return 1;
                }
                baz = strtold(optarg, &endptr);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --baz requires a number, got %s\n",
                            argv[0], optarg);
                    return 1;
                }
                *baz_out = baz;
                parsed_baz = true;
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            return 1;
        }
    }
parsing_end:
    if (!parsed_foo) {
        fprintf(stderr, "%s: --foo is required\n", argv[0]);
        ret = 1;
    } 
    if (!parsed_bar) {
        fprintf(stderr, "%s: --bar is required\n", argv[0]);
        ret = 1;
    } 
    if (!parsed_baz) {
        fprintf(stderr, "%s: --baz is required\n", argv[0]);
        ret = 1;
    } 
    return ret;
}

int main(int argc, char *argv[]){
    float foo;
    double bar;
    long double baz;
    int ret = parse_options(argc, argv, &foo, &bar, &baz);
    if (ret == 0)
        fprintf(stdout,
                "Foo is %f\nBar is %lf\nBaz is %Lf\n",
                foo, bar, baz);
    return ret;
}
$ make test
cc    test.c   -o test
$ ./test 
./test: --foo is required
./test: --bar is required
./test: --baz is required
$ ./test --foo=1.2 --bar=2.3 --baz=3.4
Foo is 1.200000
Bar is 2.300000
Baz is 3.400000

Parsing arrays

We previously parsed the positional parameters array by making use of GNU getopt(3)'s permuting behaviour letting us use a slice of the argv array.

This is handy for a lot of programs, but some programs need to handle more than one array, or this array might not be of strings.

Parsing arrays of multiple options

/* test.c */
#include <stdlib.h> /* size_t, strtol */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */

int extend_foo_array(long **foos, size_t *foos_count, long foo) {
    size_t newsize = (*foos_count + 1) * sizeof(foo);
    long *newfoos = realloc(*foos, newsize);
    
    if (newfoos == NULL)
        return 1;
    
    newfoos[*foos_count] = foo;
    
    (*foos_count)++;
    *foos = newfoos;
    return 0;
}

int parse_options(int argc, char *argv[], long **foos_out, size_t *foos_count_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {},
    };

    int ret = 0;
    long *foos = NULL;
    size_t foos_count = 0;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *endptr;
                long foo = strtol(optarg, &endptr, 0);
                if (endptr == optarg || *endptr != '\0') {
                    fprintf(stderr, "%s: --foo requires a number, got %s\n",
                            argv[0], optarg);
                    ret = 1;
                    goto cleanup;
                }
                if (extend_foo_array(&foos, &foos_count, foo) != 0) {
                    fprintf(stderr, "%s: Unable to extend foo array\n", argv[0]);
                    ret = 2;
                    goto cleanup;
                }
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            ret = 1;
            goto cleanup;
        }
    }
parsing_end:
    if (foos == NULL || foos_count == 0) {
        fprintf(stderr, "%s: At least one --foo required\n", argv[0]);
        ret = 1;
    } else {
        *foos_out = foos;
        *foos_count_out = foos_count;
        foos = NULL;
        foos_count = 0;
    }
cleanup:
    free(foos);
    return ret;
}

int main(int argc, char *argv[]) {
    long *foos = NULL;
    size_t foos_count = 0;
    int ret = parse_options(argc, argv, &foos, &foos_count);
    if (ret == 0) {
        fprintf(stdout, "Foos:\n");
        for (int i = 0; i < foos_count; i++) {
            fprintf(stdout, "%d:\t%ld\n", i, foos[i]);
        }
    }
cleanup:
    free(foos);
    return ret;
}
$ ./test
./test: At least one --foo required
$ ./test -f
./test: option requires an argument -- 'f'
$ ./test -f1 -f2
Foos:
0:  1
1:  2

Parsing arrays of token separated values

The multiple option form of arrays is convenient when your values may be arbitrary strings, though it is more typing and it is a bit unnatural to create an array this way when there is a convenient token separator.

Comma and colon are the traditional favourites for this, and this is sufficiently common that there are functions in glibc to help.

strtok(3) is the traditional function for this, though it relies on global state, so is unfavourable.

strsep(3) was the BSD approach to fix this, which has a reasonably nice API, but strtok_r(3) is the standardised non-global-state version.

/* test.c */
#include <stdlib.h> /* size_t, strtol */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */
#include <string.h> /* strtok_r */

int extend_foo_array(long **foos, size_t *foos_count, long foo) {
    size_t newsize = (*foos_count + 1) * sizeof(foo);
    long *newfoos = realloc(*foos, newsize);
    
    if (newfoos == NULL)
        return 1;
    
    newfoos[*foos_count] = foo;
    
    (*foos_count)++;
    *foos = newfoos;
    return 0;
}

int parse_options(int argc, char *argv[], long **foos_out, size_t *foos_count_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {},
    };

    int ret = 0;
    long *foos = NULL;
    size_t foos_count = 0;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *str = optarg;
                char *token;
                if (foos != NULL || foos_count != 0) {
                    fprintf(stderr, "%s: Only one --foo is permitted\n", argv[0]);
                    ret = 1;
                    goto cleanup;
                }
                while ((token = strtok_r(str, ":", &str)) != NULL) {
                    char *endptr;
                    long foo = strtol(token, &endptr, 0);
                    if (endptr == token || *endptr != '\0') {
                        fprintf(stderr, "%s: --foo requires a : separated array"
                                        " of numbers, got %s\n",
                                argv[0], token);
                        ret = 1;
                        goto cleanup;
                    }
                    if (extend_foo_array(&foos, &foos_count, foo) != 0) {
                        fprintf(stderr, "%s: Unable to extend foo array\n",
                                argv[0]);
                        ret = 2;
                        goto cleanup;
                    }
                }
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            ret = 1;
            goto cleanup;
        }
    }
parsing_end:
    if (foos == NULL || foos_count == 0) {
        fprintf(stderr, "%s: At least one --foo required\n", argv[0]);
        ret = 1;
    } else {
        *foos_out = foos;
        *foos_count_out = foos_count;
        foos = NULL;
        foos_count = 0;
    }
cleanup:
    free(foos);
    return ret;
}

int main(int argc, char *argv[]) {
    long *foos = NULL;
    size_t foos_count = 0;
    int ret = parse_options(argc, argv, &foos, &foos_count);
    if (ret == 0) {
        fprintf(stdout, "Foos:\n");
        for (int i = 0; i < foos_count; i++) {
            fprintf(stdout, "%d:\t%ld\n", i, foos[i]);
        }
    }
cleanup:
    free(foos);
    return ret;
}
$ ./test 
./test: At least one --foo required
$ ./test -f1,2,3
./test: --foo requires a : separated array of numbers, got 1,2,3
$ ./test -f1:2:3
Foos:
0:  1
1:  2
2:  3
$ ./test -f1:2:3 -f123
./test: Only one --foo is permitted
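
The listing above passes str as both the string and the save pointer to strtok_r(3). The more conventional calling pattern, sketched below, passes the string on the first call and NULL afterwards, keeping the state in a separate variable. count_fields is a hypothetical helper used only to show the loop shape.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r under strict -std= modes */
#include <stddef.h> /* size_t, NULL */
#include <string.h> /* strtok_r */

/* Hypothetical helper: count the fields in str, which is modified in place */
static size_t count_fields(char *str, const char *separators) {
    size_t count = 0;
    char *saveptr; /* strtok_r keeps its position here between calls */

    for (char *token = strtok_r(str, separators, &saveptr);
         token != NULL;
         token = strtok_r(NULL, separators, &saveptr)) {
        count++;
    }
    return count;
}
```

Note that strtok_r(3) treats a run of separators as a single separator, so a value such as 1::3 yields two fields rather than three with an empty one in the middle.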

Parsing suboptions

Some values may be compound, such as complex numbers that have a real and imaginary part, or any non-trivial C struct.

It is possible to parse the values as an array and fill in the data structure from the indices, but this gets complicated when the values may be of different types or optional.
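
As an illustration of the index-based approach, here is a sketch that fills a two-field structure from a colon-separated value. struct complex_arg and parse_complex are hypothetical names, and the choice of : as the separator is an assumption carried over from the earlier listing.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r under strict -std= modes */
#include <stddef.h> /* size_t, NULL */
#include <stdlib.h> /* strtod */
#include <string.h> /* strtok_r */

/* Hypothetical compound value: field 0 is the real part, field 1 the imaginary */
struct complex_arg {
    double re;
    double im;
};

/* Hypothetical helper: parse "RE:IM", modifying arg in place.
 * Returns 0 on success, -1 on a malformed or wrongly sized array. */
static int parse_complex(char *arg, struct complex_arg *out) {
    double fields[2];
    size_t count = 0;
    char *saveptr;

    for (char *token = strtok_r(arg, ":", &saveptr);
         token != NULL;
         token = strtok_r(NULL, ":", &saveptr)) {
        char *endptr;
        if (count == 2)
            return -1; /* too many fields */
        fields[count] = strtod(token, &endptr);
        if (endptr == token || *endptr != '\0')
            return -1; /* field is not a number */
        count++;
    }
    if (count != 2)
        return -1; /* too few fields */

    out->re = fields[0];
    out->im = fields[1];
    return 0;
}
```

Adding an optional third field, or one of a different type, means special-casing each index, which is where key-value pairs become attractive.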

So it would be convenient to be able to supply these with key-value pairs of field name and value.

If your keys don't have = in them and you don't have ,s in your keys or values, then getsubopt(3), an apparently unholy union between getopt(3) and strtok_r(3) could be just what you're looking for!

/* test.c */
#include <stdbool.h> /* bool */
#include <stdlib.h> /* size_t, strtol, getsubopt */
#include <getopt.h> /* getopt_long, struct option */
#include <stdio.h> /* fprintf */
#include <string.h> /* strtok_r */

struct foo {
    long bar;
    char *baz;
};

int extend_foo_array(struct foo **foos, size_t *foos_count, struct foo *foo) {
    size_t newsize = (*foos_count + 1) * sizeof(*foo);
    struct foo *newfoos = realloc(*foos, newsize);
    
    if (newfoos == NULL)
        return 1;
    
    newfoos[*foos_count] = *foo;
    
    (*foos_count)++;
    *foos = newfoos;
    return 0;
}

int parse_foo(char *progname, char *optarg, struct foo *foo_out) {
    enum foo_opt {
        FOO_BAR,
        FOO_BAZ,
    };
    static char *const foo_tokens[] = {
        [FOO_BAR] = "bar",
        [FOO_BAZ] = "baz",
        NULL,
    };

    struct foo foo;
    bool parsed_foo_bar = false;
    bool parsed_foo_baz = false;
    
    while (*optarg != '\0') {
        char *value;
        enum foo_opt foo_opt;
        int ret = getsubopt(&optarg, foo_tokens, &value);
        if (ret == -1) {
            return 1;
        }
        foo_opt = ret;
        switch (foo_opt) {
        case FOO_BAR:
            {
                char *endptr;
                if (parsed_foo_bar) {
                    fprintf(stderr, "%s: Only one --foo=bar=VALUE "
                            "is permitted\n", progname);
                    return 1;
                }
                long bar = strtol(value, &endptr, 0);
                if (endptr == value || *endptr != '\0') {
                    fprintf(stderr, "%s: --foo=bar=VALUE requires a number, "
                            "got %s\n", progname, value);
                    return 1;
                }
                foo.bar = bar;
                parsed_foo_bar = true;
            }
            break;
        case FOO_BAZ:
            {
                char *endptr;
                if (parsed_foo_baz) {
                    fprintf(stderr, "%s: Only one --foo=baz=VALUE "
                            "is permitted\n", progname);
                    return 1;
                }
                foo.baz = value;
                parsed_foo_baz = true;
            }
            break;
        }
    }

    if (!parsed_foo_bar)
        fprintf(stderr, "%s: Missing bar=VALUE in --foo\n", progname);
    if (!parsed_foo_baz)
        fprintf(stderr, "%s: Missing baz=VALUE in --foo\n", progname);
    if (parsed_foo_bar && parsed_foo_baz) {
        *foo_out = foo;
        return 0;
    }
    return 1;
}

int parse_options(int argc, char *argv[], struct foo **foos_out, size_t *foos_count_out) {
    enum opt {
        OPT_FOO = 'f',
        OPT_UNKNOWN = '?',
        OPT_NOVALUE = ':',
        OPT_END = -1,
    };
    static const struct option options[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {},
    };

    int ret = 0;
    struct foo *foos = NULL;
    size_t foos_count = 0;

    for (;;) {
        enum opt opt = getopt_long(argc, argv, "f:", options, NULL);
        switch(opt) {
        case OPT_FOO:
            {
                char *str = optarg;
                char *token;
                while ((token = strtok_r(str, ":", &str)) != NULL) {
                    char *endptr;
                    struct foo foo;
                    if (parse_foo(argv[0], token, &foo)) {
                        fprintf(stderr, "%s: --foo requires a , separated array"
                                        " of key-value pairs, got %s\n",
                                argv[0], token);
                        ret = 1;
                        goto cleanup;
                    }
                    if (extend_foo_array(&foos, &foos_count, &foo) != 0) {
                        fprintf(stderr, "%s: Unable to extend foo array\n",
                                argv[0]);
                        ret = 2;
                        goto cleanup;
                    }
                }
            }
            break;
        case OPT_END:
            goto parsing_end;
        case OPT_NOVALUE:
        case OPT_UNKNOWN:
            ret = 1;
            goto cleanup;
        }
    }
parsing_end:
    if (foos == NULL || foos_count == 0) {
        fprintf(stderr, "%s: At least one --foo required\n", argv[0]);
        ret = 1;
    } else {
        *foos_out = foos;
        *foos_count_out = foos_count;
        foos = NULL;
        foos_count = 0;
    }
cleanup:
    free(foos);
    return ret;
}

int main(int argc, char *argv[]) {
    struct foo *foos = NULL;
    size_t foos_count = 0;
    int ret = parse_options(argc, argv, &foos, &foos_count);
    if (ret == 0) {
        fprintf(stdout, "Foos:\n");
        for (int i = 0; i < foos_count; i++) {
            fprintf(stdout, "%d:\tbar=%ld, baz=%s\n", i, foos[i].bar, foos[i].baz);
        }
    }
cleanup:
    free(foos);
    return ret;
}
$ ./test
./test: At least one --foo required
$ ./test --foo=bar=1
./test: Missing baz=VALUE in --foo
./test: --foo requires a , separated array of key-value pairs, got bar=1
$ ./test --foo=baz=qux
./test: Missing bar=VALUE in --foo
./test: --foo requires a , separated array of key-value pairs, got baz=qux
$ ./test --foo=bar=1.2,baz=qux
./test: --foo=bar=VALUE requires a number, got 1.2
./test: --foo requires a , separated array of key-value pairs, got bar=1.2
$ ./test --foo=bar=1,baz=qux --foo=bar=2,baz=quux
Foos:
0:  bar=1, baz=qux
1:  bar=2, baz=quux

Conclusion

As you can see, there are plenty of functions built into the C library designed to make it easier to perform the kind of string parsing that is necessary to parse command-line arguments.

Unfortunately if you resort to token-separated values, such as with strtok_r(3) and getsubopt(3), you then can't handle strings that contain those separator characters.
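
To make that limitation concrete, the sketch below runs getsubopt(3) over a value that contains a comma. scan_suboptions and its single baz key are hypothetical, chosen to mirror the suboption listing.

```c
#define _POSIX_C_SOURCE 200809L /* for getsubopt under strict -std= modes */
#include <stdlib.h> /* getsubopt */

/* Hypothetical helper: walks arg (modified in place), stores the value of
 * the last baz= suboption seen, and returns how many suboptions were not
 * recognised. */
static int scan_suboptions(char *arg, char **baz_out) {
    static char *const tokens[] = { "baz", NULL };
    int unrecognised = 0;

    *baz_out = NULL;
    while (*arg != '\0') {
        char *value;
        if (getsubopt(&arg, tokens, &value) == 0)
            *baz_out = value; /* index 0 is "baz" */
        else
            unrecognised++;   /* -1: the key is not in tokens */
    }
    return unrecognised;
}
```

Given baz=hello,world, the value stored for baz is hello, and world is reported as an unrecognised suboption rather than being treated as part of the value.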

Your homework this week is to take a look at extract_first_word so you can understand the context of why it exists.

Posted Wed Jun 8 11:00:06 2016
Daniel Silverstone Inputting complex characters

While those of us who live in anglophone countries are blessed with having our character set be the default for modern computing, there are plenty of others not so lucky. Keyboard layouts exist which allow many of them to type letters such as é, ç, ø, and ł, and for those of us who do not have those characters on our keyboards there are a number of ways to enter them.

If you use a GTK+ based system then the default input method supports direct entry of Unicode codepoints by means of holding Control and Shift, pressing u, and then typing the codepoint in hexadecimal before releasing the chording keys. For example, C-S-u 266b produces ♫. In addition, if you're using an application which supports the X11 Compose key, then there are composition sequences for many characters which can be accessed by pressing the Compose key and then the sequence of characters which comprise the composition sequence. For example, Compose C = produces €.

In addition to composition sequences, some keyboard layouts support what are called dead keys, sometimes in alternative shift levels on the keyboard accessed via a level shifting key. A keyboard with a dead ' can produce an é by means of pressing ' and then e.

Finally, if you want to enter other kinds of characters such as 한국어 then you will need a more complex input method. There exist a number but the more commonly encountered ones are uim and fcitx. There are plenty of tutorials for setting up uim or fcitx (or one of the others) in your desktop environment if you search the interwebs. These input methods are special because they often require the ability to enter incomplete characters to prompt you for further input and as such they break the basic rule of one keypress produces one character (though that was already bent with the compose and dead keys).

Your homework is to delve into the keyboard settings on your system, find out where your compose key is, and play with composition sequences (you can find examples in /usr/share/X11/locale/en_US.UTF-8/Compose or a similar location depending on your chosen locale). The composition sequences often also list the dead key combinations so have a good explore and learn how to type all sorts of characters you might previously have gone to a character map application for.

Posted Wed Jun 15 11:00:07 2016

I previously spoke about command-line parsing with getopt and mentioned an alternative called argp.

Using argp is convenient because it automatically generates --help and --usage, so your help text won't get out of sync with your options. argp also combines the long option specification with the short options, so your short options won't get out of sync with your long options.

Using argp does require you to restructure your argument parsing though, and while the argpbook is a good guide to learn how to write new programs, if you're already familiar with command-line parsing in general a lot of what it has to say is redundant.

So this article is about how to translate a program written to use getopt into a program that uses argp.

Converting programs that parse with getopt to use argp

This is a relatively simple program that reports the positional arguments and the value passed to the --foo option.

/* test0.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long, struct option */

int main(int argc, char *argv[]){
    enum opt {
        OPT_END = -1,
        OPT_FOO = 'f',
        OPT_NOFOO = 0x100,
        OPT_UNEXPECTED = '?',
    };
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
        {.name = "also-foo", .has_arg = required_argument, .val = OPT_FOO},
        {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
        {},
    };
    char **positionals;
    char *foo = NULL;
    for (;;) {
        int longindex = -1;
        enum opt opt = getopt_long(argc, argv, "f:", longopts, &longindex);
        switch (opt) {
        case OPT_END:
            goto end_optparse;
        case OPT_FOO:
            foo = optarg;
            break;
        case OPT_NOFOO:
            foo = NULL;
            break;
        case OPT_UNEXPECTED:
            return 1;
        }
    }
end_optparse:
    positionals = &argv[optind];

    if (foo == NULL) {
        fprintf(stdout, "Got no Foo\n");
    } else {
        fprintf(stdout, "Foo is %s\n", foo);
    }
    for (; *positionals; positionals++)
        fprintf(stdout, "Positional: %s\n", *positionals);
    return 0;
}

Fixing control flow

The control flow for getopt_long is different to argp.

getopt_long is called in a loop until it has finished, effectively acting as a form of iterator, while argp is called once and is given a callback function.
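
To show the difference in shape, here is a rough sketch of the callback style using glibc's argp. It is not the conversion of the program above: the sketch_ names are hypothetical, and it only counts positional arguments rather than storing them.

```c
#include <argp.h>   /* argp_parse, struct argp, struct argp_option (glibc) */
#include <stddef.h> /* size_t, NULL */

/* Hypothetical state structure, handed to the callback via state->input */
struct sketch_arguments {
    char *foo;
    size_t positional_count;
};

static const struct argp_option sketch_options[] = {
    {.name = "foo", .key = 'f', .arg = "VALUE", .doc = "Set foo to VALUE"},
    {0},
};

/* argp calls this for each option and positional argument, and for a few
 * housekeeping keys which this sketch leaves unhandled */
static error_t sketch_parse_opt(int key, char *arg, struct argp_state *state) {
    struct sketch_arguments *args = state->input;

    switch (key) {
    case 'f':
        args->foo = arg;
        return 0;
    case ARGP_KEY_ARG:
        args->positional_count++;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp sketch_argp = {
    .options = sketch_options,
    .parser = sketch_parse_opt,
    .args_doc = "[ARG...]",
};

/* A single call takes the place of the getopt_long loop */
static error_t sketch_parse(int argc, char *argv[],
                            struct sketch_arguments *args) {
    args->foo = NULL;
    args->positional_count = 0;
    return argp_parse(&sketch_argp, argc, argv, 0, NULL, args);
}
```

The loop has not gone away; it has moved inside argp_parse, which is why the results have to travel through a single pointer.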

Returning parsed arguments in a struct

A side-effect of this change is that we need to change how we store our results: since we only get to pass one pointer to the parse function, we need a state structure.

--- test0.c    2016-05-30 11:58:27.799321266 +0100
+++ test1.c    2016-05-30 12:01:06.533529250 +0100
@@ -1,7 +1,12 @@
-/* test0.c */
+/* test1.c */
 #include <stdio.h> /* fprintf */
 #include <getopt.h> /* getopt_long, struct option */
 
+struct arguments {
+    char *foo;
+    char **positionals;
+};
+
 int main(int argc, char *argv[]){
     enum opt {
         OPT_END = -1,
@@ -15,8 +20,9 @@
         {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
         {},
     };
-    char **positionals;
-    char *foo = NULL;
+    struct arguments args = {
+        .foo = NULL,
+    };
     for (;;) {
         int longindex = -1;
         enum opt opt = getopt_long(argc, argv, "f:", longopts, &longindex);
@@ -24,24 +30,24 @@
         case OPT_END:
             goto end_optparse;
         case OPT_FOO:
-            foo = optarg;
+            args.foo = optarg;
             break;
         case OPT_NOFOO:
-            foo = NULL;
+            args.foo = NULL;
             break;
         case OPT_UNEXPECTED:
             return 1;
         }
     }
 end_optparse:
-    positionals = &argv[optind];
+    args.positionals = &argv[optind];
 
-    if (foo == NULL) {
+    if (args.foo == NULL) {
         fprintf(stdout, "Got no Foo\n");
     } else {
-        fprintf(stdout, "Foo is %s\n", foo);
+        fprintf(stdout, "Foo is %s\n", args.foo);
     }
-    for (; *positionals; positionals++)
+    for (char **positionals = args.positionals; *positionals; positionals++)
         fprintf(stdout, "Positional: %s\n", *positionals);
     return 0;
 }

Adding a handler function

To make the switch-over to calling argp_parse easier, we're going to split out the argument parsing into a function, which calls getopt_long in a loop and calls a second function to actually handle each argument.

--- test1.c    2016-05-30 12:01:06.533529250 +0100
+++ test2.c    2016-05-30 13:35:19.414248340 +0100
@@ -1,4 +1,4 @@
-/* test1.c */
+/* test2.c */
 #include <stdio.h> /* fprintf */
 #include <getopt.h> /* getopt_long, struct option */
 
@@ -7,40 +7,59 @@
     char **positionals;
 };
 
+enum opt {
+    OPT_END = -1,
+    OPT_FOO = 'f',
+    OPT_NOFOO = 0x100,
+    OPT_UNEXPECTED = '?',
+};
+static const struct option longopts[] = {
+    {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
+    {.name = "also-foo", .has_arg = required_argument, .val = OPT_FOO},
+    {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
+    {},
+};
+const char optstring[] = "f:";
+
+int parse_arg(int opt, char *arg, struct arguments *args){
+    switch (opt) {
+    case OPT_FOO:
+        args->foo = arg;
+        return 0;
+    case OPT_NOFOO:
+        args->foo = NULL;
+        return 0;
+    default:
+        return 1;
+    }
+}
+
+int parse_args(int argc, char *argv[], struct arguments *args){
+    for (;;) {
+        int opt = getopt_long(argc, argv, optstring, longopts, NULL);
+
+        if (opt == OPT_END) {
+            args->positionals = &argv[optind];
+            return 0;
+        }
+
+        int ret = parse_arg(opt, optarg, args);
+        if (ret != 0) {
+            return ret;
+        }
+    }
+}
+
 int main(int argc, char *argv[]){
-    enum opt {
-        OPT_END = -1,
-        OPT_FOO = 'f',
-        OPT_NOFOO = 0x100,
-        OPT_UNEXPECTED = '?',
-    };
-    static const struct option longopts[] = {
-        {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
-        {.name = "also-foo", .has_arg = required_argument, .val = OPT_FOO},
-        {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
-        {},
-    };
+    int ret = 0;
     struct arguments args = {
         .foo = NULL,
     };
-    for (;;) {
-        int longindex = -1;
-        enum opt opt = getopt_long(argc, argv, "f:", longopts, &longindex);
-        switch (opt) {
-        case OPT_END:
-            goto end_optparse;
-        case OPT_FOO:
-            args.foo = optarg;
-            break;
-        case OPT_NOFOO:
-            args.foo = NULL;
-            break;
-        case OPT_UNEXPECTED:
-            return 1;
-        }
+
+    ret = parse_args(argc, argv, &args);
+    if (ret != 0) {
+        return ret;
     }
-end_optparse:
-    args.positionals = &argv[optind];
 
     if (args.foo == NULL) {
         fprintf(stdout, "Got no Foo\n");

You may have noticed the inconsistency between some parameters being passed in to the handler function and others being globals.

This is a side-effect of emulating the API that argp exposes, with minimal changes to the flow of data.

Handling positional parameters as options

argp parser functions typically handle parsing the positional arguments, rather than the caller.

Unfortunately we don't currently pass the argv in to the handler function, so we'll need to change the API a little by adding a struct parse_state that includes the argv.

--- test2.c    2016-05-30 13:42:20.638208254 +0100
+++ test3.c    2016-05-30 13:45:52.232178791 +0100
@@ -1,4 +1,4 @@
-/* test2.c */
+/* test3.c */
 #include <stdio.h> /* fprintf */
 #include <getopt.h> /* getopt_long, struct option */
 
@@ -21,7 +21,13 @@
 };
 const char optstring[] = "f:";
 
-int parse_arg(int opt, char *arg, struct arguments *args){
+struct parse_state {
+    char **argv;
+    struct arguments *input;
+};
+
+int parse_arg(int opt, char *arg, struct parse_state *state){
+    struct arguments *args = state->input;   
     switch (opt) {
     case OPT_FOO:
         args->foo = arg;
@@ -29,24 +35,29 @@
     case OPT_NOFOO:
         args->foo = NULL;
         return 0;
+    case OPT_END:
+        args->positionals = &state->argv[optind];
+        return 0;
     default:
         return 1;
     }
 }
 
 int parse_args(int argc, char *argv[], struct arguments *args){
+    struct parse_state state = {
+        .argv = argv,
+        .input = args,
+    };
     for (;;) {
         int opt = getopt_long(argc, argv, optstring, longopts, NULL);
-
-        if (opt == OPT_END) {
-            args->positionals = &argv[optind];
-            return 0;
-        }
-
-        int ret = parse_arg(opt, optarg, args);
+        int ret = parse_arg(opt, optarg, &state);
         if (ret != 0) {
             return ret;
         }
+
+        if (opt == OPT_END) {
+            return 0;
+        }
     }
 }

Switching over to argp_parse

Now that we've changed the logic flow, we can effectively substitute parse_args for argp_parse.

The change is now mostly a matter of deleting the code we added to change the logic flow.

Replacing parse_args

-
-int parse_args(int argc, char *argv[], struct arguments *args){
-    struct parse_state state = {
-        .argv = argv,
-        .input = args,
-    };
-    for (;;) {
-        int opt = getopt_long(argc, argv, optstring, longopts, NULL);
-        int ret = parse_arg(opt, optarg, &state);
-        if (ret != 0) {
-            return ret;
-        }
-
-        if (opt == OPT_END) {
-            return 0;
-        }
-    }
-}
 
 int main(int argc, char *argv[]){
+    static const struct argp argp = {
+        .options = opts,
+        .parser = parse_arg,
+    };
     int ret = 0;
     struct arguments args = {
         .foo = NULL,
     };
 
-    ret = parse_args(argc, argv, &args);
+    ret = argp_parse(&argp, argc, argv, 0, NULL, &args);
     if (ret != 0) {
         return ret;
     }

This effectively replaces our hand-written parsing loop with a call to argp_parse and the appropriate configuration.

The static const struct argp argp is in main just to keep its definition local to its only user.

Strictly the static const struct argp_option opts[] could also be moved here, but it's easier to compare how options are specified if it's changed in its current location rather than moved.

Changing the options vector

@@ -8,25 +10,17 @@
 };
 
 enum opt {
-    OPT_END = -1,
     OPT_FOO = 'f',
     OPT_NOFOO = 0x100,
-    OPT_UNEXPECTED = '?',
 };
-static const struct option longopts[] = {
-    {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
-    {.name = "also-foo", .has_arg = required_argument, .val = OPT_FOO},
-    {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
+static const struct argp_option opts[] = {
+    {.name = "foo", .key = OPT_FOO, .arg = "value"},
+    {.name = "also-foo", .key = OPT_FOO, .flags = OPTION_ALIAS},
+    {.name = "no-foo", .key = OPT_NOFOO},
     {},
 };
-const char optstring[] = "f:";

Because argp_parse handles termination and unexpected options internally we don't need OPT_END or OPT_UNEXPECTED any more.

struct argp_option behaves similarly to struct option, but it does not have a .flag member, so the equivalent of .val is called .key and is always used to determine which value is passed through to parse_arg.

Rather than having .has_arg defining whether it takes values, the .arg field defines whether it expects a value, and labels it in the help output.

If an option's value is optional, then add OPTION_ARG_OPTIONAL to .flags.
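As a rough sketch of what that looks like, the following parser accepts a made-up --colour option whose value may be omitted. The option name, its "auto" default and the parse_colour_args wrapper are all assumptions for the example, not part of the test programs.

```c
#include <stddef.h> /* NULL */
#include <argp.h> /* argp_parse, error_t, struct argp, struct argp_option,
                     struct argp_state, OPTION_ARG_OPTIONAL,
                     ARGP_ERR_UNKNOWN */

/* The --colour option and its "auto" default are made up for this sketch. */
static const struct argp_option colour_opts[] = {
    {.name = "colour", .key = 'c', .arg = "when",
     .flags = OPTION_ARG_OPTIONAL},
    {0},
};

static error_t parse_colour(int key, char *arg, struct argp_state *state){
    const char **colour = state->input;
    switch (key) {
    case 'c':
        /* arg is NULL when the option was given without a value */
        *colour = (arg != NULL) ? arg : "auto";
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp colour_argp = {
    .options = colour_opts,
    .parser = parse_colour,
};

/* Returns the value chosen for --colour, or NULL if it was not given. */
const char *parse_colour_args(int argc, char *argv[]){
    const char *colour = NULL;
    if (argp_parse(&colour_argp, argc, argv, 0, NULL, &colour) != 0)
        return NULL;
    return colour;
}
```

As with getopt_long, an optional value has to be attached to its option, as in --colour=always or -calways, since a separate word would be treated as a positional argument.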

Because argp_parse treats any key which is printable as a short option, we don't need the separate option string.
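A minimal sketch of the difference, using two made-up options: one has a printable key and so gains a short form, the other has a key outside the printable range and is only reachable by its long name. struct flags and parse_flags are names invented for the example.

```c
#include <stddef.h> /* NULL */
#include <argp.h> /* argp_parse, error_t, struct argp, struct argp_option,
                     struct argp_state, ARGP_ERR_UNKNOWN */

/* Both options are made up for this sketch. */
enum {
    KEY_VERBOSE = 'v',   /* printable, so both -v and --verbose work */
    KEY_DRY_RUN = 0x100, /* not printable, so only --dry-run works */
};

struct flags {
    int verbose;
    int dry_run;
};

static const struct argp_option flag_opts[] = {
    {.name = "verbose", .key = KEY_VERBOSE},
    {.name = "dry-run", .key = KEY_DRY_RUN},
    {0},
};

static error_t parse_flag(int key, char *arg, struct argp_state *state){
    struct flags *flags = state->input;
    (void)arg; /* neither option takes a value */
    switch (key) {
    case KEY_VERBOSE:
        flags->verbose = 1;
        return 0;
    case KEY_DRY_RUN:
        flags->dry_run = 1;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp flag_argp = {
    .options = flag_opts,
    .parser = parse_flag,
};

error_t parse_flags(int argc, char *argv[], struct flags *flags){
    return argp_parse(&flag_argp, argc, argv, 0, NULL, flags);
}
```

This is the same convention the test programs use for OPT_FOO ('f') and OPT_NOFOO (0x100).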

Changes to parse_arg

-int parse_arg(int opt, char *arg, struct parse_state *state){
+error_t parse_arg(int opt, char *arg, struct argp_state *state){
     struct arguments *args = state->input;   
     switch (opt) {
     case OPT_FOO:
@@ -35,39 +29,26 @@
     case OPT_NOFOO:
         args->foo = NULL;
         return 0;
-    case OPT_END:
-        args->positionals = &state->argv[optind];
+    case ARGP_KEY_ARGS:
+    case ARGP_KEY_NO_ARGS:
+        args->positionals = &state->argv[state->next];
         return 0;
     default:
-        return 1;
+        return ARGP_ERR_UNKNOWN;
     }
 }

This is mostly the same.

The function signature has changed slightly since we pass argp's state instead, and rather than using optind, we use state->next.

argp parser functions can handle arguments individually with ARGP_KEY_ARG, or all together with ARGP_KEY_ARGS, and can handle being given no arguments with ARGP_KEY_NO_ARGS.

Since we want to treat all subsequent arguments as the positionals, we don't handle ARGP_KEY_ARG, since then we'd need to collect the arguments one at a time.
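For comparison, handling the arguments individually might look something like the following. It is a sketch rather than part of the test programs: struct summary, parse_positional and summarise are names made up for the example.

```c
#include <stddef.h> /* NULL */
#include <argp.h> /* argp_parse, error_t, struct argp, struct argp_state,
                     ARGP_KEY_ARG, ARGP_ERR_UNKNOWN */

/* struct summary and the function names are made up for this sketch. */
struct summary {
    unsigned count;   /* how many positional arguments were seen */
    const char *last; /* the most recent one, or NULL if there were none */
};

static error_t parse_positional(int key, char *arg, struct argp_state *state){
    struct summary *summary = state->input;
    switch (key) {
    case ARGP_KEY_ARG:
        /* Called once for every positional argument, in order. */
        summary->count++;
        summary->last = arg;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

/* This parser takes no options, so .options is left unset. */
static const struct argp positional_argp = {
    .parser = parse_positional,
};

error_t summarise(int argc, char *argv[], struct summary *summary){
    summary->count = 0;
    summary->last = NULL;
    return argp_parse(&positional_argp, argc, argv, 0, NULL, summary);
}
```

The handler has to accumulate its own state across calls, which is more work than taking a pointer into argv once.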

We need to handle ARGP_KEY_NO_ARGS since we haven't initialised args->positionals to anything, and to be a valid argument vector we need to point to something even if it is just a pointer to a NULL (signifying an empty vector).

Since &state->argv[state->next] points to the end of the array if there were no positional parameters, or to the next parameter if there was one, the code is actually the same.

argp parser functions may be chained together, so a parser function that doesn't recognise a particular option should return ARGP_ERR_UNKNOWN so that argp_parse can either try a different parser function or it can report it being unhandled as an error.
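As a hedged sketch of chaining, the following attaches a child parser through the .children field of struct argp. The options, struct settings and parse_settings are invented for the example. The parent shares its input pointer with the child when it receives ARGP_KEY_INIT, and each parser returns ARGP_ERR_UNKNOWN for any key it does not handle.

```c
#include <stddef.h> /* NULL */
#include <argp.h> /* argp_parse, error_t, struct argp, struct argp_child,
                     struct argp_option, struct argp_state,
                     ARGP_KEY_INIT, ARGP_ERR_UNKNOWN */

/* The options and struct settings are made up for this sketch. */
struct settings {
    int verbose;
    char *output;
};

/* Child parser: only knows about --verbose. */
static const struct argp_option common_opts[] = {
    {.name = "verbose", .key = 'v'},
    {0},
};

static error_t parse_common(int key, char *arg, struct argp_state *state){
    struct settings *settings = state->input;
    (void)arg; /* --verbose takes no value */
    switch (key) {
    case 'v':
        settings->verbose = 1;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp common_argp = {
    .options = common_opts,
    .parser = parse_common,
};

/* Parent parser: knows about --output and pulls in the child. */
static const struct argp_option main_opts[] = {
    {.name = "output", .key = 'o', .arg = "file"},
    {0},
};

static const struct argp_child children[] = {
    {.argp = &common_argp},
    {0},
};

static error_t parse_main(int key, char *arg, struct argp_state *state){
    struct settings *settings = state->input;
    switch (key) {
    case ARGP_KEY_INIT:
        /* Give the first child the same input pointer as the parent. */
        state->child_inputs[0] = settings;
        return 0;
    case 'o':
        settings->output = arg;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp main_argp = {
    .options = main_opts,
    .parser = parse_main,
    .children = children,
};

error_t parse_settings(int argc, char *argv[], struct settings *settings){
    return argp_parse(&main_argp, argc, argv, 0, NULL, settings);
}
```

Splitting the parsers this way lets a set of common options be reused by several programs, each of which adds its own options in the parent.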

The full diff

--- test3.c    2016-05-30 13:47:01.431515079 +0100
+++ test4.c    2016-05-30 14:14:25.315137258 +0100
@@ -1,6 +1,8 @@
-/* test3.c */
+/* test4.c */
 #include <stdio.h> /* fprintf */
-#include <getopt.h> /* getopt_long, struct option */
+#include <argp.h> /* argp_parse, error_t, struct argp, struct argp_option,
+                     struct argp_state, OPTION_ALIAS,
+                     ARGP_KEY_ARGS, ARGP_KEY_NO_ARGS, ARGP_ERR_UNKNOWN */
 
 struct arguments {
     char *foo;
@@ -8,25 +10,17 @@
 };
 
 enum opt {
-    OPT_END = -1,
     OPT_FOO = 'f',
     OPT_NOFOO = 0x100,
-    OPT_UNEXPECTED = '?',
 };
-static const struct option longopts[] = {
-    {.name = "foo", .has_arg = required_argument, .val = OPT_FOO},
-    {.name = "also-foo", .has_arg = required_argument, .val = OPT_FOO},
-    {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
+static const struct argp_option opts[] = {
+    {.name = "foo", .key = OPT_FOO, .arg = "value"},
+    {.name = "also-foo", .key = OPT_FOO, .flags = OPTION_ALIAS},
+    {.name = "no-foo", .key = OPT_NOFOO},
     {},
 };
-const char optstring[] = "f:";
 
-struct parse_state {
-    char **argv;
-    struct arguments *input;
-};
-
-int parse_arg(int opt, char *arg, struct parse_state *state){
+error_t parse_arg(int opt, char *arg, struct argp_state *state){
     struct arguments *args = state->input;   
     switch (opt) {
     case OPT_FOO:
@@ -35,39 +29,26 @@
     case OPT_NOFOO:
         args->foo = NULL;
         return 0;
-    case OPT_END:
-        args->positionals = &state->argv[optind];
+    case ARGP_KEY_ARGS:
+    case ARGP_KEY_NO_ARGS:
+        args->positionals = &state->argv[state->next];
         return 0;
     default:
-        return 1;
+        return ARGP_ERR_UNKNOWN;
     }
 }
-
-int parse_args(int argc, char *argv[], struct arguments *args){
-    struct parse_state state = {
-        .argv = argv,
-        .input = args,
-    };
-    for (;;) {
-        int opt = getopt_long(argc, argv, optstring, longopts, NULL);
-        int ret = parse_arg(opt, optarg, &state);
-        if (ret != 0) {
-            return ret;
-        }
-
-        if (opt == OPT_END) {
-            return 0;
-        }
-    }
-}
 
 int main(int argc, char *argv[]){
+    static const struct argp argp = {
+        .options = opts,
+        .parser = parse_arg,
+    };
     int ret = 0;
     struct arguments args = {
         .foo = NULL,
     };
 
-    ret = parse_args(argc, argv, &args);
+    ret = argp_parse(&argp, argc, argv, 0, NULL, &args);
     if (ret != 0) {
         return ret;
     }

Now we can see the fruits of our labour:

$ make test4
cc     test4.c   -o test4
$ ./test4 --help
Usage: test4 [OPTION...]

  -f, --foo=value, --also-foo=value
      --no-foo
  -?, --help                 Give this help list
      --usage                Give a short usage message

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
Posted Wed Jun 22 11:00:07 2016 Tags:
Lars Wirzenius Writing documentation

You know what it's like to not understand how some silly software needs to be used. Ideally, it should be obvious, but in practice, you'll want to read at least some documentation, if only to have some general understanding of what the software is for.

Sometimes you yourself will have to write documentation or other prose about your software. It might not be user-oriented documentation; it might be a project proposal, an architecture overview, or an incident report on how you dealt with a security problem in the code.

All hackers (programmers, software developers, system administrators) need to write prose. It's inevitable, so it makes sense to learn how to do it without too much pain. This article gives a brief introduction to the topic, from the assumption that you're a programmer.


You're a programmer. You should probably approach writing as you would programming.

Overview of documentation development

Although all the details are different, writing as a process can be similar to programming.

  1. Define acceptance criteria and scope.
  2. Specify relevant user personas and use cases.
  3. Write a prototype document.
  4. Get your text reviewed and user-tested.
  5. Make changes based on review feedback.
  6. Iterate until good enough.

Acceptance criteria and scope

You should decide what your document is for. What should it cover? What does it not need to cover? When is the document good enough that you can stop writing it?

These are similar to the questions you'd decide on when writing some software. The difference between a writer and a programmer is that the writer finds it easier to answer these questions about a document, and the programmer about software.

As an example, this blog post itself is meant to cover the basics of writing documentation. It won't go into a deep discussion of tooling. It's done when I think it will be useful reading for someone who dreads writing any prose at all.

User personas and use cases

Just as for software, documentation has target users. In writing, this is often expressed as "target audience". Who are the people you want to read the documentation? The sullen answer of "don't know, don't care, don't wanna, not gonna" isn't useful here. If you really aren't going to write anything, then you can stop reading. If you really are going to write, you need at least a rudimentary answer.

In addition, use cases can also be relevant for documentation. For example, a manual for a program might have use cases such as "how to start the program", "how to configure the program", and "what does this error message mean?".

A prototype document

A software prototype is a rough sketch of what the program might do when it's finished. The first prototype document should be an outline, perhaps with a paragraph of text for each chapter or section. When someone reads it, they get an idea of what the finished document will be like.

Later document prototypes will add more text. There might be many iterations, and each might fill in a chapter or two.

Get your text reviewed and user-tested

Code review is quite efficient at finding problems. Text review is even more efficient. User-testing is also useful for documentation: you get people to read your text and ask them questions to see if they understood things correctly.

If you do many iterations, you'll want to find new readers every now and then. Fresh eyes are better.

Make changes based on review feedback

This is a no-brainer.

Actually, no, it isn't. Sometimes feedback is ambiguous, or different people give conflicting feedback. Or they're utterly, totally wrong about semicolons; they're not the evil of our times, after all.

You do your best. The next iteration will show if you made things better or not.

Sometimes feedback is soul-crushing. People may say unpleasant things about the thing you've spent days or weeks or months making. The same thing happens with code, of course. If you can, try to see if they have a kernel of useful truth that you can extract and make use of, but otherwise just ignore those people in the future.

Iterate until good enough

With code, iteration is fairly straightforward. With text, you probably want to iterate fewer times to avoid exhausting your readers.

However, you should always feel free to treat each iteration as a draft, with the intention of making the next one better than the current one.

Random points about writing

Writing text is much less painful if you are good at typing. Touch typing really helps. It helps with code as well, but not nearly as much.

It may help you to use your usual tools: your text editor, version control system, etc., instead of a WYSIWYG word processor. Familiarity helps, and it helps you avoid getting angry at all the ways in which modern word processors get in your way by trying to be helpful. I'm being opinionated here: if you like using a word processor, by all means do.

Don't worry about layout, typography, pagination, etc. Don't even worry about length, since you're writing digital content, and are not bound by limitations of the physical world. Instead, worry about the structure of your document.

Don't worry about writing perfect prose. It's useful to use a spelling checker, but you can do that after you've got the text done otherwise. Don't worry about using fine or fancy language, as you might find in some world-class literature; it's perfectly OK, and often preferable, to only write simple sentences.

You're writing software documentation. Don't worry about style, unless you want to. ABC is the goal: accuracy, brevity, and clarity. You can add humour, if you want to, but that's a squirrel, sorry, personal choice. Completely optional. If you're overcoming a reluctance to write, it's important that you write, not that you entertain.

You may benefit from the Cory Doctorow Method of writing. After you have an outline, write a draft of one section. If your sections are of reasonably small scope, it might take you half an hour per section, which you can do in one sitting. Repeat every day, and in what feels like no time at all, you'll have a big manual.

Posted Wed Jun 29 11:00:06 2016 Tags: