We previously discussed common command-line formats.

This uniformity is only possible because there is a common specification, and existing libraries provide helper functions that implement it.

Short options

Use getopt(3) in a loop until it returns -1, which means that there are no more arguments to parse.

If the current argument matches an option, the option character is returned.

The neatest way to handle parsed arguments is to use a switch block.

optind is the index in argv of the next argument to be parsed. If you want to rewind the command-line parser, set optind back to the index of argv you want to resume parsing from; setting it to 1 rewinds to the beginning.

/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */

int main(int argc, char *argv[]){
    unsigned verbosity = 0;
    for (;;) {
        int opt = getopt(argc, argv, "v");
        if (opt == -1)
            break;
        switch (opt) {
        case 'v':
            verbosity++;
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    fprintf(stdout, "Verbosity level: %u\n", verbosity);
    return 0;
}
$ make test
$ ./test
Verbosity level: 0
$ ./test -v
Verbosity level: 1
$ ./test -v -v
Verbosity level: 2
$ ./test -vvv
Verbosity level: 3

Option Values

If there is a : after the option character in optstring, then that option takes a value.

Rather than using argv[optind] to get the value, it can be found in optarg.

This is because the value can be in the same argument as the option, in which case optarg points to the substring at the end of that argument. If the value is in a separate argument, optarg points to that argument and optind points to the argument after it.

/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */

int main(int argc, char *argv[]){
    for (;;) {
        int opt = getopt(argc, argv, "o:");
        if (opt == -1)
            break;
        switch (opt) {
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -oo
Got option: o
$ ./test -o value
Got option: value
$ ./test -ovalue -oo
Got option: value
Got option: o

Error handling

For convenience, getopt(3) prints error messages by default and returns '?' to signify an unrecognised option.

To take responsibility for your own error handling, set opterr to 0. The option in question is stored in optopt.

Note that when you do this, you probably also want to opt in to handling missing values.

To do this start the optstring argument with a :. This makes getopt(3) return ':' when there is a missing value. As before the option with the missing value is in optopt.

/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */

int main(int argc, char *argv[]){
    opterr = 0;
    for (;;) {
        int opt = getopt(argc, argv, ":o:");
        if (opt == -1)
            break;
        switch (opt) {
        case '?':
            fprintf(stderr, "%s: Unexpected option: %c\n", argv[0], optopt);
            return 1;
        case ':':
            fprintf(stderr, "%s: Missing value for: %c\n", argv[0], optopt);
            return 1;
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        }
    }
    return 0;
}

Positional arguments

After getopt(3) returns -1, optind points to the argument after the last option.

If there are no more arguments, this points to the NULL that terminates argv, so argv[optind] == NULL.

If you did have more arguments they start at optind. This is convenient if you have a function that takes a string array, as you can pass it on directly as argv + optind or &argv[optind].

/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */

int main(int argc, char *argv[]){
    char **positionals;
    for (;;) {
        int opt = getopt(argc, argv, "o:");
        if (opt == -1)
            break;
        switch (opt) {
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    positionals = &argv[optind];
    for (; *positionals; positionals++)
        fprintf(stdout, "Positional: %s\n", *positionals);
    return 0;
}
$ make test
$ ./test a -oo b
Got option: o
Positional: a
Positional: b

As mentioned previously, the default GNU behaviour is to scan every argument, and place all non-options at the end.

This can be very convenient, since it allows you to more easily change the options for a command by adding things at the end.

For example, suppose you ran foo -bar baz qux and it didn't seem to do anything, but you know foo has a -v option to turn on verbose mode. Rather than scrolling back through your command history, navigating to just before baz and inserting a -v, you can append it as foo -bar baz qux -v, which is where your shell's cursor already is.

If you prefer the POSIX behaviour of stopping at the first non-option, as mentioned previously you can set the POSIXLY_CORRECT environment variable. This is inconvenient, though, and it also affects any subcommands your program runs, so getopt(3) lets you change the behaviour with flags in optstring.

The accepted flag characters are - and +, which are placed at the beginning of optstring (before the : if you have one).

+ makes getopt(3) behave as if POSIXLY_CORRECT were set.

$ sed -i 's/getopt(.*)/getopt(argc, argv, "+o:")/' test.c
$ make test
$ ./test -oo a -oo b
Got option: o
Positional: a
Positional: -oo
Positional: b

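- suppresses both argument permutation and stopping at the first non-option, so getopt(3) scans the whole argv in order. Each non-option argument is reported as if it were the value of an option whose character code is 1, with the argument itself in optarg.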

With this, optind points to the end of the argv after parsing ends, so we can't use the same trick with positional arguments.

Long options

Now we know how to parse short options, but they demand a good memory or frequent checking of the documentation.

To address this we have long options, which start with --.

To parse these we use getopt_long(3) (surprise! it's the same man page).

The key difference when compared with getopt(3) is that we also pass in an array of struct option, which describes the long options.

The array is terminated by an all-zero structure rather than being passed along with its length.

If you want a long option that is an alias for a short option, use the character of the short option as the .val.

/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */

int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 'f'},
        {.name = "bar", .has_arg = no_argument, .val = 'b'},
        {},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "bf", longopts, NULL);
        if (opt == -1)
            break;
        switch (opt) {
        case 'f':
            fprintf(stdout, "Got foo\n");
            break;
        case 'b':
            fprintf(stdout, "Got bar\n");
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -fb
Got foo
Got bar
$ ./test --foo --bar
Got foo
Got bar

If you want a long option which is an alias of another option, just set the same .val for both.

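To work out which name was used, pass a pointer to an int as the longindex argument. When a long option matches, getopt_long(3) stores the index of the matching longopts entry there; it is left untouched for short options, so initialise it to -1 to tell the two apart.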

/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */

int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 'f'},
        {.name = "also-foo", .has_arg = no_argument, .val = 'f'},
        {},
    };
    for (;;) {
        int longindex = -1;
        int opt = getopt_long(argc, argv, "f", longopts, &longindex);
        if (opt == -1)
            break;
        switch (opt) {
        case 'f':
            if (longindex == -1)
                fprintf(stdout, "Got -%c\n", opt);
            else
                fprintf(stdout, "Got --%s\n", longopts[longindex].name);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -f
Got -f
$ ./test --foo
Got --foo
$ ./test --also-foo
Got --also-foo

If you want a long-only option, just pick a .val that is not in optstring. If you choose a value higher than 255 then it can't possibly be a short option.

/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */

int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 0x100},
        {},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "", longopts, NULL);
        if (opt == -1)
            break;
        switch (opt) {
        case 0x100:
            fprintf(stdout, "Got foo\n");
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test --foo
Got foo
$ ./test -f
$ echo $?
1

If you want a long-only option which just sets a variable to a value, set .flag to the address of the variable and .val to the value to store. getopt_long(3) then returns 0 for that option.

/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */

int main(int argc, char *argv[]){
    int foo = -1;
    const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .flag = &foo, .val = 1},
        {.name = "no-foo", .has_arg = no_argument, .flag = &foo, .val = 0},
        {},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "", longopts, NULL);
        if (opt == -1)
            break;
        if (opt != 0) {
            /* Unexpected option */
            return 1;
        }
    }
    if (foo == 1)
        fprintf(stdout, "Got foo\n");
    if (foo == 0)
        fprintf(stdout, "Got no-foo\n");
    if (foo == -1)
        fprintf(stdout, "foo is unset\n");
    return 0;
}
$ make test
$ ./test
foo is unset
$ ./test --foo
Got foo
$ ./test --no-foo
Got no-foo

Using enums

I recommend using an enum for the options your command-line accepts since:

  1. It makes your switch block easier to read, since you can spell out what the option is, rather than having to guess from the short option character.
  2. If you add -1, ':' and '?' to the enum (or check for those before the switch block and coerce the type to the enum) and compile with -Werror=switch, then gcc will let you know if you forgot to handle an option.

For options which have a short option alias, assign the enumerator the character value. For long-only options, arbitrarily pick one as the first, set its integer value to 0x100 (256), and list the remaining long-only options after it.

/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */

int main(int argc, char *argv[]){
    enum opt {
        OPT_END = -1,
        OPT_FOO = 'f',
        OPT_NOFOO = 0x100,
        OPT_UNEXPECTED = '?',
    };
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = OPT_FOO},
        {.name = "also-foo", .has_arg = no_argument, .val = OPT_FOO},
        {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
        {},
    };
    for (;;) {
        int longindex = -1;
        enum opt opt = getopt_long(argc, argv, "f", longopts, &longindex);
        switch (opt) {
        case OPT_END:
            goto end_optparse;
        case OPT_FOO:
            fprintf(stdout, "Got Foo\n");
            break;
        case OPT_NOFOO:
            fprintf(stdout, "Got no Foo\n");
            break;
        case OPT_UNEXPECTED:
            return 1;
        }
    }
end_optparse:
    return 0;
}
$ make CFLAGS=-Werror=switch test
$ ./test --foo --also-foo --no-foo
Got Foo
Got Foo
Got no Foo

Disadvantages

  1. The getopt API is old and ugly, involving lots of global variables and global state, which makes it difficult to re-use for string array parsing in general, rather than just the command-line arguments.
  2. It is not possible to have options with more than 1 value. This has led to command-line APIs that turn this single value into multiple by separating sub-arguments with a comma. There is getsubopt(3) and strtok(3) to help parse this, but they don't have a way to escape the separator character which means it's only suitable for values that can't contain a comma. Unfortunately it gets used for file paths which may contain commas, so there are some broken programs out there as a result.
  3. The short options are interpreted as a sequence of individual bytes rather than characters. This means they only work for single-byte characters, so most non-English characters can't be used.
  4. There's redundancy when declaring options both in the optstring and the longopts. It's trivial for options to become out of sync.

Conclusion

This article is already pretty long, so discussing how to parse argument values can wait for a future article.

If this article has interested you and you want to set yourself some exercises, your homework is to:

  1. If you have any C programs, see if you can simplify your argument parsing, or make it behave like standard command-line parsers with getopt(3).
  2. Research alternatives like argp_parse(3). It's been available in glibc since 1997, and is available in gnulib for non-GNU POSIX platforms, but the only documentation is on the GNU website, or the argpbook.