By now you should be familiar with running shell commands.
With luck, you will have noticed a pattern in how you provide arguments to them.
The general themes are:
Short options of a single character after a hyphen, possibly followed by a value, either as the next argument or immediately after the option character.
So -ofoo is equivalent to -o foo.
Multiple short options that don't take a value can be grouped into the same argument.
So -abc and -a -b -c are equivalent.
When combining short option groups and short options with values, you can provide as many no-value options in the same argument as you like, but the string after the first option that takes a value becomes the value for that option.
So -vvofoo is equivalent to -v -v -o foo, and if you want to provide multiple -o options, the shortest way to express that is -vvofoo -obar.
Long options starting with two hyphens.
- Value-less options such as --foo, often paired with a --no-foo.
- Options with values, either with the value as a separate command-line argument, or separated by an equals sign, so --foo=bar is equivalent to --foo bar.
Any number of positional arguments after any number of option arguments.
Traditional POSIX behaviour is to stop parsing options at the first non-option argument, with everything after it being interpreted as positional arguments.
So --foo bar baz --qux is not equivalent to --foo bar --qux baz.
The usual GNU behaviour is to scan all the arguments looking for options, letting you intersperse positional arguments with options.
So baz --foo bar is equivalent to --foo bar baz.
You can force POSIX behaviour by setting the POSIXLY_CORRECT environment variable.
You can force arguments to be positional with the magic -- argument, so --foo bar -- baz --qux is not equivalent to --foo bar baz --qux.
If you are writing shell scripts, -- is very useful: if you are providing values from external sources, you should use -- to prevent the command interpreting your positional arguments as extra options, which may change its behaviour.
Try to remember this, otherwise if you have a file called -f in a directory, alias rm='rm -i' won't prevent you from accidentally removing things if you run rm * instead of rm -- *.
Arguments split into comma-separated sub-options.
You may have noticed that few tools accept multiple values per option. The traditional work-around for this is to provide only one value argument, but split that up with comma (,) or colon (:) characters.
If it is comma separated, it may also be sub-option KEY[=VALUE] pairs.
If you are unfortunate, the values may be file paths, which is a problem, because file paths may contain colons.
Event driven programming is a style of programming which essentially boils down to the concept of a program being a consumer of a stream of events to which the programmer creates code to react. This style of programming is very popular in software which has to interact with a user directly since it very readily lends itself to user interface programming.
Commonly such programs consist of some amount of setup, then an event loop which dispatches events. What is interesting is that once you get used to writing software in an event driven fashion then you start to find such event loops in the most unexpected of places. A little while ago, we talked about state machines and they are useful tools for managing event driven programs because they are inherently an event processing model.
Your homework for today is to go and look at projects you have and consider if they have event loops in them, and if so, look at whether or not you have state machines handling those event streams. Once you've done that, have a look around in other software and try to identify event loops in different kinds of software. To get you started, most UI toolkits are event-driven, as is the Minecraft mod called Computercraft.
How many people use the Linux kernel? Nobody knows exactly, but it's on the order of billions, thanks to Android and a very large number of embedded devices.
How many people use the Firefox browser? Again, nobody knows exactly, but there's some numbers that can guide you in guessing: download numbers from the Firefox home site, and aggregating user-agent statistics from a large number of popular websites.
How many people use the Koha integrated library system? It is much less popular than Linux or Firefox, but it's also something whose usership is somewhat restricted: it gets run by libraries, and you could find a list of libraries (public and private) and ask them if they use Koha and if they do, how many patrons they have. This would probably give you an order-of-magnitude estimate.
How many people use your own software? It's an interesting question for most free software developers, but it's not an easy one to answer. Free software can be shared freely, and there's no requirement for users to register (if there were, that would be a limitation on their freedom and privacy). As outlined above, in some cases it's possible to find some actual numbers to use as a guide for estimation. In each case, the numbers come from different sources, depending on the type of software.
A few generic ways to get those numbers exist:
You can get the software packaged in Debian or Ubuntu or another operating system that has something similar to Debian's popcon. This is an opt-in, voluntary system that gathers anonymous usage data about software installations. It provides a rough lower bound for users of software, but anything more becomes guesswork, since only an unknown, but small fraction of Debian installations participate.
You can measure the number of downloads, bug reports, and other interactions with your users.
You can construct search engine queries that find mentions of your software, and see how many hits you get.
You can run user surveys, and see how many responses you get.
You should ask yourself, why does the actual number of users matter? Does it affect your livelihood? Does it make you feel better about yourself? Does it help you justify all the time you put into the software?
Ultimately, you'll have to resign yourself to not knowing.
We previously discussed common command-line formats.
This uniformity is only possible because we have a common specification with existing libraries providing helper functions for this.
Short options
Use getopt(3) in a loop until it returns -1, which means that there are no more options to parse.
If the current argument matches an option, the option character is returned.
The neatest way to handle parsed options is to use a switch block.
optind is the index of the next argument after the current option in argv.
If you want to rewind the command-line parser, you can set optind back to the index of argv you want to parse from, so set it to 1 to rewind back to the beginning.
/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */
int main(int argc, char *argv[]){
    unsigned verbosity = 0;
    for (;;) {
        int opt = getopt(argc, argv, "v");
        if (opt == -1)
            break;
        switch (opt) {
        case 'v':
            verbosity++;
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    fprintf(stdout, "Verbosity level: %u\n", verbosity);
    return 0;
}
$ make test
$ ./test
Verbosity level: 0
$ ./test -v
Verbosity level: 1
$ ./test -v -v
Verbosity level: 2
$ ./test -vvv
Verbosity level: 3
Option Values
If there is a : after the option character in optstring, then that option has a corresponding value.
Rather than using argv[optind] to get the value, it can be found in optarg.
This is because your value can be in the same argument as the option, in which case optarg points to the substring at the end of the argument.
If the value is in a separate argument, optarg points to that argument and optind points to the argument after it.
/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */
int main(int argc, char *argv[]){
    for (;;) {
        int opt = getopt(argc, argv, "o:");
        if (opt == -1)
            break;
        switch (opt) {
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -oo
Got option: o
$ ./test -o value
Got option: value
$ ./test -ovalue -oo
Got option: value
Got option: o
Error handling
For convenience getopt(3) prints error messages by default, and also returns '?' to signify an unrecognised option.
To take responsibility for your own error handling, set opterr to 0.
The option in question is stored in optopt.
Note that when you do this, you probably also want to opt in to handling missing values.
To do this, start the optstring argument with a colon (:).
This makes getopt(3) return ':' when there is a missing value.
As before, the option with the missing value is in optopt.
/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */
int main(int argc, char *argv[]){
    opterr = 0;
    for (;;) {
        int opt = getopt(argc, argv, ":o:");
        if (opt == -1)
            break;
        switch (opt) {
        case '?':
            fprintf(stderr, "%s: Unexpected option: %c\n", argv[0], optopt);
            return 1;
        case ':':
            fprintf(stderr, "%s: Missing value for: %c\n", argv[0], optopt);
            return 1;
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        }
    }
    return 0;
}
Positional arguments
After getopt(3) returns -1, optind points to the argument after the last option.
If you had no more arguments, this points to the NULL after your arguments, so argv[optind] == NULL.
If you did have more arguments, they start at optind.
This is convenient if you have a function that takes a string array, as you can pass it on directly as argv + optind or &argv[optind].
/* test.c */
#include <stdio.h> /* fprintf */
#include <unistd.h> /* getopt */
int main(int argc, char *argv[]){
    char **positionals;
    for (;;) {
        int opt = getopt(argc, argv, "o:");
        if (opt == -1)
            break;
        switch (opt) {
        case 'o':
            fprintf(stdout, "Got option: %s\n", optarg);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    positionals = &argv[optind];
    for (; *positionals; positionals++)
        fprintf(stdout, "Positional: %s\n", *positionals);
    return 0;
}
$ make test
$ ./test a -oo b
Got option: o
Positional: a
Positional: b
As mentioned previously, the default GNU behaviour is to scan every argument, and place all non-options at the end.
This can be very convenient, since it allows you to more easily change the options for a command by adding things at the end.
For example, suppose you ran foo -bar baz qux, but it didn't seem to do anything. foo has a -v option to turn on verbose mode, so rather than having to scroll back through your command history, navigate to before the baz and insert a -v, you can put the -v at the end as foo -bar baz qux -v, where your shell's cursor will already be.
If you prefer the POSIX behaviour of stopping at the first non-option, as mentioned previously you can set the POSIXLY_CORRECT environment variable.
This is inconvenient and affects any subcommands your program may run though, so getopt(3) lets you change the behaviour with flags in optstring.
The accepted flag characters are - or +, which are placed at the beginning of optstring (before the : if you have one).
A leading + makes getopt(3) behave as if POSIXLY_CORRECT is set.
$ sed -i 's/getopt(.*)/getopt(argc, argv, "+o:")/' test.c
$ make test
$ ./test -oo a -oo b
Got option: o
Positional: a
Positional: -oo
Positional: b
A leading - suppresses both argument permutation and stopping at the first non-option, so getopt(3) scans the whole of argv in order.
With this, optind points to the end of argv after parsing ends, so we can't use the same trick with positional arguments.
Long options
So now we know how to parse short options, but these require a good memory or frequent checking of documentation.
So we have long options starting with --.
To parse these we use getopt_long(3) (surprise! it's the same man page).
The key difference when compared with getopt(3) is that we pass in an array of a new structure type, struct option.
This is terminated by an empty structure rather than passing the length.
If you want a long option that is an alias for a short option, use the character of the short option as the .val.
/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */
int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 'f'},
        {.name = "bar", .has_arg = no_argument, .val = 'b'},
        {0},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "bf", longopts, NULL);
        if (opt == -1)
            break;
        switch (opt) {
        case 'f':
            fprintf(stdout, "Got foo\n");
            break;
        case 'b':
            fprintf(stdout, "Got bar\n");
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -fb
Got foo
Got bar
$ ./test --foo --bar
Got foo
Got bar
If you want a long option which is an alias of another option, just set the same .val for both.
To work out which name it used, provide a pointer to longindex, and look up which long option matched in the longopts array.
/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */
int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 'f'},
        {.name = "also-foo", .has_arg = no_argument, .val = 'f'},
        {0},
    };
    for (;;) {
        int longindex = -1;
        int opt = getopt_long(argc, argv, "f", longopts, &longindex);
        if (opt == -1)
            break;
        switch (opt) {
        case 'f':
            if (longindex == -1)
                fprintf(stdout, "Got -%c\n", opt);
            else
                fprintf(stdout, "Got --%s\n", longopts[longindex].name);
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test -f
Got -f
$ ./test --foo
Got --foo
$ ./test --also-foo
Got --also-foo
If you want a long only option, just pick a .val not in optstring.
If you choose a value higher than 255 then it can't possibly be a short option.
/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */
int main(int argc, char *argv[]){
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = 0x100},
        {0},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "", longopts, NULL);
        if (opt == -1)
            break;
        switch (opt) {
        case 0x100:
            fprintf(stdout, "Got foo\n");
            break;
        default:
            /* Unexpected option */
            return 1;
        }
    }
    return 0;
}
$ make test
$ ./test --foo
Got foo
$ ./test -f
$ echo $?
1
If you want a long only option which just sets a variable to a value, set .flag to the address of the variable, and .val to the value to set.
When such an option is matched, getopt_long(3) stores the value and returns 0.
/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */
int main(int argc, char *argv[]){
    int foo = -1;
    const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .flag = &foo, .val = 1},
        {.name = "no-foo", .has_arg = no_argument, .flag = &foo, .val = 0},
        {0},
    };
    for (;;) {
        int opt = getopt_long(argc, argv, "", longopts, NULL);
        if (opt == -1)
            break;
        if (opt != 0) {
            /* Unexpected option */
            return 1;
        }
    }
    if (foo == 1)
        fprintf(stdout, "Got foo\n");
    if (foo == 0)
        fprintf(stdout, "Got no-foo\n");
    if (foo == -1)
        fprintf(stdout, "foo is unset\n");
    return 0;
}
$ make test
$ ./test
foo is unset
$ ./test --foo
Got foo
$ ./test --no-foo
Got no-foo
Using enums
I recommend using an enum for the options your command-line accepts since:
- It makes your switch block easier to read, since you can spell out what the option is, rather than having to guess from the short option character.
- If you add -1, ':' and '?' to the enum (or check for those before the switch block and coerce the type to the enum) and compile with -Werror=switch, then gcc will let you know if you forgot to handle an option.
For options which have a short option alias, assign it to the character value.
For long only options arbitrarily pick one as the first, set the enum's integer value to 0x100 (256), and list the remaining long only options after that.
/* test.c */
#include <stdio.h> /* fprintf */
#include <getopt.h> /* getopt_long */
int main(int argc, char *argv[]){
    enum opt {
        OPT_END = -1,
        OPT_FOO = 'f',
        OPT_NOFOO = 0x100,
        OPT_UNEXPECTED = '?',
    };
    static const struct option longopts[] = {
        {.name = "foo", .has_arg = no_argument, .val = OPT_FOO},
        {.name = "also-foo", .has_arg = no_argument, .val = OPT_FOO},
        {.name = "no-foo", .has_arg = no_argument, .val = OPT_NOFOO},
        {0},
    };
    for (;;) {
        int longindex = -1;
        enum opt opt = getopt_long(argc, argv, "f", longopts, &longindex);
        switch (opt) {
        case OPT_END:
            goto end_optparse;
        case OPT_FOO:
            fprintf(stdout, "Got Foo\n");
            break;
        case OPT_NOFOO:
            fprintf(stdout, "Got no Foo\n");
            break;
        case OPT_UNEXPECTED:
            return 1;
        }
    }
end_optparse:
    return 0;
}
$ make CFLAGS=-Werror=switch test
$ ./test --foo --also-foo --no-foo
Got Foo
Got Foo
Got no Foo
Disadvantages
- The getopt API is old and ugly, involving lots of global variables and global state, which makes it difficult to re-use for string array parsing in general, rather than just the command-line arguments.
- It is not possible to have options with more than 1 value. This has led to command-line APIs that turn this single value into multiple by separating sub-arguments with a comma. There is getsubopt(3) and strtok(3) to help parse this, but they don't have a way to escape the separator character which means it's only suitable for values that can't contain a comma. Unfortunately it gets used for file paths which may contain commas, so there are some broken programs out there as a result.
- The short options are interpreted as a sequence of individual bytes rather than characters. This means it only works for single-byte characters, which means most non-English characters can't be used.
- There's redundancy when declaring options, both in the optstring and the longopts. It's trivial for options to become out of sync.
Conclusion
This article is already pretty long, so discussing how to do argument value parsing can wait for a future article.
If this article has interested you and you want to set yourself some exercises, your homework is to:
- If you have any C programs, see if you can simplify your argument parsing, or make it behave like standard command-line parsers with getopt(3).
- Research alternatives like argp_parse(3). It's been available in glibc since 1997, and is available in gnulib for non-GNU POSIX platforms, but the only documentation is on the GNU website, or the argpbook.