This article introduces the GNU Debugger (GDB).
Here we have some code that builds a linked list.
#include <stdio.h>
#include <stdlib.h>

struct node {
    int element;
    struct node *next;
};

struct node *append(struct node *l, int x)
{
    struct node *n = malloc(sizeof (n));
    n->element = x;
    n->next = NULL;

    if (l == NULL)
        l = n;
    else {
        while (l != NULL)
            l = l->next;

        l->next = n;
    }
    return l;
}

int main(void)
{
    struct node *l = append(NULL, 0);
    append(l, 1);
    append(l, 2);
    return EXIT_SUCCESS;
}
% gcc -o list list.c
% ./list
Segmentation fault
GDB can help you find the cause of a segfault. Add -g to your compile command and re-run: -g tells gcc to emit the debugging information that gdb uses.
% gcc -o list list.c -g
% gdb ./list
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/richardipsum/Desktop/list...done.
(gdb) run
Starting program: /home/richardipsum/Desktop/list
Program received signal SIGSEGV, Segmentation fault.
0x000000000040056a in append (l=0x0, x=1) at list.c:21
21 l->next = n;
GDB shows us where the segfault occurs.
Breakpoints
GDB lets the user set a breakpoint: a point in the code where execution is suspended, allowing the user to examine the state of the program's objects before resuming execution.
Here we set a breakpoint at line 13, just after we assign our element to its node.
% gcc -o list list.c -g
% gdb ./list
...
(gdb) break 13
Breakpoint 1 at 0x400532: file list.c, line 13.
(gdb) run
Starting program: /home/richardipsum/Desktop/list
Breakpoint 1, append (l=0x0, x=0) at list.c:13
13 n->next = NULL;
At this point n->element should have been assigned the value of x (0); to confirm this we can use gdb's print command.
(gdb) print n->element
$1 = 0
To resume execution we use the continue command.
(gdb) continue
Continuing.
Breakpoint 1, append (l=0x601010, x=1) at list.c:13
13 n->next = NULL;
Notice that l now has a value of 0x601010: our first call to append appended the node with element 0 to the empty list and returned a pointer to a node located at address 0x601010, the beginning of the list l. Our second call to append will append the node with element 1 to l.
Stepping
Another useful command is the step command, which allows the user to examine execution step by step.
Note that if you hit return without entering a command, gdb repeats the previous command; this makes stepping very convenient.
(gdb) step
15 if (l == NULL)
(gdb)
18 while (l != NULL)
(gdb)
19 l = l->next;
(gdb)
18 while (l != NULL)
(gdb)
21 l->next = n;
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0x000000000040056a in append (l=0x0, x=1) at list.c:21
21 l->next = n;
(gdb) print l
$1 = (struct node *) 0x0
l is NULL; dereferencing NULL is undefined and causes a segfault on the computer running this code.
GDB is a powerful tool; it can do a lot more than has been shown here. For more information, see the gdb online docs.
Whitespace is the set of blank characters, commonly defined as space, tab, newline and possibly carriage return.
Its significance in shell scripts is that command line arguments are separated by whitespace, unless the arguments are quoted.
For illustration, we have a shell script called printargs, which writes each of the arguments we give it on a new line.
$ install /dev/stdin printargs <<'EOF'
#!/bin/sh
for arg; do
echo "$arg"
done
EOF
$ ./printargs foo bar baz
foo
bar
baz
You can see that it behaves as expected for the case of simple, one word strings for the arguments.
However, suppose we want to print the string hello world as a single argument. If we were to write it as normal, it would print hello and world on different lines.
$ ./printargs hello world
hello
world
This is where quoting comes in: if you surround a string with quotation marks, i.e. ' or ", then it is treated as a single argument.
$ ./printargs "hello world"
hello world
$ ./printargs 'hello world'
hello world
Alternatively, special characters can be escaped with a \
(backslash).
$ ./printargs hello\ world
hello world
However, this looks ugly.
Similarly, if you wanted to put a " in a string that was quoted with double quotes, you could escape it, or use the other quoting style.
$ ./printargs "hello \"material\" world"
hello "material" world
$ ./printargs 'hello "material" world'
hello "material" world
$ ./printargs "hello 'material' world"
hello 'material' world
The equivalent for ' is very ugly: since the only thing that terminates a single-quoted sequence is a single quote, escaping is not permitted.
$ ./printargs 'hello '\''material'\'' world'
hello 'material' world
Having read that, you may wonder how people make whitespace errors in shell commands, but the danger becomes less obvious when variables are involved.
$ var="hello \"material\" world"
$ ./printargs $var
hello
"material"
world
This goes wrong because $var is expanded in the command line, and looks like ./printargs hello \"material\" world to your shell.
This can be prevented by quoting the variable substitution.
$ ./printargs "$var"
hello "material" world
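The effect of the missing quotes is easy to see by counting arguments with $#. This sketch uses set -- to load the expansion into the positional parameters:

```shell
var="hello \"material\" world"

set -- $var      # unquoted: the shell word-splits the expansion
echo $#          # prints 3

set -- "$var"    # quoted: the expansion stays one word
echo $#          # prints 1
```

The same splitting happens for any unquoted expansion, whether the words end up as arguments to printargs, rm, or anything else.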
You may wonder why the shell behaves this way. It's mostly historical: that's how shells have always done it, and it's kept that way for backwards compatibility, though some shells, like zsh, break with backwards compatibility in favour of a more sensible default.
It does occasionally come in useful, when strings aren't whitespace sensitive.
$ names="what who why"
$ for name in $names; do
echo my name is $name
done; \
echo Slim shady
my name is what
my name is who
my name is why
Slim shady
However, if you're dealing with filenames, this is entirely inappropriate.
$ mkdir temp
$ cd temp
$ touch "foo bar" baz
$ for fn in `find . -type f`; do rm "$fn"; done
rm: cannot remove `./foo': No such file or directory
rm: cannot remove `bar': No such file or directory
$ ls
foo bar
Admittedly, this example is a little contrived, but it can be the difference between cleaning up your temporary files and deleting your music collection.
$ ls ~
temp music.mp3
$ ls -1 ~/temp
not music.mp3
scrap
$ cd ~
$ for fn in `find ~/temp -type f`; do rm "$fn"; done
rm: cannot remove `~/temp/not': No such file or directory
$ ls
temp
There are a few ways this could have been avoided.
- Using arrays
- Processing the files directly with find
- Having find pass the files on to another program which handles whitespace better
- Handling whitespace yourself
Using arrays
I mentioned $* and $@ when I first talked about shell. These are used for expanding variables, either the command line arguments directly, or other array variables as ${array[@]}. They behave identically unless quoted with ", in which case $@ splits each argument into a different word, while $* becomes a single string, with each argument separated by a space.
Behold!
$ set -- "foo bar" baz qux
$ ./printargs $@
foo
bar
baz
qux
$ ./printargs "$@"
foo bar
baz
qux
$ ./printargs "$*"
foo bar baz qux
The previous example used set --
to use the command line argument
array, since every shell has one. If you are using bash, you can have
other arrays.
$ array=( "foo bar" baz qux )
$ ./printargs "${array[@]}"
foo bar
baz
qux
$ array+=(badger)
$ ./printargs "${array[@]}"
foo bar
baz
qux
badger
Glob expressions work in arrays too, so you can have an array of all files in a directory.
$ toremove=( ~/temp/* )
$ rm "${toremove[@]}"
Unfortunately, there is not a built-in way of recursively reading the contents of a directory into an array, so one of the later techniques will be required in conjunction.
$ declare -a toremove
$ while IFS= read -r f; do toremove+=("$f"); done < <(find ~/temp -type f)
$ rm "${toremove[@]}"
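If your bash is new enough, there is a shortcut for this: readarray can read NUL-delimited find output straight into an array. A sketch, assuming bash 4.4 or later for the -d option, and using a throwaway directory instead of ~/temp:

```shell
# Sketch (assumes bash >= 4.4): readarray -d '' reads NUL-delimited
# input into an array; -t strips the trailing delimiter from each element.
dir=$(mktemp -d)
touch "$dir/foo bar" "$dir/baz"

readarray -d '' -t toremove < <(find "$dir" -type f -print0)

echo "${#toremove[@]}"   # prints 2: the space in "foo bar" survives
rm -r "$dir"
```

Using -print0 here also protects filenames containing newlines, which the plain while read loop above would split.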
Removing directly with find
Find can remove the files itself if you use the -delete
option. It
would be used like this:
$ find ~/temp -type f -delete
Though this is specific to deleting files. We may instead want to do something else, like ensure they can't be executed. This could be done with chmod a-x and find's -exec option.
$ find ~/temp -type f -exec chmod a-x {} +
Removing a file is similarly achieved, just by using rm instead of chmod.
$ find ~/temp -type f -exec rm {} +
More complicated operations are possible with an inline shell command. The following makes a backup of each file, with .bak suffixed to its name.
$ find ~/temp -type f -exec sh -c 'cp "$1" "$1.bak"' - {} \;
This is pretty ugly for anything complicated, difficult to remember, and requires you to remember how to invoke a command in a subshell the hard way.
For more details on how this works, refer to the find(1) man page,
looking for the alternative form of -exec
, and the bash(1) man page
for what -c
means and how to pass extra arguments to a shell command.
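The alternative -exec ... + form mentioned there passes many files to one command, so the inline shell can loop over them itself. A sketch, using a throwaway directory (the for f loop with no in list iterates over the script's arguments, which find fills in at the {} +):

```shell
# Sketch: back up every file under a directory, batching files into
# as few sh invocations as possible with -exec ... +.
dir=$(mktemp -d)
touch "$dir/foo bar" "$dir/baz"

find "$dir" -type f -exec sh -c 'for f; do cp "$f" "$f.bak"; done' - {} +

# The directory now also contains "foo bar.bak" and "baz.bak".
rm -r "$dir"
```

Compared to the \; form, this runs far fewer shells when there are many files.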
Passing commands to xargs
If you have a deficient find command, or want to minimise the number of commands run, you can use xargs.
This can be used similarly to the previous find commands:
$ find ~/temp -type f | xargs rm
For people with a wide knowledge of shell programming (i.e. those who know lots of commands and how to put them together, rather than all the options of each command) this is more readable. However, it is not yet whitespace safe: find separates the file names with newlines, and xargs by default splits its input on any whitespace, so a filename containing a space or a newline would be handled wrongly.
This can be solved by find's -print0 argument and xargs' -0 argument.
$ find ~/temp -type f -print0 | xargs -0 rm
What this will do is instead of separating file paths with newline characters, it will separate them with NUL bytes (i.e. all bits 0). Since NUL is not a valid character in filenames, it cannot misinterpret the strings.
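You can see NUL delimiting at work without involving find at all: printf '%s\0' writes each of its arguments followed by a NUL byte, and xargs -0 reads them back without mangling the embedded space. A small sketch:

```shell
# Each argument is written out NUL-terminated, then read back intact:
printf '%s\0' "foo bar" baz | xargs -0 -n 1 echo
# prints:
# foo bar
# baz
```

With the default (whitespace-delimited) input, the same pipeline would have printed foo, bar and baz on three separate lines.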
This is, if anything, slightly more ugly than using -exec with find for backing up files.
$ find ~/temp -type f -print0 | xargs -0 -n 1 sh -c 'cp "$1" "$1.bak"' -
Handling whitespace yourself
xargs is not the only command that can be used to handle such input. If you're using bash, you can do it in the shell directly with read -d.
$ find ~/temp -type f -print0 | while read -d $'\0' fn; do
cp "$fn" "$fn.bak"
done
read is a shell builtin command, which reads lines from standard input, or another file. -d is an option to change the input line delimiter; we change it to NUL to match what find produces. $'\0' is bash's ANSI-C quoting syntax; since bash cannot store a NUL byte in a string, $'\0' actually expands to the empty string, and an empty delimiter tells read to use NUL.
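In practice you also want IFS= and -r on the read, so that surrounding whitespace and backslashes in filenames survive intact. A sketch of the same loop (bash, as above), fed by printf instead of find so it is self-contained:

```shell
# NUL-delimited input read back safely: IFS= keeps leading/trailing
# whitespace, -r stops backslashes being treated as escapes, and
# -d '' makes NUL the delimiter.
printf '%s\0' " foo bar " "back\slash" | while IFS= read -r -d '' fn; do
    printf '[%s]\n' "$fn"
done
# prints:
# [ foo bar ]
# [back\slash]
```

Without IFS= the leading and trailing spaces would be stripped, and without -r the backslash would be eaten.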
Summary
- Quote your variables. Always. Then when you do occasionally need to use one un-quoted, you'll think about it.
- Use NUL delimited input when possible. Most commands that can take a list of files as input from another file will have an option to allow NUL termination: xargs has -0, tar -T <file> has --null, and cpio has both.
- If a command's command line is structured such that an arbitrary number of files can be passed, use it with xargs.
- If an argument needs further processing between input and passing to a command, it can be more readable to pipe it to a while read loop.
- Use arrays when possible. You can use the command line arguments array of any shell with set --:
  $ set -- "foo bar" baz qux
  $ ./printargs "$@"
  foo bar
  baz
  qux
  You can initialize a new array in bash with:
  $ array=("foo bar" baz qux)
  $ ./printargs "${array[@]}"
  foo bar
  baz
  qux
Debugging is the process of finding the reason a system misbehaves, making the appropriate correction, and verifying that this fixes the problem. In this article, I discuss how to find the problem. There may be later articles for other aspects of the bug fixing process.
There is not a lot of literature about debugging. One not-unreasonable book is Debugging rules, by David Agans. There is also little in the way of well-known systematic approaches to finding causes of problems. This article is based on my 29 years of programming experience, with nearly daily debugging, various articles and books I've read over the years, and swapping war stories with other programmers. The overall structure follows that of the "Debugging rules" book, but using my words.
This article further concentrates on problems in a running program, when it is used for real. Problems that happen during building or during test suite runs tend to be easier to find. What's more, test driven development helps at that phase of a software's life cycle: there is usually only a small delta since the previous successful test run, so it's usually pretty obvious where the problem is.
Find a reliable way to reproduce the problem
Whatever the problem is, debugging is much easier if you can reliably, and preferably easily and quickly, reproduce the problem. If you can run the program in a given way, and it exhibits the problem immediately, you spend less time on waiting for the problem to happen again.
At the other extreme are problems that you don't know how to reproduce at all. This kind of problem can be almost impossible to fix.
There's no real recipe for reproducing bugs. It depends on the bug, and the circumstances in which it happens, and the program itself. Use your ingenuity here.
The kind of user who can make a program crash reliably, in the same way, every time, is extremely valuable. Treasure them.
Binary searching for bugs
When you have your reliable reproduction recipe, it's time to find where in the program the problem is. The basic technique for this is divide and conquer, which programmers know as binary search. You put a marker in the middle of a program's execution, and see whether the problem appears before or after it. Then you add more markers, dividing the execution into smaller and smaller parts, diving deeper and deeper into the program's logic. Obviously, it's possible to divide into more than two parts at once, to find the location more easily.
Eventually you find the problem.
Unless you're unlucky.
Sometimes this division doesn't work. For example, the problem might only exhibit itself in the second half of the program, but actually be caused by something that happens in the first half. This can be hard to find, since everything in the first half looks to be in order.
Don't guess, don't assume, watch what's really happening
A common problem in debugging is for the programmer to act based on their mental model of what's happening. They think they know what's going on, and then they do things based on that. This is dangerous: the map is not the terrain, and the main reason for bugs is the difference between the programmer's mental model and reality.
When you're in debugging mode, you should ignore your mental model, and look at what the code actually does. This is difficult to do, but it is usually required to find the bug.
Debuggers versus logging/print statements
There are two kinds of programmers: those who use debuggers, those who use logging or print statements, those who can count, and those who use whatever tool is best for the situation.
A debugger is a tool that looks at a running program and lets you control the execution, and examine (and sometimes alter) the internal state of the program. In other words, it lets you run a program, stop it at any point, and look at the values of variables at that point. See our recent article on gdb for an example. The more advanced a debugger is, the more ways it provides for this basic task. For example, the stopping points (breakpoints) may be conditional: the program will only stop if an expression using values from the program being debugged is true.
The other common approach is to put statements in the program being executed to print out values, either to the screen, or to a log file of some sort.
Both approaches are valuable, and you should learn to use both. A debugger can be very efficient for zeroing in on a problem, when you know how to reproduce it and need to find out what's happening. Log files are especially useful for long-running programs, and for analysing problems from production runs. Debuggers are the tool of choice when modifying the program is difficult or impossible. Log files are the right thing when running a program under a debugger isn't possible, or if the debugger would make the program run too slowly to be usable. Sometimes a combination works best.
Don't change the software more than you have to
When debugging a program, you should keep the changes you make down to as few as possible. Any change you make may affect the program in surprising ways. You should avoid the temptation of fixing even simple things, such as typos in comments, just in case. If nothing else, getting deep into a stack of changes will distract you from the task at hand (queue your yaks, don't stack them). Instead, make a note of anything you want to fix so you can get back to it later.
If you make a change, and it doesn't help you get closer to the problem, undo it.
Keep an audit trail
The human short-term memory is about seven items large. That's not a lot. A debugging session may easily overflow that. You should keep an audit trail so you remember everything you've done, anything you've tried, and what the result was. Did you do a test run with the volume set to 11 or did you just think it would be a good thing to do?
An electronic journal is very useful for this. Copy-pasting code snippets, log files, screenshots, etc, is helpful. This becomes especially important if a debugging session takes days or weeks, since in that case it is guaranteed you will forget most of what you've done. (See the Obnam journal snippet for an example.)
Further, you should commit any changes you make to the code into a version control system, using commits as tiny as you can. Make a new branch, where you can safely make any changes you want. Make several, if need be.
What is a coding style or standard
When you look at open source projects (and closed-source ones for that matter) you may often notice that the code within the project is all written in a particular way. This is a coding style. If it is written down for the project they may even refer to it as a coding standard. Some coding standards then get used in other projects than the one they were written for. No coding style is intrinsically better than any other (discounting styles explicitly designed to obfuscate the code or confuse the reader), but personal taste will affect the style you use. It's important if you contribute to an existing project that you try and stick to their coding style if you can.
What do they cover
Coding standards can cover many things, from simply how to name files and where to put them, all the way up to what parts of the language you're allowed to use and where. Common things covered by coding standards include how and where to capitalise words, how to indent control structures, how to document the codebase, what whitespace is required and where, and how long a line is allowed to be before it must be broken into parts. Obviously not every aspect of the style will be codified in any given standard, since there's an infinity of possibilities, but the above will be quite commonly found.
Some examples
There are a number of popular coding standards in use in the open source world. For example, many applications written for GNOME or GTK+ (even unofficial ones) will follow the GNU style. Almost all Python software in existence (excluding Python and its standard library) at least pretends to follow the Python PEP8 coding standard. A large number of C based projects follow the Linux Kernel coding style. Some projects have less dry documents explaining how to write code in their style - for example, the NetSurf Style Guide is a PDF.
Benefits they convey
The primary goal of most coding styles is to provide a level of consistency in the codebase which can lead to easier identification of systematic problems in the code. Sometimes unusual requirements are there to prevent issues which have been encountered in the past and corrected only by systematic use of a style. Consistently styled code also reduces the chance of spurious formatting changes during otherwise unrelated commits, making revision control history cleaner and more easily followed. In some cases, coding styles such as the Cryptography Coding Standard were written to help prevent security holes in cryptographic code.
How can they handicap you
While coding styles are usually in place to improve readability, there are times when you need to break the coding style in order to make something legible to others. It is important that coders, and projects, recognise this as something which happens, and don't use the coding style as a hammer, refusing patches where sticking to the style would reduce readability, or where making the patch match the coding style could easily be done during merge.
As with all situations which are standardised, there are so many coding standards to choose from that they often cause impedance mismatches at library boundaries. As such, it's critical that coding styles (while programmatically checkable) are not programmatically enforced without some kind of escape mechanism.