Whitespace is the set of blank characters, commonly defined as space, tab, newline and possibly carriage return.
Its significance in shell scripts is that command line arguments are separated by whitespace, unless the arguments are quoted.
For illustration, we have a shell script called printargs, which writes each of the arguments we give it on a new line.
$ install /dev/stdin printargs <<'EOF'
#!/bin/sh
for arg; do
echo "$arg"
done
EOF
$ ./printargs foo bar baz
foo
bar
baz
You can see that it behaves as expected for simple, one-word arguments.
However, we want to print the string hello world. If we were to write it in as normal, it would print hello and world on different lines.
$ ./printargs hello world
hello
world
This is where quoting comes in: if you surround a string with quotation marks, i.e. ' or ", then it is treated as a single argument.
$ ./printargs "hello world"
hello world
$ ./printargs 'hello world'
hello world
Alternatively, special characters can be escaped with a \ (backslash).
$ ./printargs hello\ world
hello world
However, this looks ugly.
Similarly, if you wanted to put a " in a string that was quoted with double quotes, you could escape it, or use the other quoting style.
$ ./printargs "hello \"material\" world"
hello "material" world
$ ./printargs 'hello "material" world'
hello "material" world
$ ./printargs "hello 'material' world"
hello 'material' world
The equivalent for ' is very ugly: the only thing that terminates a single-quoted sequence is a single quote, so escaping is not permitted inside it. Instead you have to close the quotes, add an escaped ', and open them again.
$ ./printargs 'hello '\''material'\'' world'
hello 'material' world
Having read that, you may wonder how people make whitespace errors in shell commands, but it becomes less obvious when variables are involved.
$ var="hello \"material\" world"
$ ./printargs $var
hello
"material"
world
This goes wrong because $var is expanded in the command line and then split on whitespace. The quotes inside the value are ordinary characters by then, so it looks like ./printargs hello \"material\" world to your shell.
This can be prevented by quoting the variable substitution.
$ ./printargs "$var"
hello "material" world
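The difference can also be checked by counting arguments rather than reading the output. This is a minimal sketch, assuming a hypothetical helper called count_args that is not part of the original printargs script:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

var='hello "material" world'

# Unquoted: the expanded value is split on whitespace into three words.
unquoted=$(count_args $var)

# Quoted: the value is passed through as a single argument.
quoted=$(count_args "$var")

echo "unquoted: $unquoted, quoted: $quoted"
```

Run with sh or bash, this should report three arguments for the unquoted expansion and one for the quoted expansion.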
You may wonder why the shell behaves this way. It's mostly historical: that's how shells have always done it, and it's kept that way for backwards compatibility. Some shells, like zsh, break with backwards compatibility in favour of a more sensible default.
It does occasionally come in useful, when strings aren't whitespace sensitive.
$ names="what who why"
$ for name in $names; do
echo my name is $name
done; \
echo Slim shady
my name is what
my name is who
my name is why
Slim shady
However, if you're dealing with filenames, this is entirely inappropriate.
$ mkdir temp
$ cd temp
$ touch "foo bar" baz
$ for fn in `find . -type f`; do rm "$fn"; done
rm: cannot remove `./foo': No such file or directory
rm: cannot remove `bar': No such file or directory
$ ls
foo bar
Admittedly, this example is a little contrived, but it can be the difference between cleaning up your temporary files and deleting your music collection.
$ ls ~
temp music.mp3
$ ls -1 ~/temp
not music.mp3
scrap
$ cd ~
$ for fn in `find ~/temp -type f`; do rm "$fn"; done
rm: cannot remove `~/temp/not': No such file or directory
$ ls
temp
There are a few ways this could have been avoided.
- Using arrays
- Process the files directly with find
- Have find pass which files to process on to another program which handles whitespace better
- Handle whitespace yourself
Using arrays
I mentioned $* and $@ when I first talked about shell. These are used for expanding variables, either the command line arguments directly, or other array variables as ${array[@]}.
They behave identically unless quoted with ", in which case $@ expands each argument to a separate word, while $* becomes a single string, with each argument separated by a space.
Behold!
$ set -- "foo bar" baz qux
$ ./printargs $@
foo
bar
baz
qux
$ ./printargs "$@"
foo bar
baz
qux
$ ./printargs "$*"
foo bar baz qux
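Strictly speaking, "$*" joins the arguments with the first character of IFS, which happens to be a space by default. A small sketch of that behaviour, using a comma purely as an illustrative separator:

```shell
#!/bin/sh
set -- "foo bar" baz qux

# With the default IFS, "$*" joins the arguments with spaces.
spaces="$*"

# Change IFS inside the command substitution's subshell so the rest of
# the script is unaffected; "$*" now joins with a comma.
commas=$(IFS=,; printf '%s' "$*")

printf '%s\n' "$spaces" "$commas"
```

This prints foo bar baz qux followed by foo bar,baz,qux, which can be a handy way to build a delimited list from arguments.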
The previous example used set -- to use the command line argument array, since every shell has one. If you are using bash, you can have other arrays.
$ array=( "foo bar" baz qux )
$ ./printargs "${array[@]}"
foo bar
baz
qux
$ array+=(badger)
$ ./printargs "${array[@]}"
foo bar
baz
qux
badger
Glob expressions work in arrays too, so you can have an array of all files in a directory.
$ toremove=( ~/temp/* )
$ rm "${toremove[@]}"
Unfortunately, there is not a built-in way of recursively reading the contents of a directory into an array, so one of the later techniques will be required in conjunction.
$ declare -a toremove
$ while IFS= read -r f; do toremove+=("$f"); done < <(find ~/temp -type f)
$ rm "${toremove[@]}"
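Reading find's output one line at a time still splits a filename that contains a newline. The following sketch shows a variant that avoids this in bash. It assumes a find that supports -print0 (GNU and BSD find do), and it works in a scratch directory made with mktemp rather than ~/temp:

```shell
#!/bin/bash
# Work in a throwaway directory so the sketch does not touch real files.
dir=$(mktemp -d)
mkdir "$dir/sub"
newline_name=$'two\nlines'
touch "$dir/foo bar" "$dir/sub/baz" "$dir/sub/$newline_name"

# Read NUL-terminated paths from find into an array.
# IFS= keeps leading and trailing whitespace, -r keeps backslashes,
# and -d '' makes read stop at each NUL byte.
toremove=()
while IFS= read -r -d '' f; do
    toremove+=("$f")
done < <(find "$dir" -type f -print0)

echo "found ${#toremove[@]} files"
rm -- "${toremove[@]}"

# Nothing should be left behind, including the file with a newline in its name.
leftover=$(find "$dir" -type f)
rm -r "$dir"
```

The process substitution keeps the loop in the current shell, so the array is still populated after the loop ends.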
Removing directly with find
find can remove the files itself if you use the -delete option. It would be used like this:
$ find ~/temp -type f -delete
This is specific to deleting files, though. We may instead want to do something else, like ensure they can't be executed. This could be done with chmod a-x and find's -exec option.
$ find ~/temp -type f -exec chmod a-x {} +
Removing a file is similarly achieved, just by using rm instead of chmod.
$ find ~/temp -type f -exec rm {} +
More complicated operations are possible with an inline shell command. The following makes a backup of each file, named with a .bak suffix.
$ find ~/temp -type f -exec sh -c 'cp "$1" "$1.bak"' - {} \;
This is pretty ugly for anything complicated, difficult to remember, and requires you to remember how to invoke a command in a subshell the hard way.
For more details on how this works, refer to the find(1) man page, looking for the alternative form of -exec, and the bash(1) man page for what -c means and how to pass extra arguments to a shell command.
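If starting one shell per file is a concern, the inline-script idea also works with the + form of -exec, with the script looping over the batch of paths it is given. This is a sketch only: it runs against a scratch directory from mktemp instead of ~/temp, and it adds a ! -name '*.bak' test (an assumption, not in the original command) so that existing backups are not backed up again.

```shell
#!/bin/sh
# Scratch directory so the sketch does not touch real files.
dir=$(mktemp -d)
touch "$dir/foo bar" "$dir/baz"

# With {} +, find passes a batch of paths to one sh invocation.
# "for f" without "in" loops over those positional parameters.
# The trailing "sh" fills $0, so the paths start at $1.
find "$dir" -type f ! -name '*.bak' -exec sh -c '
    for f; do
        cp "$f" "$f.bak"
    done
' sh {} +

if [ -f "$dir/foo bar.bak" ] && [ -f "$dir/baz.bak" ]; then
    result="backups created"
else
    result="backups missing"
fi
echo "$result"
rm -r "$dir"
```

Because every path is referenced as "$f" inside the inline script, the file with a space in its name is copied like any other.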
Passing commands to xargs
If you have a deficient find command, or want to minimise the number of commands run, you can use xargs.
This can be used similarly to the previous find commands:
$ find ~/temp -type f | xargs rm
For people with a wide knowledge of shell programming (i.e. those who know lots of commands and how to put them together, rather than all the options of each command) this is more readable. However, it is not yet whitespace safe: find separates the paths it prints with newline characters, and xargs by default splits its input on spaces as well as newlines, so a filename containing either would be handled wrongly.
This can be solved by find's -print0 argument and xargs' -0 argument.
$ find ~/temp -type f -print0 | xargs -0 rm
Instead of separating file paths with newline characters, find will separate them with NUL bytes (i.e. all bits 0), and xargs will split its input on those alone. Since NUL is not a valid character in filenames, the paths cannot be misinterpreted.
This is, if anything, slightly uglier than using -exec with find for backing up files.
$ find ~/temp -type f -print0 | xargs -0 -n 1 sh -c 'cp "$1" "$1.bak"' -
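To see the difference the NUL delimiter makes, the following sketch feeds the same filename to xargs both ways and counts how many arguments arrive. It assumes an xargs that supports -0 (GNU and BSD versions do) and uses printf to stand in for find's output:

```shell
#!/bin/sh
# Default mode: xargs splits its input on blanks as well as newlines,
# so the single name "foo bar" arrives as two arguments.
plain=$(printf 'foo bar\n' | xargs sh -c 'echo "$#"' sh)

# NUL-delimited mode: the name arrives intact as one argument.
nul=$(printf 'foo bar\000' | xargs -0 sh -c 'echo "$#"' sh)

echo "plain: $plain, nul: $nul"
```

The inline sh -c 'echo "$#"' sh simply reports its argument count, with the trailing sh taking the place of $0.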
Handling whitespace yourself
xargs is not the only command that can be used to handle input. If you're using bash, you can do it in shell directly with read -d.
$ find ~/temp -type f -print0 | while IFS= read -r -d $'\0' fn; do
cp "$fn" "$fn.bak"
done
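read is a shell builtin command, which reads lines from standard input, or another file. -d is an option to change what the input line delimiter is; we change it to NUL to match what find produces. $'\0' is a bashism: the $'...' quoting form interprets backslash escapes much as printf does, so $'\0' asks for a NUL byte as the delimiter for read. (Bash strings cannot actually hold a NUL, so this ends up as an empty string, which read -d also treats as meaning NUL.)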
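Whatever the delimiter, read also trims leading and trailing whitespace and treats backslashes as escape characters unless told otherwise. A minimal sketch comparing plain read with IFS= read -r, using an ordinary newline-terminated line and a made-up value so that it runs in any POSIX shell:

```shell
#!/bin/sh
# A made-up value with surrounding spaces and an embedded backslash.
line='  back\slash  '

# Plain read: surrounding whitespace is trimmed and the backslash is eaten.
read plain <<EOF
$line
EOF

# IFS= stops the trimming and -r keeps backslashes literal.
IFS= read -r exact <<EOF
$line
EOF

printf '[%s]\n' "$plain" "$exact"
```

The first line printed is [backslash], while the second keeps the spaces and the backslash, which is why IFS= and -r are worth adding whenever read is handling filenames.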
Summary
- Quote your variables.
  Always. Then when you do occasionally need to do it un-quoted you'll think about it.
- Use NUL delimited input when possible.
  - Most commands that can take a list of files as input from another file will have an option to allow NUL termination. xargs has -0, tar -T <file> has --null, cpio has both.
  - If a command has its command-line structured such that an arbitrary number of files can be passed, use it with xargs.
  - If an argument needs further processing between input and passing to a command, it can be more readable to pipe it to a while read loop.
- Use arrays when possible.
  - You can use the command line arguments array of any shell with set
    $ set -- "foo bar" baz qux
    $ ./printargs "$@"
    foo bar
    baz
    qux
  - You can initialize a new array in bash with
    $ array=("foo bar" baz qux)
    $ ./printargs "${array[@]}"
    foo bar
    baz
    qux