Informational shell utilities

Primitives

echo(1) prints its arguments to standard output. Despite its simple behaviour, it is remarkably useful, since it gives you a way to have your shell write the values of its variables to standard output.

$ VAR=foo
$ echo "$VAR"
foo

Given its frequent use and simplicity, the echo you run is almost always the version built into your shell, rather than the standalone echo(1) program.
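To check which echo your shell would actually run, you can ask the shell itself. This is a minimal sketch, assuming a POSIX-style shell such as bash or dash; the variable name resolution is only there for illustration.

```shell
# Ask the shell how it resolves the name "echo".
# In bash and dash this reports a shell builtin rather than a path.
resolution=$(type echo)
echo "$resolution"
```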

It can be used to write strings to a file, when used with output redirection.

$ echo foo >bar

cat(1) can be used to retrieve the contents of a file.

$ cat bar
foo

You can put the contents of a file into a variable using subshell expansion, more formally known as command substitution. Both the backtick form and the $(...) form work.

$ baz=`cat bar`
$ echo $baz
foo
$ qux=$(cat bar)
$ echo $qux
foo

So with echo(1), cat(1), output redirection and subshell expansion, you can move data between variables, command line arguments and files.
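The round trip described above can be sketched as a short script. The scratch file and the variable names are made up for illustration. One detail worth knowing is that the expansion strips trailing newlines, which the second half of the sketch demonstrates.

```shell
# Work in a scratch file so nothing in the current directory is touched.
scratch=$(mktemp)

original='foo'
echo "$original" >"$scratch"    # variable -> file
copy=$(cat "$scratch")          # file -> variable
echo "$copy"                    # variable -> standard output

# Trailing newlines are removed by the expansion, so the extra
# blank lines written here do not end up in the variable.
printf 'foo\n\n\n' >"$scratch"
trimmed=$(cat "$scratch")
echo "$trimmed"

rm -f "$scratch"
```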

cat(1) has one last trick up its sleeve; it can be used to concatenate files, hence its name.

$ echo foo >1
$ echo bar >2
$ cat 1 2
foo
bar

File trimming

head(1) and tail(1), without further arguments, read the first 10 or last 10 lines of a file and write them to standard output.

$ seq 20 >numbers # write the numbers 1 to 20 to the file called numbers
$ head numbers
1
2
3
4
5
6
7
8
9
10
$ tail numbers
11
12
13
14
15
16
17
18
19
20

The number of lines can be tuned with the -n flag. So head -n3 is the first 3 lines, and tail -n3 is the last 3 lines.

$ head -n3 numbers
1
2
3
$ tail -n3 numbers
18
19
20

The number given to the -n option of head(1) can be prefixed with a - to print all but the last n lines, and the number given to tail(1) can be prefixed with a + to start printing at line n, skipping the first n-1 lines.

$ seq 5 >numbers
$ head -n -1 numbers
1
2
3
4
$ tail -n +2 numbers
2
3
4
5
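Combining the two commands gives a simple way to pull a range of lines out of the middle of a file. A minimal sketch, using a made-up scratch file of ten numbered lines:

```shell
# Ten numbered lines, generated with printf so the sketch does not depend on seq.
numbers=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 >"$numbers"

# tail -n +4 starts output at line 4; head -n 3 then keeps three lines,
# so the pipeline prints lines 4, 5 and 6.
middle=$(tail -n +4 "$numbers" | head -n 3)
echo "$middle"

rm -f "$numbers"
```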

The option -c can be used to perform a similar operation, but on the number of bytes rather than the number of lines.
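As a small illustration of -c, the sketch below trims a 7 byte string from each end. It assumes head(1) and tail(1) implementations that accept -c, which the GNU versions do.

```shell
# "abcdef" followed by a newline is 7 bytes.
first=$(printf 'abcdef\n' | head -c 3)   # first 3 bytes: abc
last=$(printf 'abcdef\n' | tail -c 4)    # last 4 bytes: def plus the newline
echo "$first"
echo "$last"
```

The newline at the end of the second result is removed when the output is captured into the variable.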

tail(1) has a few more tricks. One of its key uses is displaying the last lines of a log, which are generally the most useful for finding out what just happened to the program that wrote it.

It is also useful to watch a log in an open terminal as it is being written, without having to reconfigure the program to log differently, so tail(1) has the -f flag, which stands for follow. When -f is specified, tail(1) prints the last 10 lines, then keeps running and prints new lines as they are added.
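The following sketch imitates a program writing to a log while tail -f watches it. The file names and the sleep durations are arbitrary choices for the demonstration; in interactive use you would just run tail -f and press Ctrl-C when done.

```shell
log=$(mktemp)
seen=$(mktemp)

echo "first line" >"$log"

# Follow the log in the background, capturing what tail prints.
tail -f "$log" >"$seen" &
tail_pid=$!

sleep 1
echo "second line" >>"$log"   # appended after tail has started
sleep 2                       # give tail time to notice the new line

# tail -f never exits on its own, so stop it explicitly.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

followed=$(cat "$seen")
echo "$followed"
rm -f "$log" "$seen"
```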

Pagers

less(1) and more(1) are pagers. A pager makes it easier to read the output of a program that generates a lot of it, by showing one page at a time and letting you say when you've finished reading and want to continue to the next page.

less(1) is like more(1), but better. If you're interested, someone also wrote a pager called most(1).

Please note that cat "$file" | less is rarely the right thing to do. Unless $file is special in some way this is equivalent to less <"$file", and it's clearer, in almost all cases, to simply run less "$file".

Environment processing

Both env(1) and printenv(1) can be used to list all the environment variables. Indeed, they behave the same when called with no arguments.

However, if you are writing a shell script and need to programmatically get a list of all the environment variables, and maybe their values, then you need the GNU coreutils versions of these programs, so that you can use the -0 switch to make them NUL terminate each entry rather than newline terminate it, allowing you to process the output with read -d '' in bash.
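To see why the terminator matters, consider a variable whose value contains a newline. The sketch below assumes GNU env(1) for the -0 switch; the variable names FIRST and SECOND are invented for the example.

```shell
# Two variables, one of which has a newline inside its value.
value='line one
line two'

# Counting newline-terminated entries is fooled by the embedded newline.
by_newline=$(env -i FIRST="$value" SECOND=x | wc -l)

# Counting NUL terminators gives one per variable, whatever the values contain.
by_nul=$(env -i -0 FIRST="$value" SECOND=x | tr -cd '\000' | wc -c)

echo "newline-terminated entries: $by_newline"
echo "NUL-terminated entries: $by_nul"
```

The first count comes out as 3 and the second as 2, even though only two variables were set.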

After this, env(1) and printenv(1) differ, since printenv(1) will print the values of the variables you name on the command line.

$ export FOO=bar
$ export BAZ=qux
$ printenv FOO BAZ
bar
qux

However, this is of limited use, since your shell will support variable interpolation natively.

$ echo $FOO $BAZ
bar qux

env(1) will set variables then run a command.

$ env BADGER=stoat printenv BADGER
stoat

env(1) can be used to run a program in a reduced environment with the -i flag.

$ env -i BADGER=stoat printenv
BADGER=stoat
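The effect of -i on inherited variables can be sketched by running the same command with and without it. SECRET is a made-up variable name, and /bin/sh is named by its full path so the example does not rely on $PATH surviving.

```shell
export SECRET=hunter2

# A plain env passes the existing environment through to the command.
inherited=$(env /bin/sh -c 'echo "${SECRET:-unset}"')

# env -i starts the command with an empty environment, so SECRET is gone.
cleared=$(env -i /bin/sh -c 'echo "${SECRET:-unset}"')

echo "without -i: $inherited"
echo "with -i:    $cleared"
```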

Apart from all this, env(1) is used on the shebang line of scripts to look up the interpreter in $PATH, since interpreters may be installed in different places on different systems, while env(1) is usually installed at /usr/bin/env.
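A minimal sketch of such a script follows. It uses sh as the interpreter only because it is certain to be present; the same pattern is more commonly seen with interpreters such as python or perl. It assumes env(1) really is at /usr/bin/env and that the temporary directory allows executing files.

```shell
script=$(mktemp)

# The first line asks env to find "sh" by searching $PATH.
cat >"$script" <<'EOF'
#!/usr/bin/env sh
echo "interpreter found via PATH"
EOF

chmod +x "$script"
result=$("$script")
echo "$result"
rm -f "$script"
```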

Directory contents information

ls

ls(1) can be used to list the contents of a directory.

$ ls /
bin    dev   initrd.img      lib64  opt   run   sys  var
boot   etc   initrd.img.old  media  proc  sbin  tmp  vmlinuz
cdrom  home  lib             mnt    root  srv   usr  vmlinuz.old

If it can determine that it is writing to a terminal, it will tabulate the results as above; otherwise it prints one entry per line.

$ ls / | cat
bin
boot
cdrom
dev
etc
home
initrd.img
initrd.img.old
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
vmlinuz
vmlinuz.old
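A shell script can make a similar decision with the -t test, which checks whether a file descriptor is attached to a terminal. This is a sketch of the idea, not a description of how ls(1) is implemented; describe_stdout is a hypothetical helper name.

```shell
# Report whether standard output (file descriptor 1) is a terminal.
describe_stdout() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

describe_stdout                  # depends on where this script's output goes
piped=$(describe_stdout | cat)   # standard output is a pipe here
echo "$piped"
```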

It's common to see shell scripts use ls(1) to perform an action on every entry in a directory.

$ for dir in `ls /`; do
    echo $dir
done

However, this has problems if any of the names contain spaces. Shell globbing is shorter and handles spaces, though it skips any files or directories with names that start with a dot (.).

$ for dir in /*; do
    echo "$dir"
done
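The behaviour of the glob can be checked in a scratch directory. In this sketch the file names are invented, the variable is quoted so that the name containing a space stays intact, and the -e test guards against the case where nothing matches and the pattern is left unexpanded.

```shell
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/.hidden"

count=0
for entry in "$dir"/*; do
    [ -e "$entry" ] || continue   # skip the literal pattern if nothing matched
    count=$((count + 1))
    echo "$entry"
done

# Two entries are visited; .hidden is not matched by *.
echo "$count"
rm -rf "$dir"
```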

You can use ls -a to list files that start with a dot, though ls -A is generally more useful since it excludes the . and .. entries.

To do this safely in scripts is more complicated.

A portable way to do this is:

$ command='for arg; do echo "$arg"; done'
$ find / -mindepth 1 -maxdepth 1 -exec sh -c "$command" - {} +
/home
/etc
/media
/var
/bin
/boot
/dev
...
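In that command, the - after the script text becomes $0 of the inner shell, and {} + makes find(1) pass as many names as will fit as the positional parameters. The sketch below runs the same pattern against a scratch directory with awkward, made-up names, and has the inner shell report how many arguments it received.

```shell
dir=$(mktemp -d)

# Names with a space, a leading dot and an embedded newline.
touch "$dir/plain" "$dir/with space" "$dir/.hidden" "$dir/new
line"

# Each name arrives as exactly one argument, however odd it is.
seen=$(find "$dir" -mindepth 1 -maxdepth 1 -exec sh -c 'echo "$#"' - {} +)
echo "$seen"

rm -rf "$dir"
```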

A less portable way that uses fewer subprocesses is:

$ find / -mindepth 1 -maxdepth 1 -print0 |
while IFS= read -r -d '' arg
do
    echo "$arg"
done
/home
/etc
/media
/var
/bin
/boot
/dev
/lib
/mnt
/opt
...

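If your loop needs to change variables outside the body of the loop, the pipeline above will not work, because bash runs each part of a pipeline in a subshell and the changes are lost when it exits. You need to feed the loop with process substitution instead.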

$ count=0
$ while IFS= read -r -d '' arg
do
    count=$(( count + 1 ))
done < <(find / -mindepth 1 -maxdepth 1 -print0)
$ echo $count
24
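The difference is easiest to see side by side. This sketch uses newline separated input and a scratch file so that it avoids bash-only syntax; the behaviour shown for the pipeline is that of bash and dash, and some other shells run the last part of a pipeline in the current shell.

```shell
# Pipeline: in bash and dash the loop runs in a subshell,
# so the increments are lost when the pipeline finishes.
piped=0
printf '%s\n' a b c | while IFS= read -r line; do
    piped=$((piped + 1))
done

# Redirection: the loop runs in the current shell, so the count survives.
list=$(mktemp)
printf '%s\n' a b c >"$list"
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done <"$list"
rm -f "$list"

echo "after pipeline:    $piped"
echo "after redirection: $redirected"
```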

pwd

pwd(1) will print the path to the current directory to standard output.

$ pwd
/home/richardmaw

This is not generally useful when using a shell interactively, since your shell's prompt will usually have the current directory in it:

$ PS1='\w\$ '
~$ cd /
/$ pwd
/

However, pwd can be of use in scripting if you need to change directory and wish to return to the current directory later in your shell script.

#!/bin/sh
HERE="$(pwd)"
# Do work including changing directory
# ...
cd "$HERE"
# Do more work back in the original directory
# ...
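An alternative worth considering, when the work in the other directory does not need to set variables for later use, is to do it in a subshell. A directory change made in a subshell does not affect the parent, so there is nothing to return from. A minimal sketch:

```shell
before=$(pwd)

# Command substitution runs in a subshell, so this cd is local to it.
inside=$(
    cd / || exit 1
    pwd
)

after=$(pwd)

echo "inside the subshell: $inside"
echo "afterwards:          $after"
```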

readlink

readlink(1) is a thin wrapper over the readlink(2) system call in its default mode of operation.

$ readlink /vmlinuz
boot/vmlinuz-3.11.0-17-generic

It is generally more common to discover this by running ls -l, since it's fewer characters to type, though readlink(1) is more useful in scripts.

$ ls -l /vmlinuz
lrwxrwxrwx 1 root root 30 Mar  3 08:07 /vmlinuz -> boot/vmlinuz-3.11.0-17-generic

Another useful mode of readlink(1) is readlink -f, which is "canonicalize" mode.

$ readlink -f /vmlinuz
/boot/vmlinuz-3.11.0-17-generic
$ cd ~
$ readlink -f ../../vmlinuz
/boot/vmlinuz-3.11.0-17-generic
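The difference between the two modes shows up with a chain of symbolic links. The sketch below builds one in a scratch directory; the names first, second and target are invented, and readlink -f is assumed to be available, as it is in GNU coreutils.

```shell
# Use the physical path of the scratch directory, in case the
# temporary directory is itself reached through a symbolic link.
dir=$(cd "$(mktemp -d)" && pwd -P)

touch "$dir/target"
ln -s target "$dir/first"    # first  -> target
ln -s first "$dir/second"    # second -> first

one_step=$(readlink "$dir/second")      # reads only the link itself
resolved=$(readlink -f "$dir/second")   # follows every link in the chain

echo "$one_step"
echo "$resolved"
rm -rf "$dir"
```

The default mode prints first, while -f prints the absolute path of target.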

Another use for readlink -f is to get an absolute path to a shell script, to access packaged resources.

$ install -m755 /dev/stdin /tmp/thisdir-test.sh <<'EOF'
#!/bin/sh
THISDIR="$(readlink -f "$(dirname "$0")")"
echo "$THISDIR"
EOF
$ /tmp/thisdir-test.sh
/tmp
$ mv /tmp/thisdir-test.sh ~
$ ~/thisdir-test.sh 
/home/richardmaw

stat

stat(1) uses the stat(2) system call to report information about files.

It defaults to showing a wide selection of information.

$ stat thisdir-test.sh 
  File: ‘thisdir-test.sh’
  Size: 69          Blocks: 8          IO Block: 4096   regular file
Device: 1ah/26d Inode: 237403      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/richardmaw)   Gid: ( 1000/richardmaw)
Access: 2014-03-30 20:50:10.266400076 +0100
Modify: 2014-03-30 20:49:01.786402099 +0100
Change: 2014-03-30 20:49:42.522400896 +0100
 Birth: -

stat(1) can be told to output specific information with its -c option, which takes a format string. In the format, %n is the file name and %s is the file size in bytes.

$ stat -c '%n: %s' thisdir-test.sh 
thisdir-test.sh: 69
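Format strings make stat(1) convenient in scripts, since each value can be captured on its own. This sketch assumes the GNU version, where -c is the format option; %a, the permissions in octal, is another of its format sequences.

```shell
file=$(mktemp)
printf 'hello\n' >"$file"   # 6 bytes: five letters and a newline
chmod 640 "$file"

size=$(stat -c '%s' "$file")   # size in bytes
mode=$(stat -c '%a' "$file")   # permissions in octal

echo "size=$size mode=$mode"
rm -f "$file"
```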

User status

whoami(1) prints the user name of the current user.

$ whoami
richardmaw

who(1) and users(1) list who is logged in, in slightly different formats.

$ users
richardmaw richardmaw richardmaw richardmaw
$ who
richardmaw tty1         2014-03-30 15:49
richardmaw tty7         2014-03-30 18:06 (:0)
richardmaw pts/1        2014-03-30 18:13 (:0)
richardmaw pts/2        2014-03-30 20:10 (:0)

id(1) displays more information about the current user, including the primary group, supplementary groups, and the numeric ids for all of them. The groups are comma separated.

$ id | while read -d ',' FOO; do echo $FOO; done
uid=1000(richardmaw) gid=1000(richardmaw) groups=1000(richardmaw)
4(adm)
20(dialout)
24(cdrom)
25(floppy)
27(sudo)
29(audio)
30(dip)
44(video)
46(plugdev)
104(scanner)
113(netdev)
114(bluetooth)
120(fuse)

id(1) can be given various options to limit its output, such as -u to print only the user id or -G to print only the group ids. The GNU version also accepts -Z, which prints only the SELinux security context.
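In scripts, the single-purpose options are usually easier to work with than parsing the full output. A minimal sketch using the -u, -g and -G options, with variable names chosen for illustration:

```shell
uid=$(id -u)        # numeric user id only
gid=$(id -g)        # numeric primary group id only
all_gids=$(id -G)   # every numeric group id, space separated

echo "user id: $uid"
echo "primary group id: $gid"

# The list is space separated, so an unquoted expansion splits it.
for g in $all_gids; do
    echo "member of group id: $g"
done
```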

date

date(1) prints the current date and time to standard output.

$ date
Sun Mar 30 21:36:16 BST 2014

It also takes a format string.

$ date +%F
2014-03-30

It can be told to format a date which isn't now, with the -d option.

$ date -d yesterday
Sat Mar 29 20:37:58 GMT 2014
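The -d option and a format string can be combined. The sketch below assumes GNU date(1), where -d also accepts @ followed by a number of seconds since the start of 1970, and uses -u so that the result does not depend on the local time zone.

```shell
# 0 seconds and 86400 seconds (one day) after the epoch, formatted as dates.
epoch_day=$(date -u -d @0 +%F)
next_day=$(date -u -d @86400 +%F)

echo "$epoch_day"
echo "$next_day"
```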

wc

wc(1) prints line, word and byte counts. By default it prints all 3 and the file name.

$ wc thisdir-test.sh 
 3  7 69 thisdir-test.sh

To print only one of those, specify -l, -w or -c respectively.

$ wc -l thisdir-test.sh 
3 thisdir-test.sh

To prevent wc(1) printing the file name, the file has to be provided as the standard input.

$ wc -l <thisdir-test.sh 
3
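Reading from standard input is what makes wc(1) practical in scripts, since the captured output is then just a number. A small sketch with a made-up two line sample file:

```shell
sample=$(mktemp)
printf 'one two\nthree\n' >"$sample"

lines=$(wc -l <"$sample")   # 2 lines
words=$(wc -w <"$sample")   # 3 words
bytes=$(wc -c <"$sample")   # 14 bytes, counting both newlines

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$sample"
```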

A common way to find out how many files are in a directory is ls $dir | wc -l, though this will give incorrect results if any file names have newline characters in them.

The most concise, correct way to do this is:

$ find / -maxdepth 1 -mindepth 1 -print0 | grep -cz .
24
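The two approaches can be compared on a scratch directory containing a file name with an embedded newline. This sketch assumes GNU grep(1) for the -z option and an ls(1) that writes names unmodified when its output is not a terminal, as the GNU version does.

```shell
dir=$(mktemp -d)

# Two files, one of which has a newline in its name.
touch "$dir/a" "$dir/b
c"

naive=$(ls "$dir" | wc -l)
robust=$(find "$dir" -mindepth 1 -maxdepth 1 -print0 | grep -cz .)

echo "ls | wc -l:      $naive"
echo "find | grep -cz: $robust"
rm -rf "$dir"
```

The first count is 3 and the second is 2, which is the real number of files.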