cut

cut(1) is used to extract fields from delimiter-separated data. This can be useful for processing tabulated data.

$ cat >testdata <<EOF
1,2,3
4,5,6
7,8,9
EOF
$ cut -d, -f3 testdata
3
6
9

The -d option specifies which delimiter to use, defaulting to tab. The -f option specifies which fields to include; this can be a comma-separated list of field numbers and ranges.

$ cat >testdata <<EOF
1:2:3:4
5:6:7:8
9:0:a:b
c:d:e:f
EOF
$ cut -d: -f1,3-4 testdata
1:3:4
5:7:8
9:a:b
c:e:f

When combined with head(1) and tail(1), it is possible to extract data by row too.

$ head -n 3 testdata | tail -n 2 | cut -d: -f2-3
6:7
0:a

Spaces can be used as the delimiter too, but cut(1) treats every occurrence of the delimiter as a field separator, so it only suits formats with exactly one delimiter between fields.

$ cat >testdata <<EOF
a  b c
EOF
$ cut -d ' ' -f1-2 <testdata #expects a b
a 

If you want different behaviour, a different tool is required, such as the shell's read built-in or awk(1).

$ read a b c <testdata
$ echo "$a $b"
a b

$ awk <testdata '{ print $1 $2 }'
ab
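
Another option is to squeeze the runs of spaces down to a single delimiter before cut(1) sees them, for example with tr(1) (covered later in this series):

$ tr -s ' ' <testdata | cut -d ' ' -f1-2
a b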

It is not possible to define a character sequence as a delimiter.

$ cat >testdata <<EOF
a->b->c
EOF
$ cut -d '->' -f2 testdata
cut: the delimiter must be a single character
Try `cut --help' for more information.

For this, more complicated tools need to be used.

$ awk <testdata '{ split($0, A, /->/); print A[2] }'
b

paste

paste(1) joins two delimiter-separated files together, line by line. This can be used to move columns of a file around.

$ cat >testdata1 <<EOF
1,2
3,4
EOF
$ cat >testdata2 <<EOF
a,b
c,d
EOF
$ paste -d, testdata1 testdata2
1,2,a,b
3,4,c,d

When combined with cut(1) and the shell's process substitution, fields can be rearranged.

$ cat >testdata <<EOF
1:a:e:5
2:b:f:6
3:c:g:7
4:d:h:8
EOF
$ paste -d: <(cut -d: -f1,4 testdata) <(cut -d: -f2,3 testdata)
1:5:a:e
2:6:b:f
3:7:c:g
4:8:d:h

join

join(1) merges two files that share a field. Lines with the same value for that field are combined, in the manner of paste(1). Note that both inputs must be sorted on the join field.

$ cat >names <<EOF
1:Richard
2:Jonathan
3:Zwingbor the terrible of planet Flarg
EOF
$ cat >colours <<EOF
1:Red
2:Blue
3:Putrescent Green
EOF
$ join -t: -j1 names colours
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green

split

split(1) splits a file into multiple smaller files. This could be useful for splitting up text-based data when files become too long.

$ seq 1000 >lines
$ split -e -n2 lines split-lines-
$ wc -l split-lines-aa
513 split-lines-aa
$ wc -l split-lines-ab
487 split-lines-ab
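
Note that -n2 splits by byte count, so a chunk boundary can fall in the middle of a line. If the pieces need to break at line boundaries, GNU split(1) also accepts the -n l/2 form:

$ split -n l/2 lines split-lines-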

fold and fmt

fold(1) is a primitive tool to wrap lines. The width to wrap at can be specified with -w.

$ cat >text <<EOF
hello world, I am a long line, that needs to be shortened
EOF
$ fold -w 40 text
hello world, I am a long line, that need
s to be shortened

Where lines are broken can be tweaked: -s will break lines at spaces, rather than in the middle of words.

$ fold -s -w 40 text
hello world, I am a long line, that 
needs to be shortened

fmt(1) is a more advanced tool for wrapping lines. As well as splitting lines at whitespace when they become too long, it will re-flow paragraphs when the lines are too short.

$ cat >text <<EOF
Hello world.
I am text.
I need to be a paragraph.
EOF
$ fmt text
Hello world.  I am text.  I need to be a paragraph.

nl

nl(1) puts line numbers before each line in a file.

$ cat >text <<EOF
Hello
World
EOF
$ nl text
     1  Hello
     2  World

sort and uniq

sort(1) can be used to re-order the lines of a file based on various criteria, defaulting to ASCIIbetical.

$ cat >data <<EOF
2:Jonathan:Blue
1:Richard:Red
3:Zwingbor the terrible of planet Flarg:Putrescent Green
EOF
$ sort data
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green

sort(1) can also sort by field: -t specifies the field delimiter and -k selects which field to sort on.

$ sort -t: -k3 data
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green
1:Richard:Red

The sort order can be reversed.

$ sort -r data
3:Zwingbor the terrible of planet Flarg:Putrescent Green
2:Jonathan:Blue
1:Richard:Red

uniq(1) removes duplicate lines. Its algorithm expects the data to be sorted, since it only removes consecutive, identical lines.

$ cat >data <<EOF
1
1
1
2
3
EOF
$ uniq data
1
2
3

Since data is rarely sorted, this usually means that the command you need to run is sort data | uniq.

$ cat >data <<EOF
1
2
1
5
6
1
2
5
EOF
$ sort data | uniq
1
2
5
6

However, since this is such a common operation, and executing a separate process would be wasteful, sort(1) accepts a -u parameter which does this.

$ sort -u data
1
2
5
6

comm

comm(1) will tell you which lines are _comm_on between two sorted files.

$ cat >file1 <<EOF
1
2
3
EOF
$ cat >file2 <<EOF
2
3
4
EOF
$ comm file1 file2
1
        2
        3
    4

The first column is lines unique to the first file, the second column is lines unique to the second, and the third is lines common to both.

The options of comm(1) are a little odd: you can pass -1, -2, or -3 to remove that column. This is a bit of a backwards decision, given that the common operation is to combine flags until only the column you are interested in remains. So you would pass -12 to get only the lines common to both files.

$ comm -12 file1 file2
2
3
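
Similarly, -23 keeps only the first column, the lines unique to the first file:

$ comm -23 file1 file2
1
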
Posted Wed Jun 4 11:00:11 2014

One of the most useful features of shells is the ability to write scripts. It should not come as a surprise to find that there are many commands that only make sense when used with scripts.

[ and test

test(1) and [ are synonymous, apart from the fact that the last argument to [ must be ].

They are programs for evaluating various expressions, usually used in shell flow control statements.

var=foo
if [ "$var" = foo ]; then
    echo bar
fi

It is important to realise that [ is not a syntactic construct; any command may be used in a shell conditional expression.
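
For example, grep(1) exits with a zero status when it finds a match, so it can drive an if directly:

if grep -q '^root:' /etc/passwd; then
    echo root is present
fi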

true and false

Processes, as well as producing output, terminate with an "exit status". Sometimes the exit status is more important than the output. Occasionally the exit status is all you want.

Since the only two exit status codes that people generally care about are zero (ok) and non-zero (error), the true(1) and false(1) commands were written.
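
The shell records the exit status of the last command in the $? variable:

$ true
$ echo $?
0
$ false
$ echo $?
1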

One major use for this is flow control in shell scripts.

if complicated and expensive command; then
    ok=true
else
    ok=false
fi
if "$ok"; then
    echo I am ok
fi

Without true(1) or false(1) you would need to come up with your own convention for booleans. You would then need to use test(1) to check them, something like this:

if complicated and expensive command; then
    ok=1
else
    ok=0
fi
if [ "$ok" = 1 ]; then
    echo I am ok
fi

Another use is setting a user's shell to /bin/false so that they cannot log in.

$ grep false /etc/passwd
messagebus:x:102:104::/var/run/dbus:/bin/false

A third is to replace a command that is no longer required, but which other scripts expect to be able to run.

For example, on old Linux systems, pre-devtmpfs and udev, there was a script called /dev/MAKEDEV. For compatibility, some systems keep the script as a symlink to /bin/true.

A final use is when the syntax requires a command, but you do not actually need one.

For example, bash has variable substitution operators that will set a variable to a default value. These need to appear as part of another command, or the result of the substitution will itself be interpreted as a command.

$ true ${var=default}

To make this less confusing, bash provides : as a synonym for true(1).

$ : ${var=default}
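
To see the effect:

$ unset var
$ : ${var=default}
$ echo "$var"
default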

basename and dirname

basename(1) and dirname(1) are used for manipulating file paths.

$ var=/path/to/foo/bar-file.txt
$ #   <--dirname--><-basename->
$ #                 suffix-^--^
$ dirname "$var"
/path/to/foo
$ basename "$var"
bar-file.txt
$ basename "$var" .txt
bar-file

They are mostly wrappers around the C library's basename(3) and dirname(3) functions, with the exception of basename(1) also taking a suffix parameter, which strips the suffix off the end of the file path if provided.
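
For comparison, the shell's parameter expansion can do similar jobs without spawning a process, though the edge cases (such as a path with no slash in it) differ from dirname(1) and basename(1):

$ var=/path/to/foo/bar-file.txt
$ echo "${var%/*}"
/path/to/foo
$ echo "${var##*/}"
bar-file.txt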

seq and yes

seq(1) and yes(1) both produce lines of output based on their parameters.

seq(1) produces lines of integers in the provided range.

$ seq 3
1
2
3
$ seq 3 5
3
4
5
$ seq 10 10 30
10
20
30
$ seq 100 -20 60
100
80
60

This can be useful for driving loops.

$ for i in `seq 1 3`; do echo "hello $i"; done
hello 1
hello 2
hello 3

Though in this case, bash brace expansion is more efficient, as it avoids spawning a subprocess.

$ for i in {1..3}; do echo "hello $i"; done
hello 1
hello 2
hello 3

yes(1) prints the same line endlessly. You could naively implement it as yes(){ while :; do echo "${1-y}"; done; }

This may not sound useful for anything more than heating the area immediately around your computer, but there are often commands that ask for confirmation over standard input.

This may be an installer script where the defaults are fine. Or you may have a version of cp(1) that doesn't have a --no-clobber or --force option, but does have an --interactive option; these can be simulated with yes n | cp -i "$source" "$dest" and yes | cp -i "$source" "$dest" respectively.

timeout and nohup

timeout(1) and nohup(1) alter a command's termination conditions.

timeout(1) can be used to end a process after a given period of time. For example, you can work out how many lines yes(1) can print in 1 second with:

$ timeout 1s yes | wc -l
8710114

nohup(1) will stop a process being killed when the terminal it runs from is closed. This is not easy to demonstrate in this medium, but it is run as nohup COMMAND >output.

If the output of this command is not redirected away from the terminal, then it is written to nohup.out.
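
A typical invocation, with some-long-job standing in for whatever command you want to keep running, backgrounds the job as well:

$ nohup some-long-job >output 2>&1 &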

This is useful if you have a long-running job that you do not want to end when you disconnect from the server you are running it on.

However, screen(1) is a better choice for this task, as it gives you a terminal you can detach from by pressing ^A-D and reattach to later with the screen -r command.

sleep

sleep(1) exits after the specified period. You can use it to implement timeout(1) as:

timeout(){
    time="$1"
    shift
    # run the command in the background
    "$@" &
    pid="$!"
    # wait out the time limit, then kill the
    # command if it is still running
    sleep "$time"
    kill "$pid" 2>/dev/null
}

It is also useful in a retry loop to avoid busy-waiting.

while ! ssh some-server; do
    sleep 10
done

tr

tr(1) is used to replace or delete characters in a stream.

For example, this will make all characters upper-case.

$ echo hello world | tr '[:lower:]' '[:upper:]'
HELLO WORLD

The -d flag can be used to remove characters from a stream. This will remove all vowels:

$ echo hello world | tr -d 'aeiou'
hll wrld

printf

printf(1) is mostly a thin wrapper over printf(3). Look to that for most of the details.

One addition worth mentioning is that bash has a built-in version, which also supports %q: it behaves like %s, but quotes the argument, so you can use it to safely generate scripts.

$ printf 'echo %q %q' "foo bar" "baz"
echo foo\ bar baz

tee

tee(1) writes its standard input to its standard output and any files listed on the command line.

$ echo foo bar | tee out
foo bar
$ cat out
foo bar

In conjunction with sudo(1), it can also be used to write the output of a command somewhere you don't usually have permission to write.

$ echo foo bar | sudo tee out >/dev/null
Posted Wed Jun 11 11:00:07 2014
Lars Wirzenius Be a mensch

When participating in free software development, sometimes people forget they're talking with living, feeling beings, and behave quite badly. Don't do that. Never forget there's a person behind a nickname or an e-mail address. Don't be a jerk. Be a mensch.

Posted Wed Jun 18 11:00:06 2014
Daniel Silverstone My Browser History

I've wanted to share a random selection of my browser history with you, the readers of Yakking, for quite some time now. I have been sat at my computer for nearly an hour without coming up with something technical to write, so I shall do just that. Remember people, sometimes you don't need to do something technical to become a better hacker.

I've recently been spending a lot of time reading about actigraphy because I've been working on getting my new Pebble smartwatch to have the software needed to replace my Jawbone Up. This has left me reading a number of papers such as Ryan Libby's paper A simple method for reliable footstep detection on embedded sensor platforms and also articles about sleep (particularly non-rem sleep) and how other methods such as measuring respiratory effort can help characterise sleep.

It seems that before then I spent a lot of time reading things about Snowden and the secret trials in the UK; mixed in with spending a while on Charles Stross' wonderful website (particularly his article about software to detect sarcasm in tweets). I was also spending time reading about UX design such as onboarding teardowns and how sometimes when programmers want to use a new tool because they encountered a problem, the problem is not the tool itself. Finally in that bunch I had a jolly good boggle at how someone copyrighted pi proving that idiocy exists everywhere there's an opportunity to make money.

Other than that, my time working on things with the NetSurf Browser Project left me reading about CSS cascade rules (again), Internationalised Domain Names (including the spec) and also hitting the (English) Wikipedia Random Page link a good few times :-)

I think I'll leave this here and hope that I've not scared you all away. I hope to see you soon on Yakking and would encourage you to share a few interesting links from your recent browser history below in the comments for others to marvel over.

(P.S. some scary people such as Tony Finch have linklogs which are worth a flick through. I'd love to know if you all would like us to start a Yakking linklog)


Posted Wed Jun 25 11:00:08 2014