cut

cut(1) is used to extract fields from delimiter-separated data. This can be useful for processing tabulated data.

$ cat >testdata <<EOF
1,2,3
4,5,6
7,8,9
EOF
$ cut -d, -f3 testdata
3
6
9

The -d option specifies which delimiter to use, defaulting to tab. The -f option specifies which fields to include; this can be a comma-separated list of field numbers and ranges.

$ cat >testdata <<EOF
1:2:3:4
5:6:7:8
9:0:a:b
c:d:e:f
EOF
$ cut -d: -f1,3-4 testdata
1:3:4
5:7:8
9:a:b
c:e:f

When combined with head(1) and tail(1), it is possible to extract data by row too.

$ head -n 3 testdata | tail -n 2 | cut -d: -f2-3
6:7
0:a

Spaces can be used as the delimiter too, but cut(1) treats every occurrence of the delimiter as a field separator, so it only suits formats with exactly one delimiter between fields.

$ cat >testdata <<EOF
a  b c
EOF
$ cut -d ' ' -f1-2 <testdata #expects a b
a 

If you want different behaviour, a different tool is required, such as the shell's read built-in or awk(1).

$ read a b c <testdata
$ echo "$a $b"
a b

$ awk <testdata '{ print $1 $2 }'
ab
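
Another option is to squeeze the runs of spaces down to a single delimiter before cut(1) sees them, for example with tr(1) (covered later in this series):

$ tr -s ' ' <testdata | cut -d ' ' -f1-2
a b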

It is not possible to define a character sequence as a delimiter.

$ cat >testdata <<EOF
a->b->c
EOF
$ cut -d '->' -f2 testdata
cut: the delimiter must be a single character
Try `cut --help' for more information.

For this, more complicated tools need to be used.

$ awk <testdata '{ split($0, A, /->/); print A[2] }'
b

paste

paste(1) joins two delimiter-separated files together, line by line. This can be used to move columns of a file around.

$ cat >testdata1 <<EOF
1,2
3,4
EOF
$ cat >testdata2 <<EOF
a,b
c,d
EOF
$ paste -d, testdata1 testdata2
1,2,a,b
3,4,c,d

When combined with cut(1) and the shell's process substitution, fields can be rearranged.

$ cat >testdata <<EOF
1:a:e:5
2:b:f:6
3:c:g:7
4:d:h:8
EOF
$ paste -d: <(cut -d: -f1,4 testdata) <(cut -d: -f2,3 testdata)
1:5:a:e
2:6:b:f
3:7:c:g
4:8:d:h

join

join(1) merges two files that share a field. Lines with the same value for that field are combined, in the manner of paste(1). Note that both inputs must be sorted on the join field.

$ cat >names <<EOF
1:Richard
2:Jonathan
3:Zwingbor the terrible of planet Flarg
EOF
$ cat >colours <<EOF
1:Red
2:Blue
3:Putrescent Green
EOF
$ join -t: -j1 names colours
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green

split

split(1) splits a file into multiple smaller files. This could be useful for splitting up text-based data when files become too long.

$ seq 1000 >lines
$ split -e -n2 lines split-lines-
$ wc -l split-lines-aa
513 split-lines-aa
$ wc -l split-lines-ab
487 split-lines-ab
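
Note that -n2 splits by byte count, so a chunk boundary can fall in the middle of a line. If the pieces need to break at line boundaries, GNU split(1) also accepts the -n l/2 form:

$ split -n l/2 lines split-lines-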

fold and fmt

fold(1) is a primitive tool to wrap lines. The width to wrap at can be specified with -w.

$ cat >text <<EOF
hello world, I am a long line, that needs to be shortened
EOF
$ fold -w 40 text
hello world, I am a long line, that need
s to be shortened

Where lines are broken can be tweaked: -s will break lines at spaces, rather than in the middle of words.

$ fold -s -w 40 text
hello world, I am a long line, that 
needs to be shortened

fmt(1) is a more advanced tool for wrapping lines. As well as splitting lines at whitespace when they become too long, it will re-flow paragraphs when the lines are too short.

$ cat >text <<EOF
Hello world.
I am text.
I need to be a paragraph.
EOF
$ fmt text
Hello world.  I am text.  I need to be a paragraph.

nl

nl(1) puts line numbers before each line in a file.

$ cat >text <<EOF
Hello
World
EOF
$ nl text
     1  Hello
     2  World

sort and uniq

sort(1) can be used to re-order the lines of a file based on various criteria, defaulting to ASCIIbetical.

$ cat >data <<EOF
2:Jonathan:Blue
1:Richard:Red
3:Zwingbor the terrible of planet Flarg:Putrescent Green
EOF
$ sort data
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green

sort(1) can also sort by field: -t specifies the field delimiter and -k selects which field to sort on.

$ sort -t: -k3 data
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green
1:Richard:Red

The sort order can be reversed.

$ sort -r data
3:Zwingbor the terrible of planet Flarg:Putrescent Green
2:Jonathan:Blue
1:Richard:Red

uniq(1) removes duplicate lines. Its algorithm expects the data to be sorted, since it only removes consecutive, identical lines.

$ cat >data <<EOF
1
1
1
2
3
EOF
$ uniq data
1
2
3

Since data is rarely sorted, this usually means that the command you need to run is sort data | uniq.

$ cat >data <<EOF
1
2
1
5
6
1
2
5
EOF
$ sort data | uniq
1
2
5
6

However, since this is such a common operation, and executing a separate process would be wasteful, sort(1) accepts a -u parameter which does this.

$ sort -u data
1
2
5
6

comm

comm(1) will tell you which lines are _comm_on between two sorted files.

$ cat >file1 <<EOF
1
2
3
EOF
$ cat >file2 <<EOF
2
3
4
EOF
$ comm file1 file2
1
        2
        3
    4

The first column is lines unique to the first file, the second column is lines unique to the second, and the third is lines common to both.

The options of comm(1) are a little odd: you can pass -1, -2, or -3 to remove that column. This is a bit of a backwards decision, given that the common operation is to combine flags until only the column you are interested in remains. So you would pass -12 to get only the lines common to both files.

$ comm -12 file1 file2
2
3
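
Similarly, -23 keeps only the first column, the lines unique to the first file:

$ comm -23 file1 file2
1
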
Posted Wed Jun 4 11:00:11 2014

One of the most useful features of shells is the ability to write scripts. It should not come as a surprise to find that there are many commands that only make sense when used with scripts.

[ and test

test(1) and [ are synonymous, apart from the fact that the last argument to [ must be ].

They are programs for evaluating various expressions, usually used in shell flow control statements.

var=foo
if [ "$var" = foo ]; then
    echo bar
fi

It is important to realise that [ is not a syntactic construct; any command may be used in a shell conditional expression.
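
For example, grep(1) exits with a zero status when it finds a match, so it can drive an if directly:

if grep -q '^root:' /etc/passwd; then
    echo root is present
fi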

true and false

Processes, as well as producing output, terminate with an "exit status". Sometimes the exit status is more important than the output. Occasionally the exit status is all you want.

Since the only two exit status codes that people generally care about are zero (ok) and non-zero (error), the true(1) and false(1) commands were written.
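
The shell records the exit status of the last command in the $? variable:

$ true
$ echo $?
0
$ false
$ echo $?
1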

One major use for this is flow control in shell scripts.

if complicated and expensive command; then
    ok=true
else
    ok=false
fi
if "$ok"; then
    echo I am ok
fi

Without true(1) or false(1) you would need to come up with your own convention for booleans. You would then need to use test(1) to check them, something like this:

if complicated and expensive command; then
    ok=1
else
    ok=0
fi
if [ "$ok" = 1 ]; then
    echo I am ok
fi

Another use is setting a user's shell to /bin/false so that they cannot log in.

$ grep false /etc/passwd
messagebus:x:102:104::/var/run/dbus:/bin/false

A third is to replace a command that is no longer required, but which other scripts expect to be able to run.

For example, on old Linux systems, pre-devtmpfs and udev, there was a script called /dev/MAKEDEV. For compatibility, some systems keep the script as a symlink to /bin/true.

A final use is when the syntax requires a command, but you do not actually need one.

For example, bash has variable substitution operators that will set a variable to a default value. These need to appear as part of another command, or the result of the substitution will itself be interpreted as a command.

$ true ${var=default}

To make this less confusing, bash provides : as a synonym for true(1).

$ : ${var=default}
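
To see the effect:

$ unset var
$ : ${var=default}
$ echo "$var"
default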

basename and dirname

basename(1) and dirname(1) are used for manipulating file paths.

$ var=/path/to/foo/bar-file.txt
$ #   <--dirname--><-basename->
$ #                 suffix-^--^
$ dirname "$var"
/path/to/foo
$ basename "$var"
bar-file.txt
$ basename "$var" .txt
bar-file

They are mostly wrappers around the C library's basename(3) and dirname(3) functions, with the exception of basename(1) also taking a suffix parameter, which strips the suffix off the end of the file path if provided.
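
For comparison, the shell's parameter expansion can do similar jobs without spawning a process, though the edge cases (such as a path with no slash in it) differ from dirname(1) and basename(1):

$ var=/path/to/foo/bar-file.txt
$ echo "${var%/*}"
/path/to/foo
$ echo "${var##*/}"
bar-file.txt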

seq and yes

seq(1) and yes(1) both produce lines of output based on their parameters.

seq(1) produces lines of integers in the provided range.

$ seq 3
1
2
3
$ seq 3 5
3
4
5
$ seq 10 10 30
10
20
30
$ seq 100 -20 60
100
80
60

This can be useful for driving loops.

$ for i in `seq 1 3`; do echo "hello $i"; done
hello 1
hello 2
hello 3

Though in this case, bash brace expansion is more efficient, as it avoids spawning a subprocess.

$ for i in {1..3}; do echo "hello $i"; done
hello 1
hello 2
hello 3

yes(1) prints the same line endlessly. You could naively implement it as yes(){ while :; do echo "${1-y}"; done; }

This may not sound useful for anything more than heating the area immediately around your computer, but there are often commands that ask for confirmation over standard input.

This may be an installer script where the defaults are fine. Or you may have a version of cp(1) that doesn't have a --no-clobber or --force option, but does have an --interactive option; these can be simulated with yes n | cp -i "$source" "$dest" and yes | cp -i "$source" "$dest" respectively.

timeout and nohup

timeout(1) and nohup(1) alter a command's termination conditions.

timeout(1) can be used to end a process after a given period of time. For example, you can work out how many lines yes(1) can print in 1 second with:

$ timeout 1s yes | wc -l
8710114

nohup(1) will stop a process being killed when the terminal it runs from is closed. This is not easy to demonstrate in this medium, but it is run as nohup COMMAND >output.

If the output of this command is not redirected away from the terminal, then it is written to nohup.out.
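
A typical invocation, with some-long-job standing in for whatever command you want to keep running, backgrounds the job as well:

$ nohup some-long-job >output 2>&1 &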

This is useful if you have a long-running job that you do not want to end when you disconnect from the server you are running it on.

However, screen(1) is a better choice for this task, as it gives you a terminal you can detach from by pressing ^A-D and reattach to later with the screen -r command.

sleep

sleep(1) exits after the specified period. You can use it to implement timeout(1) as:

timeout(){
    time="$1"
    shift
    # run the command in the background
    "$@" &
    pid="$!"
    # wait out the time limit, then kill the
    # command if it is still running
    sleep "$time"
    kill "$pid" 2>/dev/null
}

It is also useful in a retry loop to avoid busy-waiting.

while ! ssh some-server; do
    sleep 10
done

tr

tr(1) is used to replace or delete characters in a stream.

For example, this will make all characters upper-case.

$ echo hello world | tr '[:lower:]' '[:upper:]'
HELLO WORLD

The -d flag can be used to remove characters from a stream. This will remove all vowels:

$ echo hello world | tr -d 'aeiou'
hll wrld

printf

printf(1) is mostly a thin wrapper over printf(3). Look to that for most of the details.

One addition worth mentioning is that bash has a built-in version, which also supports %q: it behaves like %s, but quotes the argument, so you can use it to safely generate scripts.

$ printf 'echo %q %q' "foo bar" "baz"
echo foo\ bar baz

tee

tee(1) writes its standard input to its standard output and any files listed on the command line.

$ echo foo bar | tee out
foo bar
$ cat out
foo bar

In conjunction with sudo(1), it can also be used to write the output of a command somewhere you don't usually have permission to write.

$ echo foo bar | sudo tee out >/dev/null
Posted Wed Jun 11 11:00:07 2014
Lars Wirzenius Be a mensch

When participating in free software development, sometimes people forget they're talking with living, feeling beings, and behave quite badly. Don't do that. Never forget there's a person behind a nickname or an e-mail address. Don't be a jerk. Be a mensch.

Posted Wed Jun 18 11:00:06 2014
Daniel Silverstone My Browser History

I've wanted to share a random selection of my browser history with you, the readers of Yakking, for quite some time now. I have been sat at my computer for nearly an hour without coming up with something technical to write, so I shall do just that. Remember people, sometimes you don't need to do something technical to become a better hacker.

I've recently been spending a lot of time reading about actigraphy because I've been working on getting my new Pebble smartwatch to have the software needed to replace my Jawbone Up. This has left me reading a number of papers such as Ryan Libby's paper A simple method for reliable footstep detection on embedded sensor platforms and also articles about sleep (particularly non-rem sleep) and how other methods such as measuring respiratory effort can help characterise sleep.

It seems that before then I spent a lot of time reading things about Snowden and the secret trials in the UK; mixed in with spending a while on Charles Stross' wonderful website (particularly his article about software to detect sarcasm in tweets). I was also spending time reading about UX design such as onboarding teardowns and how sometimes when programmers want to use a new tool because they encountered a problem, the problem is not the tool itself. Finally in that bunch I had a jolly good boggle at how someone copyrighted pi proving that idiocy exists everywhere there's an opportunity to make money.

Other than that, my time working on things with the NetSurf Browser Project left me reading about CSS cascade rules (again), Internationalised Domain Names (including the spec) and also hitting the (English) Wikipedia Random Page link a good few times :-)

I think I'll leave this here and hope that I've not scared you all away. I hope to see you soon on Yakking and would encourage you to share a few interesting links from your recent browser history below in the comments for others to marvel over.

(P.S. some scary people such as Tony Finch have linklogs which are worth a flick through. I'd love to know if you all would like us to start a Yakking linklog)


Posted Wed Jun 25 11:00:08 2014