cut
cut(1) is used to extract fields from delimiter-separated data. This can be useful for processing tabulated data.
$ cat >testdata <<EOF
1,2,3
4,5,6
7,8,9
EOF
$ cut -d, -f3 testdata
3
6
9
The -d option specifies which delimiter to use, defaulting to tab. The -f option specifies which fields to include. This can be a list of fields or ranges.
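For example, with the default tab delimiter (printf(1) is used here only to make the tab characters visible):
$ printf 'a\tb\tc\n' | cut -f2
b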
$ cat >testdata <<EOF
1:2:3:4
5:6:7:8
9:0:a:b
c:d:e:f
EOF
$ cut -d: -f1,3-4 testdata
1:3:4
5:7:8
9:a:b
c:e:f
When combined with head
and tail
, it is possible to extract data by
row too.
$ head -n 3 testdata | tail -n 2 | cut -d: -f2-3
6:7
0:a
Spaces can be used as the delimiter too, but cut(1) treats every occurrence of the delimiter as a field separator, so runs of spaces produce empty fields.
$ cat >testdata <<EOF
a  b c
EOF
$ cut -d ' ' -f1-2 <testdata # expects "a b"
a
If you want different behaviour, a different tool is required, such as the shell's read built-in, or awk(1).
$ read a b c <testdata
$ echo "$a $b"
a b
$ awk <testdata '{ print $1 $2 }'
ab
It is not possible to define a character sequence as a delimiter.
$ cat >testdata <<EOF
a->b->c
EOF
$ cut -d '->' -f 2 testdata
cut: the delimiter must be a single character
Try `cut --help' for more information.
For this, more complicated tools need to be used.
$ awk <testdata '{ split($0, A, /->/); print A[2] }'
b
paste
paste(1) joins two delimiter-separated files together, line by line. This can be used to move the columns of a file around.
$ cat >testdata1 <<EOF
1,2
3,4
EOF
$ cat >testdata2 <<EOF
a,b
c,d
EOF
$ paste -d, testdata1 testdata2
1,2,a,b
3,4,c,d
When combined with cut(1), fields can be rearranged.
$ cat >testdata <<EOF
1:a:e:5
2:b:f:6
3:c:g:7
4:d:h:8
EOF
$ paste -d: <(cut -d: -f1,4 testdata) <(cut -d: -f2,3 testdata)
1:5:a:e
2:6:b:f
3:7:c:g
4:8:d:h
join
join(1) merges two files that share a field; both files must be sorted on that field. Lines with the same value for that field are combined, much as paste(1) would combine them.
$ cat >names <<EOF
1:Richard
2:Jonathan
3:Zwingbor the terrible of planet Flarg
EOF
$ cat >colours <<EOF
1:Red
2:Blue
3:Putrescent Green
EOF
$ join -t: -j1 names colours
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green
split
split(1) splits a file into multiple smaller files. This can be useful for splitting up text-based data when files become too long.
$ seq 1000 >lines
$ split -e -n2 lines split-lines-
$ wc -l split-lines-aa
513 split-lines-aa
$ wc -l split-lines-ab
487 split-lines-ab
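Since split(1) just divides the data, concatenating the pieces in order gives back the original:
$ cat split-lines-aa split-lines-ab | wc -l
1000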
fold and fmt
fold(1) is a primitive tool to wrap lines. The width to wrap at can be specified with -w.
$ cat >text <<EOF
hello world, I am a long line, that needs to be shortened
EOF
$ fold -w 40 text
hello world, I am a long line, that need
s to be shortened
When to break a line can be tweaked: -s will break lines at spaces, rather than in the middle of words.
$ fold -s -w 40 text
hello world, I am a long line, that
needs to be shortened
fmt(1) is a more advanced tool for wrapping lines. As well as splitting lines at whitespace when they become too long, it will re-flow paragraphs when the lines are too short.
$ cat >text <<EOF
Hello world.
I am text.
I need to be a paragraph.
EOF
$ fmt text
Hello world. I am text. I need to be a paragraph.
nl
nl(1) puts line numbers before each line in a file.
$ cat >text <<EOF
Hello
World
EOF
$ nl text
     1	Hello
     2	World
sort and uniq
sort(1) can be used to re-order the lines of a file based on various criteria, defaulting to ASCIIbetical.
$ cat >data <<EOF
2:Jonathan:Blue
1:Richard:Red
3:Zwingbor the terrible of planet Flarg:Putrescent Green
EOF
$ sort data
1:Richard:Red
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green
sort(1) can sort by field.
$ sort -t: -k3 data
2:Jonathan:Blue
3:Zwingbor the terrible of planet Flarg:Putrescent Green
1:Richard:Red
The sort order can be reversed.
$ sort -r data
3:Zwingbor the terrible of planet Flarg:Putrescent Green
2:Jonathan:Blue
1:Richard:Red
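Because the default order is ASCIIbetical, numbers do not always sort the way you might expect; the -n option sorts numerically instead.
$ printf '10\n2\n1\n' | sort
1
10
2
$ printf '10\n2\n1\n' | sort -n
1
2
10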
uniq(1) removes duplicate lines. Its algorithm expects the data to be sorted, since it only removes consecutive, identical lines.
$ cat >data <<EOF
1
1
1
2
3
EOF
$ uniq data
1
2
3
Since data is rarely sorted, this usually means that the command you need to run is sort data | uniq.
$ cat >data <<EOF
1
2
1
5
6
1
2
5
EOF
$ sort data | uniq
1
2
5
6
However, since this is such a common operation, and executing a separate subprocess would be wasteful, GNU's sort(1) accepts a -u parameter which does this.
$ sort -u data
1
2
5
6
comm
comm(1) will tell you which lines are common between two sorted files.
$ cat >file1 <<EOF
1
2
3
EOF
$ cat >file2 <<EOF
2
3
4
EOF
$ comm file1 file2
1
		2
		3
	4
The first column is lines unique to the first file, the second column is lines unique to the second, and the third is lines common to both.
The options of comm(1) are a little odd. You can pass -1, -2, or -3 to remove that column. This is a bit of an odd decision, given that the common operation is to use the flags to keep only the one column you are interested in. So you would pass -12 to get only the lines that are common to both files.
$ comm -12 file1 file2
2
3
One of the most useful features of shells is the ability to write scripts. It should not come as a surprise to find that there are many commands that only make sense when used with scripts.
[ and test
test(1) and [ are synonymous, apart from the fact that the last argument to [ must be ].
They are programs for evaluating various expressions, usually used in shell flow control statements.
var=foo
if [ "$var" = foo ]; then
    echo bar
fi
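test(1) and [ understand many more operators than string equality; for example, -e checks whether a file exists:
if [ -e /etc/passwd ]; then
    echo the passwd file exists
fi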
It is important to realise that [ is not a syntactical construct; any command may be used in a shell conditional expression.
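For example, grep(1)'s exit status can drive an if directly, with no [ in sight (assuming a typical /etc/passwd with a root entry):
$ if grep -q root /etc/passwd; then echo found root; fi
found root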
true and false
Processes, as well as producing output, terminate with an "exit status". Sometimes the exit status is more important than the output. Occasionally the exit status is all you want.
Since the only two exit status codes that people generally care about are zero (ok) and non-zero (error), the true(1) and false(1) commands were written.
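The exit status of the last command is available in the shell's $? variable:
$ true; echo $?
0
$ false; echo $?
1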
One major use for this is flow control in shell scripts.
if complicated and expensive command; then
    ok=true
else
    ok=false
fi
if "$ok"; then
    echo I am ok
fi
Without true(1) or false(1) you would need to come up with your own convention for booleans. You would then need to use test(1) to check them, something like this:
if complicated and expensive command; then
    ok=1
else
    ok=0
fi
if [ "$ok" = 1 ]; then
    echo I am ok
fi
Another is setting a user's shell to /bin/false so they cannot log in.
$ grep false /etc/passwd
messagebus:x:102:104::/var/run/dbus:/bin/false
A third is to replace a command that is no longer required, but other scripts expect to be able to run it.
For example, on old Linux systems, pre-devtmpfs and udev, there was a script called /dev/MAKEDEV. For compatibility, some systems keep the script as a symlink to /bin/true.
A final use is when the shell's syntax requires a command but you do not need it to do anything.
For example, bash has variable substitution options that will set a variable to a default value. These need to be run as part of another command, or the result of the substitution will itself be interpreted as a command.
$ true ${var=default}
To make this less confusing, the shell provides : as a synonym for true(1).
$ : ${var=default}
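Either way, the variable ends up set afterwards, if it was previously unset:
$ unset var
$ : ${var=default}
$ echo "$var"
default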
basename and dirname
basename(1) and dirname(1) are used for manipulating file paths.
$ var=/path/to/foo/bar-file.txt
$ #   <--dirname--><-basename->
$ #                 suffix-^--^
$ dirname "$var"
/path/to/foo
$ basename "$var"
bar-file.txt
$ basename "$var" .txt
bar-file
They are mostly wrappers around the C library's basename(3) and dirname(3) functions, with the exception of basename(1) also taking a suffix parameter, which strips the suffix off the end of the file path if provided.
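A rough sketch of what they do, using the shell's own parameter expansion (this is not exactly equivalent; the real tools handle edge cases such as trailing slashes and paths with no directory part):
$ var=/path/to/foo/bar-file.txt
$ echo "${var%/*}"    # like dirname
/path/to/foo
$ echo "${var##*/}"   # like basename
bar-file.txt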
seq and yes
seq(1) and yes(1) both produce lines of output based on their parameters.
seq(1) produces lines of integers in the provided range.
$ seq 3
1
2
3
$ seq 3 5
3
4
5
$ seq 10 10 30
10
20
30
$ seq 100 -20 60
100
80
60
This can be useful for loop conditions.
$ for i in `seq 1 3`; do echo "hello $i"; done
hello 1
hello 2
hello 3
Though in this case, bash brace expansion is more effective, since it does not need to run a separate command.
$ for i in {1..3}; do echo "hello $i"; done
hello 1
hello 2
hello 3
yes(1) prints the same line endlessly. You could naively implement it as yes(){ while :; do echo "${1-y}"; done; }.
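Piping it into head(1) keeps the endless output manageable:
$ yes | head -n 3
y
y
y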
This may not sound useful for anything more than heating the area immediately around your computer, but there are often commands that ask for confirmation over standard input.
This may be an installer script where the defaults are fine. Or you may have a version of cp(1) that doesn't have a --no-clobber or --force option, but does have an --interactive option; these can be simulated with yes(1) as yes n | cp -i "$source" "$dest" and yes | cp -i "$source" "$dest" respectively.
timeout and nohup
timeout(1) and nohup(1) alter a command's termination conditions.
timeout(1) can be used to end a process after a given period of time. For example, you can work out how many lines yes(1) can print in 1 second with:
$ timeout 1s yes | wc -l
8710114
nohup(1) will stop a process being killed when the terminal it runs from is closed. This is not demonstrable in the medium I have chosen to demonstrate commands with, but it is run as nohup COMMAND >output. If the output of this command is not redirected away from the terminal, then it is written to nohup.out.
This is useful if you have a long-running job that you do not want to end when you disconnect from the server you are running it on.
However, screen(1) is a better choice for this task, as it gives you a terminal you can detach from by pressing ^A-D and reattach later with the screen -x command.
sleep
sleep(1) exits after the specified period. You can use it to implement timeout(1) as:
timeout(){
    time="$1"
    shift
    "$@"&
    pid="$!"
    sleep "$time"
    kill "$pid"
}
It is also useful in a retry loop to avoid busy-waiting.
while ! ssh some-server; do
    sleep 10
done
tr
tr(1) is used to replace or delete characters in a stream.
For example, this will make all characters upper-case.
$ echo hello world | tr '[:lower:]' '[:upper:]'
HELLO WORLD
The -d flag can be used to remove characters from a stream. This will remove all vowels:
$ echo hello world | tr -d 'aeiou'
hll wrld
printf
printf(1) is mostly a thin wrapper over printf(3). Look to that for most of the details.
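For example, it substitutes arguments into a format string just as the C function does (the name and number here are made-up example values):
$ printf '%s is %d years old\n' Alice 30
Alice is 30 years old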
One addition worth mentioning is that bash has a built-in version, which also supports %q. This behaves like %s, but quotes the output so you can use it to safely generate scripts.
$ printf 'echo %q %q' "foo bar" "baz"
echo foo\ bar baz
tee
tee(1) writes its standard input to its standard output and any files listed on the command line.
$ echo foo bar | tee out
foo bar
$ cat out
foo bar
It is also usable as a way to write the output of a command somewhere you don't usually have permission to write, in conjunction with sudo(1): redirections are performed by your unprivileged shell, so sudo command >file would fail, but piping into sudo tee works.
$ echo foo bar | sudo tee out >/dev/null