File management

The various graphical environments have a file browser for displaying and manipulating your files. GNOME has Nautilus, Xfce has Thunar, Windows has Explorer.

However, even if you have a graphical desktop available, it can be useful to learn how to manage your files from the command line, since:

  1. You can use your knowledge of these shell commands to write scripts to automate frequent tasks.
  2. If you're quick on the keyboard, it can be faster than the GUI tools.
  3. You may not have a graphical environment available: you may be administering a remote server, your graphical environment may be broken and need fixing, or your operating system may not boot further than your initramfs, leaving you stuck in the rescue shell.

unlink, rmdir and rm

unlink(1) and rm(1) are used for removing files; rm(1) and rmdir(1) are used for removing directories.

The reason for this split is that rmdir(1) and unlink(1) map directly onto the underlying system calls, rmdir(2) and unlink(2), while rm(1) has extra features, such as recursive deletion with rm -r (often written rm -rf, where -f also suppresses prompts and ignores files that don't exist).

rm(1) and rmdir(1) can be given multiple files or directories to remove on the command line; unlink(1) takes exactly one file.
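Here is a minimal sketch of the three commands side by side, run in a throwaway directory from mktemp -d. The names a, b, c, empty and full are made up for the example.

```shell
dir="$(mktemp -d)"
touch "$dir/a" "$dir/b" "$dir/c"

unlink "$dir/a"           # unlink takes exactly one file
rm "$dir/b" "$dir/c"      # rm takes any number of files

mkdir "$dir/empty" "$dir/full"
touch "$dir/full/file"
rmdir "$dir/empty"        # succeeds: the directory is empty
rmdir "$dir/full" || echo "rmdir refused: $dir/full is not empty"
rm -r "$dir/full"         # removes the directory and everything in it

rmdir "$dir"              # the scratch directory is now empty, so this works
```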

link and ln

File systems can contain links to files, so the same file can be referred to by different names.

link(1) creates what is called a hard link, where two file names refer to the same file, so writes through one name are visible through the other.

ln(1) can also do this, but allows extra options, such as -f to remove the target if it already exists, and -s, which creates a symbolic link, rather than a hard link.
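A short sketch of the difference, assuming GNU coreutils; the names original, hard and soft are illustrative only. Note that a relative symbolic link target is resolved relative to the directory the link lives in.

```shell
dir="$(mktemp -d)"
echo one >"$dir/original"

link "$dir/original" "$dir/hard"   # a second name for the same file
echo two >>"$dir/hard"             # write through the new name...
cat "$dir/original"                # ...and read both lines through the old one

ln -s original "$dir/soft"         # symbolic link to the *name* "original"
ln -sf hard "$dir/soft"            # -f replaces the existing "soft" link
cat "$dir/soft"                    # follows soft -> hard, same contents
```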

cp, install and mv

cp(1) and install(1) copy files to another location. mv(1) moves them, so the source is gone once the file is at its destination.

install(1) differs from cp(1) by creating files with the executable bit set by default (mode rwxr-xr-x), and it can take -m to set the file mode while copying.
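To see the default-mode difference, here is a small sketch assuming GNU install; script.sh, copied, installed and private are hypothetical names. cp(1) gives the new file the source's mode (minus the umask), so a non-executable source stays non-executable.

```shell
dir="$(mktemp -d)"
printf '#!/bin/sh\necho hi\n' >"$dir/script.sh"
chmod 644 "$dir/script.sh"                      # no execute bits on the source

cp "$dir/script.sh" "$dir/copied"               # cp: no execute bit, like the source
install "$dir/script.sh" "$dir/installed"       # install: rwxr-xr-x by default
install -m 600 "$dir/script.sh" "$dir/private"  # -m picks the mode instead

ls -l "$dir"
```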

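cp(1) is the standard tool for copying files. It can take -r to copy a whole directory tree, or -l to create hard links instead of copying file contents. Combined as cp -rl, they create a quick copy of a whole directory tree in which every file is a hard link to the original, so no file data is duplicated (and changing a file's contents changes it in both trees).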
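A sketch contrasting a real copy with a hard-linked one, assuming GNU cp (-l is not part of POSIX cp); the directory and file names are made up.

```shell
src="$(mktemp -d)"
mkdir "$src/sub"
echo data >"$src/sub/file"
dest="$(mktemp -d)"

cp -r "$src" "$dest/copy"      # an independent copy of the whole tree
cp -rl "$src" "$dest/links"    # same tree, but every file is a hard link

echo changed >"$src/sub/file"  # overwrite the original file in place
cat "$dest/copy/sub/file"      # prints "data": the copy is unaffected
cat "$dest/links/sub/file"     # prints "changed": it is the same file
```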

mv(1) exists to paper over the difference between renaming a file within one file system, which is a quick operation needing a single rename(2) system call, and moving a file to another file system, which requires copying the file to the destination and then removing it at the source.
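One way to observe that a same-file-system mv is only a rename is to compare inode numbers before and after; this is a sketch with made-up file names. Since rename(2) just changes the directory entry, the inode stays the same.

```shell
dir="$(mktemp -d)"
echo hello >"$dir/before"
ino_before="$(ls -i "$dir/before" | awk '{print $1}')"

mv "$dir/before" "$dir/after"   # same file system, so this is a rename(2)

ino_after="$(ls -i "$dir/after" | awk '{print $1}')"
echo "inode before: $ino_before, after: $ino_after"   # the two numbers match
```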

cp(1) and mv(1) will be the most commonly used of these, since install(1), as its name implies, is mainly involved in installing software.

install(1) does have special uses for testing and providing instructions that can be pasted into a terminal, as you can create a file, initialize its contents, then set its permissions all in one command.

$ install -D -m 755 /dev/stdin /tmp/yakking/test-script.sh <<'EOF'
#!/bin/sh
echo "Hello World!"
EOF
$ /tmp/yakking/test-script.sh
Hello World!

Directory management

mkdir(1), as the name implies after adding more letters, makes directories.

Unadorned, mkdir(1) is a thin wrapper over the mkdir(2) system call; however, the shell command also accepts a -p option, which creates any missing leading directories as well, allowing a nested directory tree to be created in one command.

$ mkdir -p /tmp/parent/directory/leaf
$ cd /tmp/parent/directory
$ ls
leaf

rmdir(1) likewise removes a directory. It will only remove a directory if it is empty. For directories that are not empty, use rm -r.

mktemp(1) creates a temporary file for you and prints its path to standard output. This is mostly useful for scripts.

$ tf="$(mktemp)"
$ echo "Hello World" >"$tf"
$ cat "$tf"
Hello World
$ rm "$tf"

mktemp -d creates a temporary directory. This is handy for creating temporary mount points, or just if you need a bunch of temporary files.

$ td="$(mktemp -d)"
$ echo Hello >"$td/hello"
$ echo World >"$td/world"
$ cat "$td/hello" "$td/world"
Hello
World
$ rm "$td"/*
$ rmdir "$td"

Metadata alteration

Permissions were briefly mentioned when describing install(1). chmod(1) does the same job as install's -m flag, but for files that already exist, and accepts octal modes as well as the symbolic [ugo][+-=][rwx] form, as described in a previous article.
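A few chmod invocations on a scratch file, mixing the octal and symbolic forms; cut -c1-10 keeps just the permission string from ls -l.

```shell
f="$(mktemp)"
chmod 644 "$f"              # octal form: rw-r--r--
chmod u+x "$f"              # symbolic: add execute for the owning user
ls -l "$f" | cut -c1-10     # prints -rwxr--r--
chmod go-r "$f"             # remove read for group and others
ls -l "$f" | cut -c1-10     # prints -rwx------
chmod a=r "$f"              # everyone gets exactly read, nothing else
mode="$(ls -l "$f" | cut -c1-10)"
echo "$mode"                # prints -r--r--r--
rm -f "$f"
```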

The u in the symbolic mode corresponds to the user that owns the file, and the g for the group. chown(1) and chgrp(1) may be used to change these fields respectively.

touch(1) may be used to update the modification and last-access times, called mtime and atime respectively. -d $TIME sets the timestamp to use; if it is not given, the current time is used. -m or -a restricts the update to only the mtime or only the atime.
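A sketch of touch with and without -d, assuming GNU touch, which accepts the free-form date string below (POSIX only requires the form 2014-03-05T12:00:09). The file names old and new are illustrative.

```shell
dir="$(mktemp -d)"
touch -d '2014-03-05 12:00:09' "$dir/old"   # create with an explicit timestamp
touch "$dir/new"                             # create with the current time
ls -l "$dir"
[ "$dir/new" -nt "$dir/old" ] && echo "new has the more recent mtime"
touch -a "$dir/old"      # update only the access time; mtime stays in 2014
```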

File creation and resizing

touch(1) is for updating the timestamps of a file, but it is most commonly used for creating empty files, since a file that does not exist is created.

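truncate(1) and fallocate(1) serve a similar purpose: ensuring a file is of a certain size. The difference is that fallocate(1) is a thin wrapper over the fallocate(2) system call (or the posix_fallocate(3) library function), which actually reserves disk blocks and may fail on file systems that don't support it, while truncate(1) only changes the file's recorded size (growing a file leaves a sparse hole rather than allocated space), so it works more reliably.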

The size to set a file to with truncate(1) is specified with -s; if its argument is prefixed with + or -, the file is instead grown or shrunk by that amount. Sizes may carry suffixes: K, M, G and so on are powers of 1024, while KB, MB, GB and so on are powers of 1000.
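A sketch of absolute and relative sizes with GNU truncate on a scratch file. The byte counts in the comments are what GNU wc -c prints; some other wc implementations pad the number with spaces.

```shell
img="$(mktemp)"
truncate -s 1M "$img"       # set the size to 1 MiB (1048576 bytes)
wc -c <"$img"               # 1048576
truncate -s +512K "$img"    # grow by 512 KiB
wc -c <"$img"               # 1572864
truncate -s -1M "$img"      # shrink by 1 MiB
wc -c <"$img"               # 524288
```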

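fallocate(1) instead uses -o to specify the offset in the file where allocation begins, and -l to specify the length of the region to allocate. In its default mode it never shrinks a file: if the region extends past the end, the file grows to cover it, unless -n (--keep-size) is given, in which case the blocks are reserved but the apparent size is left unchanged.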
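A sketch of those options, assuming Linux with util-linux fallocate and a file system that supports fallocate(2) (for example ext4, XFS or tmpfs). Sizes are given in plain bytes to avoid any suffix ambiguity, and the else branch covers file systems where the call is refused.

```shell
f="$(mktemp)"
if fallocate -l 8192 "$f"; then
    wc -c <"$f"                          # 8192: the apparent size grew
    fallocate -n -o 8192 -l 8192 "$f"    # reserve blocks past EOF; -n keeps the size
    wc -c <"$f"                          # still 8192
    fallocate -l 4096 "$f"               # a smaller length does not shrink the file
    wc -c <"$f"                          # still 8192
else
    echo "fallocate(2) is not supported here; truncate(1) would still work" >&2
fi
```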

truncate(1) is useful for resizing disk images.

$ truncate -s +10GB rootfs.img
Posted Wed Mar 5 12:00:09 2014

This series of articles (The Truisms) is aimed at imparting some of the things we find to hold true no matter the project one undertakes in the space carved out by our interests in Open Source and Free Software engineering. The first article in the series was If you don't know why you're doing it, you shouldn't be doing it.

Many software developers, particularly those in open source projects, think that testing is somehow a dirty word. Interestingly, in commercial engagements it's often not considered worthwhile either, because its cost is very visible, whereas its benefit is often much less obvious to the casual observer.

Testing, at whatever level you feel you can achieve, is always worthwhile. This is perhaps a controversial statement but I feel confident that I can support it. I have never been in a situation where I was unhappy that there were tests in a codebase. Sometimes they've been overly restrictive and made it hard to change how code behaves, but if the test author deliberately wanted that then they've served their purpose admirably.

There are plenty of different ways to test software. For some projects the easiest and most effective way to test is to have a human being use the software produced. For others it's possible to automatically test the outcomes of the project using tools such as yarn. Sometimes the best way to test code is at the unit-test level.

What's important is that no matter how you want to test your code (even if it's entirely by human), you need to have your tests written down in some manner which ensures consistency from test run to test run. For human-driven testing, that involves ensuring that the test scripts are clear, are not specific to particular computers, and describe how to indicate where results deviate from expected ones. For automated tests, it involves ensuring that the system under test is not unduly affected by outside influences. Mocking any external interfaces is a good way to ensure that a piece of software works in isolation from things which could adversely affect the results.
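As one illustration of keeping automated tests free of outside influences, here is a minimal shell sketch that runs each test in a fresh scratch directory with HOME and the locale pinned. The function names count_lines, run_test and test_count_lines are hypothetical, invented for the example; a real project would more likely use a test framework.

```shell
# Hypothetical function under test: prints the number of lines in a file.
count_lines() {
    wc -l <"$1" | tr -d ' '
}

# Run a test function inside a fresh scratch directory, with HOME and the
# locale pinned, so the outcome does not depend on whose machine runs it.
run_test() {
    scratch="$(mktemp -d)" || return 1
    (
        cd "$scratch" || exit 1
        HOME="$scratch"
        LC_ALL=C
        export HOME LC_ALL
        "$@"
    )
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Hypothetical test: builds its own input, so nothing outside can affect it.
test_count_lines() {
    printf 'a\nb\nc\n' >input
    [ "$(count_lines input)" = 3 ]
}

if run_test test_count_lines; then
    echo "PASS: test_count_lines"
else
    echo "FAIL: test_count_lines"
fi
```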

It's worth noting, though, that some things cannot be tested in isolation and need a wider set of systems available to test them effectively. So long as that wider set of systems is consistent and reproducible, anyone can run your tests and satisfy themselves of the correctness of your project.

Anything you have not covered in testing could stop working without anyone noticing, and is thus a candidate for removal from your project. Sometimes you cannot cover every line of code (particularly code written defensively against unusual environmental errors), so exercise discretion here. But the big take-away is that if you are not systematically testing your software, bugs can and will creep in and bite you when you're not looking.

I could type in reams of examples of how testing has saved me, but I'm sure you don't want to read about those, so instead I shall set your homework for today. Find a project you have which is not properly tested and write some tests. Even if the tests all pass because you'd written perfect code before, you'll now be more confident that you can make changes to that codebase in the future and be able to know if you break something.

Posted Wed Mar 12 12:00:08 2014
Lars Wirzenius

Diffs and patches

The diff tool produces output that shows the differences between two files:

$ diff dir1/foo dir2/foo
1c1
< foo1
---
> foo2
$ 

This is a simple format which says that line 1 in the first file (dir1/foo) is changed to become line 1 in the second file. Depending on the contents of the two files, there might also be deletions and additions.
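To see deletions and additions in the same format, here is a small sketch with two made-up files a and b. diff exits with status 1 when the files differ, which the || branch reports.

```shell
dir="$(mktemp -d)"
printf 'one\ntwo\nthree\n' >"$dir/a"
printf 'one\nthree\nfour\n' >"$dir/b"

# Normal-format diff; the exit status is 1 because the files differ.
diff "$dir/a" "$dir/b" || echo "(diff exited with status $?)"
```

This prints 2d1, < two, 3a3, > four, then the status line. 2d1 reads as "delete line 2 of the first file, which would have followed line 1 of the second", and 3a3 as "after line 3 of the first file, add line 3 of the second".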

diff output is a concise way to show to a human what has been changed. For example, you might have shown one version of your code to a friend, and then made some changes. Instead of having the friend re-read all the code, they can just read the diff output.

Sometimes the friend is yourself in the future, wondering what in all that is precious you actually did to the code today.

The default output format of diff is not very nice for a human to read. Luckily, diff's output comes in several variations (see diff formats), and the "unified context diff" (diff -u) is very popular and quite easy for humans to read. The default format remains the default for backwards compatibility, possibly all the way back to the 1970s, when diff was originally written for Unix.
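The same two made-up files in unified format, as a sketch. The first two lines of diff -u output (the --- and +++ headers) include file timestamps, so tail skips them here to keep the output stable.

```shell
dir="$(mktemp -d)"
printf 'one\ntwo\nthree\n' >"$dir/a"
printf 'one\nthree\nfour\n' >"$dir/b"

# Skip the ---/+++ header lines, which contain timestamps.
diff -u "$dir/a" "$dir/b" | tail -n +3
```

The hunk header @@ -1,3 +1,3 @@ says that lines 1 to 3 of the first file correspond to lines 1 to 3 of the second; below it, " one" and " three" are context, "-two" is removed and "+four" is added.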

Here is a real example of unified context diff output:

diff --git a/debian/changelog b/debian/changelog
index 9ef157b..7e5ef03 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,10 +1,11 @@
 obnam (1.8-1) UNRELEASED; urgency=low

   * New upstream release.
-  * Fix "typo in debian/control: s/Uploader/Uploaders/" <explain what
-    you changed and why> (Closes: #729347)
+    * Fix "obnam mount fails to retrieve files over 64KB (obnam restore
+      works fine)" (Closes: #741968)
+  * Fix "typo in debian/control: s/Uploader/Uploaders/" (Closes: #729347)

- -- Lars Wirzenius <liw@liw.fi>  Sun, 16 Mar 2014 12:47:32 +0000
+ -- Lars Wirzenius <liw@liw.fi>  Tue, 18 Mar 2014 08:42:28 +0000

 obnam (1.7-1) unstable; urgency=low

This shows that the file debian/changelog got changed, and which lines got changed. Lines that start with "-" indicate deletions; those with "+" indicate additions. The other lines are there for context, making it easier for a human to understand what was actually changed.

Read more details in the diff Wikipedia page. You rarely need to know the precise, minute details of the format. The overall gist is easy to get, and that's good, because you will be seeing these diff outputs all the time.

If you need to send a diff to someone else, always use the unified context diff format. You might send it to someone to show how you've fixed a bug in their software, for example. Because a unified diff can be applied even after other parts of the file have changed, it is a very convenient form of exchanging changes, or sets of changes, between programmers.

The example above isn't actually from diff itself. Instead, it's from git log -p, which shows a unified diff with the changes for each commit.

The patch utility reads diff output and applies the changes to the original (first) file, turning it into the second. In this sense it is the reverse of diff. The output of diff is thus often called a patch as well, especially when the diff is sent to someone to be applied with patch. Instead of patch, the patch may also be applied with a suitable tool in the version control system being used, such as git apply or git am.
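A round-trip sketch: save a unified diff of two made-up files, apply it to a copy of the first with patch, and check that the result matches the second. The file names a, b, work and change.patch are illustrative.

```shell
dir="$(mktemp -d)"
printf 'one\ntwo\nthree\n' >"$dir/a"
printf 'one\nthree\nfour\n' >"$dir/b"
cp "$dir/a" "$dir/work"      # a copy of the original to apply the patch to

# Record the changes; status 1 just means "the files differ", 2 is an error.
diff -u "$dir/a" "$dir/b" >"$dir/change.patch" || [ $? -eq 1 ]

patch "$dir/work" <"$dir/change.patch"        # rewrite work using the patch
cmp "$dir/work" "$dir/b" && echo "work now matches b"
```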

Posted Wed Mar 19 12:00:18 2014


It might seem a little unusual to be talking about making backups in the middle of a series of articles about developing free software, but bear with me and you'll see why I'm going to spend this week's slot talking about ensuring you won't lose your data or code unexpectedly.

When we spoke about revision control a few weeks ago, I mentioned that if nothing else, putting your code into revision control and pushing it somewhere would mean that you had two copies of the code. This is, in a crude sense, a backup of sorts. Couple it with making regular commits and pushing to more than one location and you could claim it was even a good backup strategy for your source. Indeed, I suspect there are people out there who truly believe that this is enough.

I believe that ensuring you don't lose your code is only one part of ensuring that your software development life won't stutter to a halt if you lose a hard drive. There are many more things on your development system which deserve protection from disaster. For example, your dotfiles (the files which configure your development software, such as .gitconfig or .bashrc): are they somehow less worthy of preservation than your code? Also the set of packages you have installed, or the arrangement of icons on your desktop. All of these are pieces of data you would lose time and efficiency recreating.

Also, previously I suggested you should ensure you had pushed your work elsewhere in order to help preserve it. The server you pushed it to -- is that safe from the ravages of disk failure too? From this therefore comes the need to keep backups. Make them often, make them well, and as with my previous article, test them. If you don't know you can restore from them, then your backups are nothing more than pretty piles of otherwise worthless data.
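To make "test them" concrete, here is a minimal sketch of a backup that verifies itself by restoring into a scratch directory and comparing. It assumes a tar that supports -z (GNU tar and bsdtar both do); the protected directory and its contents are stand-ins, so substitute your own, and a real strategy would also keep the archive on a different disk or machine.

```shell
# Stand-in for the directory you want to protect (hypothetical contents).
src="$(mktemp -d)"
printf '[user]\n\tname = Example\n' >"$src/.gitconfig"
mkdir "$src/code"
echo 'echo hello' >"$src/code/hello.sh"

backup="$(mktemp -d)/backup.tar.gz"
# Make the backup: archive the directory's contents.
tar -czf "$backup" -C "$src" .

# Test the backup: restore into a scratch directory and compare.
restore="$(mktemp -d)"
tar -xzf "$backup" -C "$restore"
if diff -r "$src" "$restore" >/dev/null; then
    echo "backup verified"
else
    echo "backup does NOT match the original" >&2
fi
```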

Your homework for this week is to go and find something you don't keep regularly backed up and think "If I were to rm -rf this, right now, would I be upset?" If the answer is "no" then delete it. If the answer is "yes" then find some way to change it to a "no", perhaps by deriving, implementing and verifying a backup strategy for it.

Posted Wed Mar 26 13:00:11 2014