Unix Operating System tools

dd

dd(1) is often said to stand for disk duplicator, though it is humorously called disk destroyer, since misuse can mean the total loss of a disk's data.

It is an old tool. It pre-dates the convention of starting options with a -, so the command to make a backup of an attached disk is:

$ dd if=/dev/sdb of=disk-backup.img

if=/dev/sdb specifies that the input file is /dev/sdb, which is typically the second disk attached to the computer.

of=disk-backup.img says that the contents of the disk should be written to disk-backup.img.

Another use of dd(1) is creating large files. if=/dev/urandom lets you create a file with random contents; if=/dev/zero lets you create a file filled with zeros.

Just substituting the if= in the previous command would produce a command that fills your file system. To limit the amount of data copied, specify bs= and count=. bs= specifies the block size, which is how much data is read and written at a time. count= specifies how many blocks to copy, so the amount of data copied is the product of the two.

With that in mind, the following command makes a 512 MiB file full of random data, in blocks of 1 KiB.

$ dd if=/dev/urandom of=random-file bs=1024 count=524288

The following creates a file of the same size full of zeros.

$ dd if=/dev/zero of=zero-file bs=1024 count=524288
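As a quick check of the arithmetic, the following sketch writes a much smaller file and confirms that its size is bs times count. The scratch directory and the name zero-file are only for illustration.

```shell
# Scratch directory so nothing outside it is touched (illustrative only).
tmp="$(mktemp -d)"

# 4 blocks of 1024 bytes each; dd's statistics on stderr are discarded.
dd if=/dev/zero of="$tmp/zero-file" bs=1024 count=4 2>/dev/null

# wc -c counts bytes; the arithmetic expansion strips any padding wc adds.
size="$(( $(wc -c <"$tmp/zero-file") ))"
echo "expected $((1024 * 4)) bytes, got $size"

rm -rf "$tmp"
```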

dd(1) also supports the seek= option, which starts writing at a given offset into the output file. The offset is counted in blocks of bs= bytes. This could be useful for writing to a partition elsewhere on the disk, or creating a sparse file.

This command writes the disk image to /dev/sdb, skipping the first block of the disk (512 bytes, since bs= is not given).

$ dd if=disk.img of=/dev/sdb seek=1
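The effect of seek= is easier to see on a small regular file than on a disk. This sketch uses a scratch file called target as a stand-in for a device. conv=notrunc is added because dd(1) would otherwise truncate a regular output file at the seek offset, which does not happen with a block device.

```shell
tmp="$(mktemp -d)"
printf 'AAAAAAAA' >"$tmp/target"    # 8-byte stand-in for a disk

# Skip one 2-byte output block, then overwrite the next two bytes in place.
printf 'BB' | dd of="$tmp/target" bs=2 seek=1 conv=notrunc 2>/dev/null

result="$(cat "$tmp/target")"
echo "$result"
rm -rf "$tmp"
```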

This command creates a 1 GiB sparse file on file-systems that support it.

$ dd if=/dev/zero bs=1 of=sparse-file \
      seek=$(( (1024 * 1024 * 1024) - 1 )) count=1

truncate -s 1GiB sparse-file does the same job, and its intent is clearer.
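Whether a file really is sparse can be checked by comparing the size it claims with the space allocated to it. The sketch below does this for a 1 MiB file in a scratch directory. The allocated figure depends on the file-system, so treat the printed numbers as indicative only.

```shell
tmp="$(mktemp -d)"

# Write a single byte at offset 1 MiB - 1, leaving a hole before it.
dd if=/dev/zero of="$tmp/sparse-file" bs=1 count=1 \
   seek=$(( (1024 * 1024) - 1 )) 2>/dev/null

# Bytes the file claims to hold.
apparent="$(( $(wc -c <"$tmp/sparse-file") ))"
# KiB actually allocated on disk, as reported by du.
allocated_kib="$(du -k "$tmp/sparse-file" | cut -f1)"

echo "apparent: $apparent bytes, allocated: $allocated_kib KiB"
rm -rf "$tmp"
```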

shred

Unlike dd(1), which is only jokingly known as disk destroyer, shred(1) is actually meant to destroy data.

It writes random data in place to try to overwrite the file's contents on disk, so that the data cannot be recovered.
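A minimal sketch of the basic usage, assuming GNU shred and a scratch file called secret-file that exists only for this example:

```shell
tmp="$(mktemp -d)"
echo "secret" >"$tmp/secret-file"

# -n 1: a single pass of random data instead of the default number of passes.
# -u: deallocate and remove the file after overwriting it.
shred -n 1 -u "$tmp/secret-file"

if [ -e "$tmp/secret-file" ]; then removed=no; else removed=yes; fi
echo "file removed: $removed"
rm -rf "$tmp"
```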

Unfortunately, file-systems are not so simple that this works any more. Journalled file systems like ext4 may have a copy in the journal, and CoW file-systems like btrfs may have multiple copies around.

Partly this is a result of an evolutionary arms race against storage devices by file-systems to do their very best to not lose data. For this reason, I would recommend using shred(1) on the device you believe the file-system to be backed by, or if you're feeling particularly paranoid, removing the drive from the machine and physically shredding it into as many pieces as you feel comfortable with.

sync

There are various layers of caching involved when writing data to a hard-disk, to provide better throughput.

If you're using the C standard I/O API, calling fflush(3) is the first thing you need to do. This hands any data that is being buffered by the C library to the VFS layer.

This only guarantees that later reads or writes performed through your operating system will see that version of the data. The data stays cached in memory until a convenient time when it can be written out.

It has not yet made it onto the disk. To ensure that it does, you need to call one of the sync system calls, such as fsync(2).

These give as good a guarantee as you can get, that if you were to suddenly lose power, your data would be on the disk.

There is no direct way to sync the writes associated with a particular file descriptor from the shell, but you can use the sync(1) command, which will do its best with every disk.
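In a script this usually amounts to writing the data and then calling sync(1) before relying on it being on disk. A minimal sketch, with data-file as a made-up name:

```shell
tmp="$(mktemp -d)"
echo "important data" >"$tmp/data-file"   # lands in the kernel's cache first

# Ask the kernel to flush cached writes for all file systems to their devices.
sync
sync_status="$?"

echo "sync exited $sync_status"
rm -rf "$tmp"
```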

Of course, none of this can actually guarantee that your data is safely written to the disk. The disk itself may lie about writes having completed and cache them for better throughput. Unless it has some form of internal power supply that lets it finish those writes after losing power, your data can still be lost.

uname

uname(1) is an old interface for telling you more about your operating system. Common uses are uname -r, which reports the kernel release you are running; uname -m, which reports the machine hardware type; and uname -a, which does its best to show you everything.

For example, since I use a Debian chroot on an Android tablet, I get the following:

$ uname -a
Linux localhost 3.1.10-gea45494 #1 SMP PREEMPT Wed May 22 20:26:32 EST 2013 armv7l GNU/Linux
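Scripts tend to use the individual fields rather than parsing uname -a. The sketch below picks them out one at a time and branches on the machine type. The architecture patterns and the family variable are illustrative, not a complete list.

```shell
kernel="$(uname -s)"    # kernel name, e.g. Linux
release="$(uname -r)"   # kernel release
machine="$(uname -m)"   # machine hardware type

# Group a few common machine names; anything else falls through to "other".
case "$machine" in
    arm*|aarch64) family=arm ;;
    x86_64|i?86)  family=x86 ;;
    *)            family=other ;;
esac

echo "$kernel $release on $machine ($family family)"
```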

mknod and mkfifo

mknod(1) and mkfifo(1) create special files. /dev/null is one such file.

mkfifo(1) is a special case of mknod(1): it creates a special file with well-known properties (a named pipe), while mknod(1) is capable of creating arbitrary device nodes, which may be backed by any physical or virtual device provided by the kernel.

Because of this, creating device nodes with mknod(1) is a privileged operation; otherwise you could bypass the permissions imposed on the devices in /dev.
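Named pipes are the exception, since they are not backed by a device and need no privilege. As a sketch of the special-case relationship, the following creates one FIFO with each command and checks that both are the same kind of file. The names fifo-a and fifo-b are only for illustration.

```shell
tmp="$(mktemp -d)"

mkfifo "$tmp/fifo-a"     # the dedicated command
mknod "$tmp/fifo-b" p    # the same kind of file via mknod; type p means FIFO

# test -p is true only for named pipes.
if [ -p "$tmp/fifo-a" ] && [ -p "$tmp/fifo-b" ]; then
    both_fifos=yes
else
    both_fifos=no
fi

echo "both are FIFOs: $both_fifos"
rm -rf "$tmp"
```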

Modern Linux systems use a special file-system called devtmpfs to provide device nodes, rather than requiring people to use mknod(1) to populate /dev.

mkfifo(1) is useful for shell scripts though, when there are complicated control flows that can't be easily expressed with a pipeline.

The following script will pipe ls through cat, and tell you the exit status of both commands without having to rely on bash's PIPESTATUS array.

td="$(mktemp -d)"
mkfifo "$td/named-pipe"
ls >"$td/named-pipe" & # start ls in background, writing to pipe
lspid="$!"
cat <"$td/named-pipe" & # start cat in background, reading from pipe
# you may start getting ls output to your terminal now
catpid="$!"
wait "$lspid"
echo ls exited "$?"
wait "$catpid"
echo cat exited "$?"
rm -rf "$td"

df and du

df(1) displays the amount of space used by all your file systems. It can be given a path, in which case it tries to report on the file system it thinks that path lives on.

Modern Linux systems can have very complicated file systems though, so it may not always give correct results. For example, df(1) can give incorrect results for btrfs, where there is no one-to-one mapping between disks and file-system usage. It is also not smart about things like bind mounts and mount namespaces, so smarter tools like findmnt(1) from util-linux are required.

du(1) attempts to inform you of how much disk space is used for a specific file-system tree, so du -s . tells you how much space your current directory is using.

du -s / is unlikely to report the same number as df /, because file systems need space for their own metadata, so on normal file-systems the result of df(1) is likely to be larger.

btrfs can also produce different results, since it can share data between files.
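The two commands can be tried side by side on a scratch directory. This sketch asks du(1) for the size of the tree and df(1) for the file system it lives on. The exact numbers depend on the file system, so none are shown here.

```shell
tmp="$(mktemp -d)"
dd if=/dev/urandom of="$tmp/data-file" bs=1024 count=64 2>/dev/null

# du -sk: total KiB allocated to the tree rooted at $tmp.
tree_kib="$(du -sk "$tmp" | cut -f1)"

# df -P: one POSIX-format line per file system; field 6 is the mount point.
mount_point="$(df -P "$tmp" | awk 'NR == 2 { print $6 }')"

echo "$tmp uses $tree_kib KiB on the file system mounted at $mount_point"
rm -rf "$tmp"
```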

chroot

chroot(1) is a useful command for allowing programs with different userland requirements to work on the same computer, assuming you don't need too much security.

It changes a process's view of the file-system to start at a different point, so in principle you can use it to restrict access to certain resources.

There are various ways of escaping a chroot, so it's insufficient to protect your system from untrusted programs.

Containers or virtual machines are more secure, but don't have a standard interface.

Chroots are still useful for trusted programs though. For example, I run Debian squeeze on my Android tablet to do my blogging.

nice

nice(1) can be used to hint to your operating system's scheduler that a process should get a larger or smaller share of CPU time.
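A small sketch of the hint in action, assuming GNU nice, which prints the current niceness when run without a command. A higher niceness means a lower priority.

```shell
# With no command, GNU nice prints the niceness it is running with.
base="$(nice)"

# Run the inner nice with its niceness raised by 5, i.e. at lower priority.
lowered="$(nice -n 5 nice)"

echo "niceness: $base -> $lowered"
```

Raising niceness needs no privilege; lowering it below the starting value normally does.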

nice(1) can't be used to prevent a misbehaving process from consuming more than its fair share of CPU, since the process could always fork further worker processes.

Linux attempts to handle this deficiency with cgroups.