Informational shell utilities

Primitives

echo(1) prints its command-line arguments. Despite its simple behaviour, it is remarkably useful, since it's a way to get your shell to write the values of its variables to standard output.

$ VAR=foo
$ echo "$VAR"
foo

Given its frequent use and simplicity, the version of echo(1) you use will actually be the version built into your shell.

It can be used to write strings to a file, when used with output redirection.

$ echo foo >bar

cat(1) can be used to retrieve the contents of a file.

$ cat bar
foo

You can put the contents of a file into a variable using command substitution (sometimes called subshell expansion), written either with backticks or with $(...).

$ baz=`cat bar`
$ echo $baz
foo
$ qux=$(cat bar)
$ echo $qux
foo

So with echo(1), cat(1), output redirection and command substitution, you can move data between variables, command line arguments and files.

cat(1) has one last trick up its sleeve; it can be used to concatenate files, hence its name.

$ echo foo >1
$ echo bar >2
$ cat 1 2
foo
bar

File trimming

head(1) and tail(1), without further arguments, read the first 10 or last 10 lines of a file and write them to standard output.

$ seq 20 >numbers # write the numbers 1 to 20 to the file called numbers
$ head numbers
1
2
3
4
5
6
7
8
9
10
$ tail numbers
11
12
13
14
15
16
17
18
19
20

The number of lines can be tuned with the -n flag. So head -n3 is the first 3 lines, and tail -n3 is the last 3 lines.

$ head -n3 numbers
1
2
3
$ tail -n3 numbers
18
19
20

With GNU head(1), the number given to the -n option can be prefixed with a - to print all but the last n lines. The number given to the -n option of tail(1) can be prefixed with a + to start printing at line n, that is, to print all but the first n-1 lines.

$ seq 5 >numbers
$ head -n -1 numbers
1
2
3
4
$ tail -n +2 numbers
2
3
4
5

The option -c can be used to perform a similar operation, but on the number of bytes rather than the number of lines.
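
As a small sketch of -c in action (the input string is made up for illustration), head -c keeps the first bytes of its input and tail -c keeps the last ones:

```shell
# head -c N keeps the first N bytes; tail -c N keeps the last N bytes.
printf 'abcdef\n' | head -c 3; echo    # prints abc
printf 'abcdef\n' | tail -c 4          # prints def (three letters plus the newline)
```

Note that the trailing newline counts as a byte, which is why tail -c 4 is needed to get three letters.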

tail(1) has a few more tricks. One of tail's key uses is to display the last lines of a log, since the most recent entries usually tell you what just happened to the program that generated it.

It's also useful to watch a log in an open terminal as it is being written, without having to reconfigure the program to log differently, so tail(1) has the -f flag, which stands for follow. When -f is specified, it prints the last 10 lines, then keeps printing new lines as they are added.

Pagers

less(1) and more(1) are pagers. Pagers make it easier to read the output of a program that generates a lot of it, by showing one page at a time and letting you say when you've finished reading it and want to continue to the next page.

less(1) is like more(1), but better. If you're interested, someone also wrote a pager called most(1).

Please note that cat "$file" | less is rarely the right thing to do. Unless $file is special in some way this is equivalent to less <"$file", and it's clearer, in almost all cases, to simply run less "$file".

Environment processing

Both env(1) and printenv(1) can be used to list all the environment variables. Indeed, they behave the same when called with no arguments.

However, if you are writing a shell script and need to programmatically get a list of all the environment variables, and maybe their values, then you need the GNU coreutils versions of these programs, which accept the -0 switch to terminate each entry with a NUL rather than a newline, allowing you to process them with bash's read -d ''.
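
A minimal sketch of that approach follows. It assumes bash (for read -d '') and GNU env; list_env_names and DEMO_MULTILINE are names made up for this example.

```shell
#!/bin/bash
# Print the name of every environment variable, one per line.
# `env -0` ends each NAME=VALUE entry with a NUL byte, so a value that
# contains newlines cannot be mistaken for several entries.
list_env_names() {
    env -0 | while IFS= read -r -d '' entry; do
        # Strip everything from the first = onwards, leaving the name.
        printf '%s\n' "${entry%%=*}"
    done
}

# Hypothetical demonstration variable whose value spans two lines.
export DEMO_MULTILINE='first line
second line'
list_env_names | grep -x DEMO_MULTILINE
```

Parsing the newline-terminated output instead would report "second line" as if it were a separate entry.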

Beyond this, env(1) and printenv(1) differ: printenv(1) will print the values of the variables you name on the command line.

$ export FOO=bar
$ export BAZ=qux
$ printenv FOO BAZ
bar
qux

However, this is of limited use, since your shell will support variable interpolation natively.

$ echo $FOO $BAZ
bar qux

env(1) will set variables then run a command.

$ env BADGER=stoat printenv BADGER
stoat

env(1) can be used to run a program in a reduced environment with the -i flag.

$ env -i BADGER=stoat printenv
BADGER=stoat

Apart from all this, env(1) is used on the shebang line of scripts to look up the interpreter in $PATH, since interpreters may be installed in different places on different systems, whereas env(1) is usually installed at /usr/bin/env.

Directory contents information

ls

ls(1) can be used to list the contents of a directory.

$ ls /
bin    dev   initrd.img      lib64  opt   run   sys  var
boot   etc   initrd.img.old  media  proc  sbin  tmp  vmlinuz
cdrom  home  lib             mnt    root  srv   usr  vmlinuz.old

If it can determine that it is writing to a terminal, it will tabulate the results like above, otherwise it prints one entry per line.

$ ls / | cat
bin
boot
cdrom
dev
etc
home
initrd.img
initrd.img.old
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
vmlinuz
vmlinuz.old

It's common to see shell scripts use ls(1) to perform an action on every entry in a directory.

$ for dir in `ls /`; do
    echo $dir
done

However, this breaks if any of the names contain spaces. Shell globbing is shorter and handles spaces, though it skips files and directories whose names start with a dot (.).

$ for dir in /*; do
    echo "$dir"
done

You can use ls -a to list files whose names start with a dot, though ls -A is generally more useful since it excludes the . and .. entries.
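
To see the difference, here is a small sketch in a scratch directory (the file names are made up for the example):

```shell
dir=$(mktemp -d)
touch "$dir/visible" "$dir/.hidden"
ls "$dir"      # lists only visible
ls -a "$dir"   # lists ., .., .hidden and visible
ls -A "$dir"   # lists .hidden and visible
rm -r "$dir"
```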

Looping over every entry safely in a script, hidden ones included, is more complicated.

A fairly portable way to do this is:

$ command='for arg; do echo "$arg"; done'
$ find / -mindepth 1 -maxdepth 1 -exec sh -c "$command" - {} +
/home
/etc
/media
/var
/bin
/boot
/dev
...

A less portable way that uses fewer subprocesses is:

$ find / -mindepth 1 -maxdepth 1 -print0 |
while IFS= read -r -d '' arg
do
    echo "$arg"
done
/home
/etc
/media
/var
/bin
/boot
/dev
/lib
/mnt
/opt
...

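If your loop needs to change variables that are used after the loop, the pipeline won't do, since bash runs each part of a pipeline in a subshell and the changes are lost when it exits. Use process substitution instead.

$ count=0
$ while IFS= read -r -d '' arg
do
    count=$((count + 1))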
done < <(find / -mindepth 1 -maxdepth 1 -print0)
$ echo $count
24

pwd

pwd(1) will print the path to the current directory to standard output.

$ pwd
/home/richardmaw

This is not generally useful when using a shell interactively, since your shell's prompt will usually have the current directory in it:

$ PS1='\w\$ '
~$ cd /
/$ pwd
/

However, pwd can be of use in scripting if you need to change directory and wish to return to the current directory later in your shell script.

#!/bin/sh
HERE="$(pwd)"
# Do work including changing directory
# ...
cd "$HERE"
# Do more work back in the original directory
# ...

readlink

readlink(1) is a thin wrapper over the readlink(2) system call in its default mode of operation.

$ readlink /vmlinuz
boot/vmlinuz-3.11.0-17-generic

It is generally more common to discover this by running ls -l, since it's fewer characters to type, though readlink(1) is more useful in scripts.

$ ls -l /vmlinuz
lrwxrwxrwx 1 root root 30 Mar  3 08:07 /vmlinuz -> boot/vmlinuz-3.11.0-17-generic

Another useful mode of readlink(1) is readlink -f, which is "canonicalize" mode.

$ readlink -f /vmlinuz
/boot/vmlinuz-3.11.0-17-generic
$ cd ~
$ readlink -f ../../vmlinuz
/boot/vmlinuz-3.11.0-17-generic

Another use for readlink -f is to get an absolute path to a shell script, to access packaged resources.

$ install -m755 /dev/stdin /tmp/thisdir-test.sh <<'EOF'
#!/bin/sh
THISDIR="$(readlink -f "$(dirname "$0")")"
echo "$THISDIR"
EOF
$ /tmp/thisdir-test.sh
/tmp
$ mv /tmp/thisdir-test.sh ~
$ ~/thisdir-test.sh 
/home/richardmaw

stat

stat(1) uses the stat(2) system call to report information about files.

It defaults to showing a wide selection of information.

$ stat thisdir-test.sh 
  File: ‘thisdir-test.sh’
  Size: 69          Blocks: 8          IO Block: 4096   regular file
Device: 1ah/26d Inode: 237403      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/richardmaw)   Gid: ( 1000/richardmaw)
Access: 2014-03-30 20:50:10.266400076 +0100
Modify: 2014-03-30 20:49:01.786402099 +0100
Change: 2014-03-30 20:49:42.522400896 +0100
 Birth: -

stat(1) can be told to output specific information with its -c argument, which takes a format string. %n is the file name, %s is the file size.

$ stat -c '%n: %s' thisdir-test.sh 
thisdir-test.sh: 69
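
A few more format specifiers, as a sketch assuming GNU stat(1): %a is the permission bits in octal, %U is the owner's user name and %F is the file type. The scratch file here is only for illustration.

```shell
f=$(mktemp)
chmod 640 "$f"
# Print the octal mode, the owner and the file type of the scratch file.
stat -c '%a %U %F' "$f"
rm -f "$f"
```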

User status

whoami(1) prints the user name of the current user.

$ whoami
richardmaw

who(1) and users(1) list who is logged in, in slightly different formats.

$ users
richardmaw richardmaw richardmaw richardmaw
$ who
richardmaw tty1         2014-03-30 15:49
richardmaw tty7         2014-03-30 18:06 (:0)
richardmaw pts/1        2014-03-30 18:13 (:0)
richardmaw pts/2        2014-03-30 20:10 (:0)

id(1) displays more information about the current user, including the primary group, the supplementary groups, and the numeric ids for all of them. The groups are comma separated.

$ id | while read -d ',' FOO; do echo $FOO; done
uid=1000(richardmaw) gid=1000(richardmaw) groups=1000(richardmaw)
4(adm)
20(dialout)
24(cdrom)
25(floppy)
27(sudo)
29(audio)
30(dip)
44(video)
46(plugdev)
104(scanner)
113(netdev)
114(bluetooth)
120(fuse)

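id(1) can be given various options to limit its output, such as -u for only the user id, -g for only the primary group id and -G for all the group ids. Adding -n prints names instead of numbers. On systems with SELinux, -Z prints only the security context.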
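
A short sketch of the more commonly used limiting options:

```shell
id -u     # numeric user id only
id -un    # user name only, the same answer as whoami(1)
id -g     # numeric id of the primary group only
id -Gn    # names of all the groups, separated by spaces
```

These are easier to use in scripts than picking apart the default output.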

date

date(1) prints the current date and time to standard output.

$ date
Sun Mar 30 21:36:16 BST 2014

It also takes a format string.

$ date +%F
2014-03-30

It can be told to format a date which isn't now, with the -d option.

$ date -d yesterday
Sat Mar 29 20:37:58 GMT 2014
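
The -d option and a format string can be combined. This sketch assumes GNU date(1); -u and LC_ALL=C are only there to make the output independent of the local time zone and language.

```shell
# Ask which day of the week a fixed date fell on.
LC_ALL=C date -u -d '2014-03-30' '+%A %F'    # prints Sunday 2014-03-30
```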

wc

wc(1) prints line, word and byte counts. By default it prints all 3 and the file name.

$ wc thisdir-test.sh 
 3  7 69 thisdir-test.sh

To print only one of those, specify -l, -w or -c respectively. To count characters rather than bytes, use -m.

$ wc -l thisdir-test.sh 
3 thisdir-test.sh

To prevent wc(1) printing the file name, the file has to be provided as the standard input.

$ wc -l <thisdir-test.sh 
3

A common way of finding out how many files are in a directory is ls $dir | wc -l, though this will give incorrect results if any of the file names have newline characters in them.

The most concise, correct way to do this is:

$ find / -maxdepth 1 -mindepth 1 -print0 | grep -cz .
24
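
The difference can be seen in a scratch directory holding one awkward file name. This is a sketch assuming GNU find and grep; the file names are made up.

```shell
dir=$(mktemp -d)
# One ordinary name, and one name with a newline in the middle of it.
touch "$dir/plain" "$dir/two
lines"
ls "$dir" | wc -l    # counts lines, so typically prints 3
find "$dir" -mindepth 1 -maxdepth 1 -print0 | grep -cz .    # prints 2
rm -r "$dir"
```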
Posted Wed Apr 2 11:00:11 2014

This series of articles (The Truisms) is aimed at imparting some of the things we find to hold true no matter the project one undertakes in the space carved out by our interests in Open Source and Free Software engineering. The first article in the series was "If you don't know why you're doing it, you shouldn't be doing it".

Thus far we've talked about core infrastructural and social needs for a project, such as knowing what you want to do and why you want to do it, ensuring you keep track of what you did, and ensuring you won't lose it if the worst happens. We've also talked about verifying that what you've done actually works, so today let's talk about knowing how well it works.


The desire to make something faster, smaller, more efficient, is a common failing among programmers. Yes; I did say "failing". It's a failing because for the most part, programmers don't actually know for certain that what they want to make better is worth their time to do so. Of course, you might have a goal of making something more efficient as an exercise in optimisation, or making something more capable as an exercise in increasing the functionality of a piece of software; but for the most part these efforts are usually premature because programmers really love to work on the code and very rarely do they actually want to step back and measure.

It's not always possible to measure objectively the level of functionality of a piece of code. There are always extra things you can think of which it "would be nice to have" but if you can't point at a need to have them, a requirement if you will, then perhaps you don't need them. Making something faster is only worth doing if what you have now does not meet some requirement for rapidity. So let's assume for a moment that you know that your program needs to go faster to meet a requirement; how do you determine where to focus your efforts?

There are a number of pieces of software out there to help you. Typically they are called profilers and exist at many levels. If your software is in a dynamic language such as Python or Lua, there'll be profilers at the language level for you to use. If on the other hand your software is written in something lower level such as C, there are profilers which work at the machine level to give you a deep understanding of where time is being spent in your code. Ultimately it's important to realise that without using tools such as profilers, to measure where time is being spent in your software, you will never be able to focus your efforts effectively on improving its performance.

A week or so ago, at work, I was looking at something I was convinced ought to be faster than it was. I ran several sets of tests through the software using a profiler and discovered that something I had previously assumed about the software was wrong. The profile led me to see that an operation I was convinced was being cached actually wasn't, and four extra lines of code later (four in a codebase of thousands and thousands) we had taken the complexity of a piece of code from O(nm) to O(n). Without that profiling effort, I'd have been looking in completely the wrong place for the quick fix to improve our performance.

Your homework for this week is to play with measuring the performance of a project of yours which perhaps doesn't behave as well as you'd hoped. Perhaps if you profile it well, you'll find a quick fix which will give you the satisfaction I got from writing those four lines of code. If all your own software is perfect (or meets its requirements at least) then why not look for something else to profile. You might be able to make a useful contribution to an open source project if you look hard enough. Remember that a useful contribution to a project could simply be the work to get the profiles made and the trouble spots isolated, even if you have no idea how to fix them yourself.

Posted Wed Apr 9 11:00:14 2014
Lars Wirzenius Reporting bugs

You will encounter a bug in some software some day in the future. You might ignore it. You might fix it yourself. You might report it to the developers of the software.

There are several reasons for reporting bugs.

  • Other people might fix them.
  • Someone might know of a workaround.
  • You might learn why the problem you found isn't an actual bug.

Glory and fame and fortune are not reasons for reporting bugs, merely beneficial side-effects.

Many large projects have a page for how to report problems in a way that is best for that project. Some examples:

There are also some generic guides:

A quick summary, based on the author's experiences:

  • Check that no-one else has reported the problem already.
  • Explain what you did, what you expected to happen, and what really happened.
  • Do not rephrase: use copy-paste, screenshots, and file attachments to provide the actual output or files involved.
  • If at all possible, describe a way to reproduce the problem. If you can get it reproduced with a shell script, even better.
  • Be patient. Many large projects receive lots of bug reports, on the order of tens of thousands per year. It may take a long time for them to react, and they may not be able to give all their attention to just your problem.

Reporting bugs can be a delightful experience. A responsive development team appreciates your report, fixes the problem immediately, and thanks you in their release notes. This can be a good route to start contributing to the project in general.

Reporting bugs can also be a frustrating, de-motivational experience. Your bug may be ignored, or closed without explanation, or you'll be told you did something wrong. Just interacting with the project's bug tracking system can ruin your day. If you interact with enough projects, you'll encounter these things. If you can, shrug it off and move on to another project.

Posted Wed Apr 16 11:00:07 2014

This series of articles (The Truisms) is aimed at imparting some of the things we find to hold true no matter the project one undertakes in the space carved out by our interests in Open Source and Free Software engineering. The first article in the series was "If you don't know why you're doing it, you shouldn't be doing it".


Of all the truisms to date and planned, I would say that this is the only one actively intended to discourage you from writing that program you had the idea for the other day. This is because we have yet to discuss that critical aspect of software -- the people who use it. As many seasoned software authors will tell you, if you ask them, writing software would be so much easier if nobody ever wanted to use it afterwards.

It's often the case that people who write software wake up in the middle of the night (or lunch, or boring meeting about how everyone missed their targets this week), feverishly excited about a piece of software they just imagined which will change the world. Everyone will want to use it, it'll be amazing. It rarely is, and one of the best tests you can apply to your idea is simply: Would I want to use this? Obviously you can't apply this test to software you have to write for a living, but we're not on about that boring stuff here. We're talking about software you want to write for fun, for excitement, for the challenge, for yourself.

One aspect of this particular truism is that when you write software without being paid for the effort, what you write must be of value to someone. Back in the earliest part of this series we talked about how if you don't know why you're working on something, you shouldn't be doing it, and we gave a number of possible reasons for working on a project. The longer-term a project becomes, the more important it will be that at least one of those reasons is that it is solving a real problem for yourself. Otherwise you will lose interest in doing anything on the project. The someone to whom the project is of value really must include yourself at some level.

The other aspect is simply that in order for people to want to use something, you need to make it nice to use. Unless your target audience is significantly different to yourself, that means you need to find it nice to use. User experience is hard enough to work on at the best of times, but when you can't imagine yourself as the user it's even harder. Open source hackers often don't have the luxury of being able to pay someone else to tell them how something should work and as such the only user whose experience you can fully understand, and thus optimise for, is your own.

A corollary of this is simply that if you lose interest in a project you previously worked on, which is in use by other people (packaged in a distribution, for example), then you really should find someone else to pass the mantle on to, for maintenance and development. I have been in this position several times and now there's a lot of code out there in the world, some of which you may even be using (deliberately or inadvertently), which I no longer assert maintainership over. I'm glad I wrote it, as are many others, but I'm equally glad I no longer work on it because I wouldn't want to use it any more.

Your homework for this week is to go over your current active projects, decide if there are any you no longer use, and if any of those are in use by others, find someone else to take on the mantle of maintainership. I'm not saying give them up entirely, but find someone who is interested in using the software and who you feel can do you proud in continuing to make it better software for all its users. Regardless of whether or not you find any projects which fit into that category, revisiting all your projects and deciding if any of them are now due to be mothballed can be a very cathartic experience, freeing a part of your brain you didn't even realise was consumed with thoughts about stuff you no longer actively use. Also you might find something you forgot which really solves a problem you were planning on writing that amazing program to solve, just the other night when you woke up in a cold sweat.

Posted Wed Apr 23 11:00:10 2014

Why use temporary files?

A temporary file is a file that is created without wanting it to stay around forever.

Most uses of temporary files fall into 3 categories:

  1. You have a large amount of data, that you will need later in the program's lifetime, but not now, and it's sufficiently large that you don't want to keep it in-memory.

    Writing this out to a temporary file allows you to re-read the temporary file later to get the data back.

    This is needed less often these days, since common workloads fit easily within a 64-bit address space, and operating systems are clever about doing this for you automatically, by writing data that isn't being used to a swap partition if memory is getting scarce.

  2. Using an API that takes a file.

    You have data that you want a library function, or another program, to process. However, your data is not in a file. So, to deal with this, you can create a file, write your data to it, and pass either a file descriptor or a file path to the library function or program.

    It is very common for this to be the use of temp files in shell.

  3. Saving partial results to a temporary file, then renaming it to the name you want the final results saved under.

    This allows you to make an atomicity guarantee that the final result is always complete.
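
Although the rest of this article uses C, the third use can be sketched briefly in shell. This is a minimal illustration, not a hardened implementation: atomic_write is a hypothetical helper name, and it assumes a mktemp(1) command that accepts a template containing a directory.

```shell
# Hypothetical helper: write standard input to the file named by $1,
# so that readers see either the old contents or the complete new ones.
atomic_write() {
    target=$1
    # Create the temporary file next to the target, so that mv(1) is a
    # rename(2) within one file system rather than a copy.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    if cat >"$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    # Something failed, so don't leave the partial file behind.
    rm -f "$tmp"
    return 1
}

out=$(mktemp -d)
echo "final results" | atomic_write "$out/results"
cat "$out/results"    # prints final results
```

One design consequence is that the target ends up with the permissions mktemp(1) gave the temporary file, so a real script may need to chmod(1) it before renaming.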

Rather than my usual approach of demonstrating everything with shell scripts, this will be demonstrated with C, since higher level languages' abstractions can hide the important details this article is trying to teach.

Why shouldn't I just roll my own?

Convenience

If you want to make full use of temporary files, you will eventually need all the features provided by your platform's temporary file API anyway, and it tends to hide some of the tricky details that you would otherwise have to learn about and potentially implement yourself.

Security

Symlink attacks allow a local attacker to make you write files to a place you didn't intend. If your temporary file names can be guessed, and your program can't handle the file already existing, you are vulnerable to this attack.

On its own it allows a denial of service attack by making you use up your disk quota, or trash a file you didn't intend to, but if a second vulnerability can be found to allow the attacker to choose what data is written there, then they can take over your user account.

If the vulnerable program was run as root, the attacker can control the whole system.

Temporary file locations

/tmp is the traditional and default location for temporary files. Your operating system will take one of a couple of approaches to stop these files piling up.

  1. Remove the contents of /tmp on start-up.

    This has the disadvantage of slowing down boot, and long-running systems can run out of disk space from accumulated temporary files.

  2. Mount a temporary file system at /tmp on start-up.

    Temporary file systems keep their files in memory (or write their contents to the swap partition if memory runs out), rather than writing their contents to a disk, so their contents are more likely to be in-memory than on-disk, which is good for small files.

    This has both the advantage and disadvantage of accounting for storage separately from your main disk, since it's an advantage that you aren't using your main storage for temporary files, but a disadvantage that you're more likely to run out of space with large temporary files.

There are a couple of ways to deal with this last problem.

/var/tmp is an oxymoron, since /var is for persistent state, and tmp is for temporary files, but it's conventionally used for large temporary files.

How you make your program create temporary files there depends on the API you are using. The useful ones let you pass a directory as one of the parameters, but failing that, setting the TMPDIR environment variable usually works.
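
The effect of TMPDIR can be seen from the shell with the mktemp(1) command. This sketch assumes GNU mktemp, and uses a scratch directory as a stand-in for /var/tmp so that it cleans up after itself.

```shell
# Hypothetical stand-in for /var/tmp.
big_tmp=$(mktemp -d)
# With no template given, mktemp(1) creates its file under $TMPDIR.
tmpfile=$(TMPDIR="$big_tmp" mktemp)
echo "$tmpfile"    # a path underneath $big_tmp
rm -r "$big_tmp"
```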

C library API

file API

tmpfile(3) returns a FILE* of a file that will be removed when the FILE* is fclose(3)d or the program exits. This is very useful for use-case 1.

$ cat >tmpfile-example.c <<'EOF'
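#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv){
    FILE *fh = tmpfile();
    if (fh == NULL){
        perror("Create temporary file");
        return 1;
    }
    for (int i = 1; i < argc; i++){
        fprintf(fh, "Arg %d: %s\n", i, argv[i]);
    }
    /* Go back to the start so the data can be read back */
    rewind(fh);
    {
        char *buf = NULL;
        size_t memlen = 0;
        ssize_t bytes_read;
        while ((bytes_read = getline(&buf, &memlen, fh)) != -1){
            fwrite(buf, sizeof(buf[0]), bytes_read, stdout);
        }
        free(buf);
    }
    /* Closing the FILE* also removes the temporary file */
    fclose(fh);
    return 0;
}
EOF
$ make CFLAGS="-std=c99 -D_POSIX_C_SOURCE=200809L" tmpfile-example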
$ ./tmpfile-example hello world
Arg 1: hello
Arg 2: world

Secure temporary file creation

tmpfile(3) is perfect for use-case 1, but is not suitable for use-cases 2 or 3, which require the API to also give you a file descriptor or file path.

mkstemp(3) stands for "make secure temporary file"; mkdtemp(3) stands for "make temporary directory".

mkstemp(3) creates a temporary file based on the string template given, modifies it in-place so you can get the file path afterwards, and returns a file descriptor to the newly created and opened file.

$ cat >copy-to-temp-file.c <<'EOF'
#define _GNU_SOURCE /* for asprintf(3) and splice(2) */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h> /* for splice */

/* Wrap mkstemp so directory and prefix can be passed
   Returns -errno and leaves *filename untouched on failure
   assigns *filename to malloced string and returns fd on success
 */
int mkstemp_with_prefix(char const *dir, char const *prefix, char **filename){
    int fd = -1;
    char *template = NULL;
    if (asprintf(&template, "%s/%sXXXXXX", dir, prefix) == -1){
        int err = -errno;
        perror("Alloc mkstemp template string");
        return err;
    }
    fd = mkstemp(template);
    if (fd == -1){
        int err = -errno;
        perror("Make temporary file");
        free(template);
        return err;
    }
    *filename = template;
    return fd;
}

int main(int argc, char **argv){
    char *dir = ".";
    char *prefix = "tmp";
    char *filename;
    int fd;
    int ret = 0;
    switch(argc){
    case 3:
        prefix = argv[2];
    case 2:
        dir = argv[1];
    case 1:
        break;
    default:
        fprintf(stderr, "Usage: %s [DIR [PREFIX]]\n", argv[0]);
        return 1;
    }
    fd = mkstemp_with_prefix(dir, prefix, &filename);
    if (fd < 0){
        return 2;
    }
    while ((ret = splice(0, NULL, fd, NULL, 4096, 0)) > 0){
        /*no op*/;
    }
    if (ret == -1){
        perror("Copy file");
        return 3;
    }
    printf("%s\n", filename);
    return 0;
}
EOF
$ make copy-to-temp-file
$ tempfile=$(echo "Hello World" | ./copy-to-temp-file)
$ cat "$tempfile"
Hello World
$ rm "$tempfile"

mkdtemp(3) has a similar API to mkstemp(3), in that it takes a mutable string template and modifies it, but it does not return a file descriptor.

$ cat >split-file.c <<'EOF'
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

/* Wrap mkdtemp so directory and prefix can be passed
   Returns -errno and leaves *filename untouched on failure
   assigns *filename to malloced string and returns 0 on success
 */
int mkdtemp_with_prefix(char const *dir, char const *prefix, char **filename){
    int ret = 0;
    char *template = NULL;
    if (asprintf(&template, "%s/%sXXXXXX", dir, prefix) == -1){
        ret = -errno;
        perror("Alloc mkdtemp template string");
        return ret;
    }
    if (!mkdtemp(template)){
        ret = -errno;
        perror("Make temporary directory");
        free(template);
        return ret;
    }
    *filename = template;
    return ret;
}

int write_n_lines(FILE *input, int n, FILE *output){
    int ret = 0;
    size_t n_alloced = 0;
    char *buffer = NULL;
    for (int i = 0; i < n; i++){
        ssize_t n_read;
        n_read = getline(&buffer, &n_alloced, input);
        if (n_read == -1){
            if (!feof(input)){
                perror("Read line");
                ret = 1;
            }
            break;
        }
        if (fputs(buffer, output) == EOF){
            perror("Write line");
            ret = 2;
            break;
        }
    }
    free(buffer);
    return ret;
}

int main(int argc, char **argv){
    char *dir = ".";
    char *prefix = "tmp";
    char *tempdir;
    int lines_per_file;
    switch(argc){
    case 4:
        prefix = argv[3];
    case 3:
        dir = argv[2];
    case 2:
        lines_per_file = atoi(argv[1]);
        break;
    default:
        fprintf(stderr, "Usage: %s LINES_PER_FILE [DIR [PREFIX]]\n",
            argv[0]);
        return 1;
    }
    if (lines_per_file <= 0){
        fprintf(stderr, "Lines per file must be a positive integer\n");
        return 2;
    }
    if (mkdtemp_with_prefix(dir, prefix, &tempdir)){
        return 3;
    }
    printf("%s\n", tempdir);
    for (int i = 0; 1; i++){
        char *filename = NULL;
        FILE* fileobj;
        if (asprintf(&filename, "%s/%03d", tempdir, i) == -1){
            perror("Formatting output file name");
            return 4;
        }

        fileobj = fopen(filename, "wx");
        if (fileobj == NULL){
            perror("Opening output file");
            free(filename);
            return 4;
        }

        if (write_n_lines(stdin, lines_per_file, fileobj)){
            fclose(fileobj);
            free(filename);
            return 5;
        }

        fclose(fileobj);
        free(filename);
        if (feof(stdin)){
            break;
        }
    }
    return 0;
}
EOF
$ make CFLAGS="-std=c99 -D_GNU_SOURCE" split-file
$ tempdir=$(seq 9 | ./split-file 3)
$ (cd "$tempdir" && ls)
000  001  002  003
$ find "$tempdir" -delete

It is safe to create files with fixed names in a temporary directory, since it is created with mode 700, which means only you are able to create files in there.

Temporary name generation

WARNING: You probably don't want to use these functions, since they are a security risk if used improperly, and the functions for making temporary files handle this complexity for you.

mktemp(3) and tempnam(3) return a file path that could theoretically be used for a temporary file.

The former fills in a template string like the one mkstemp(3) takes, while the latter lets you specify a directory and a prefix, though the TMPDIR environment variable takes precedence over the directory if it is set.

The manual pages for these explicitly say that you shouldn't be using these, and you should instead use mkstemp(3) or mkdtemp(3).

They are safe to use only if the call that creates the directory entry fails when the name already exists, and you retry with a new name when that happens.

$ cat >atomic-link-replace.c <<'EOF'
#include <stdlib.h>
#include <stdio.h>
#include <string.h> /* for strdup and strerror */
#include <errno.h>
#include <unistd.h> /* for symlink */
#include <libgen.h> /* for dirname */
/*  If you use `ln -sf` to replace a symbolic link, it unlinks then creates
    the symlink. This can be avoided by creating the symlink at a temporary
    location first, then renaming it over the top of the old one.  */

int mkltemp(char const *dir, char const *prefix, char const *target, char **out){
    while (1){
        /* Note: tempnam(3) prefers $TMPDIR over dir if it is set */
        char *path = tempnam(dir, prefix);
        if (path == NULL){
            return errno;
        }
        if (symlink(target, path) == -1){
            int err = errno;
            free(path);
            if (err == EEXIST){
                /* Someone else took the name first, try another */
                continue;
            } else {
                return err;
            }
        }
        *out = path;
        return 0;
    }
}

int main(int argc, char **argv){
    if (argc != 3) {
        fprintf(stderr, "Usage: %s LINK_TARGET LINK_NAME\n",
                argv[0]);
        return 1;
    }
    {
        char *link_target = argv[1];
        char *link_name = argv[2];
        /* dirname(3) may modify its argument, so give it a copy */
        char *link_name_copy = strdup(link_name);
        char *link_dir;
        char *tmp_link;
        int err;
        if (link_name_copy == NULL){
            perror("Copying link name");
            return 2;
        }
        link_dir = dirname(link_name_copy);
        err = mkltemp(link_dir, "tmpl.", link_target, &tmp_link);
        if (err){
            fprintf(stderr, "Creating temporary symlink: %s\n",
                    strerror(err));
            return 2;
        }
        if (rename(tmp_link, link_name) == -1){
            perror("Renaming temporary symlink into place");
            return 3;
        }
        return 0;
    }
}
EOF
$ make CFLAGS=-D_SVID_SOURCE atomic-link-replace
$ ln -sf "old link destination" link
$ readlink link
old link destination
$ ./atomic-link-replace "new link destination" link
$ readlink link
new link destination

Relevant system calls

The mkltemp function we defined in atomic-link-replace.c shows how functions like mkstemp(3) are implemented. The key feature is that the system call for creating the temporary file has to fail and set errno(3) to EEXIST if the target already exists.

The open(2) system call can be made to act this way by setting its flags to O_CREAT|O_EXCL.
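
The closest shell equivalent is the noclobber option (set -C), which makes the > redirection refuse to write to a file that already exists. Shells commonly implement this by opening the file with O_CREAT|O_EXCL, though that is an assumption about your particular shell rather than something it guarantees. A minimal sketch, using a scratch directory:

```shell
dir=$(mktemp -d)
(
    set -C    # noclobber: > must create the file, not overwrite it
    echo first >"$dir/file" && echo "created"
    echo second >"$dir/file" 2>/dev/null || echo "already exists, left alone"
)
cat "$dir/file"    # prints first
rm -r "$dir"
```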

Without this, it is not possible to create temporary files securely unless you first create the directory to put them in.

New calls designed to help

A relatively recent addition to Linux is the O_TMPFILE flag. Kernel support was added in 3.11.

This changes open to take a directory path, rather than a file name. It will return a file descriptor without creating the directory entry.

This means that it will be removed when the process exits, like tmpfile(3). However, unlike tmpfile(3), it doesn't have to rely on atexit(3) to be processed, which can fail to happen if the process is terminated abnormally, such as by signal, or the machine losing power.

Use-case 1 can be handled by using O_TMPFILE|O_RDWR|O_EXCL for the flags to open(2).

O_EXCL prevents the file descriptor being linked into the file system later.

Linking the file in later would be wanted behaviour to satisfy use case 3. The flags for this are O_TMPFILE|O_WRONLY.

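The file descriptor can be linked in later using the linkat(2) system call: linkat(tmp_fd, "", AT_FDCWD, target_path, AT_EMPTY_PATH). Using AT_EMPTY_PATH requires the CAP_DAC_READ_SEARCH capability, so unprivileged programs instead pass the path /proc/self/fd/%d as the source, with the AT_SYMLINK_FOLLOW flag.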

This is perfect for use case 3 as intermediate results aren't left around in the target directory, to be cleaned up later.

It does require special handling. You have to use functions like fchmod(2) instead of chmod(2), or use /proc/self/fd/%d.

Posted Wed Apr 30 11:00:07 2014