Why use temporary files?
A temporary file is a file that you create without intending it to stay around forever.
Most uses of temporary files fall into 3 categories:
1. You have a large amount of data that you will need later in the program's lifetime, but not now, and it's large enough that you don't want to keep it in memory.
Writing this out to a temporary file allows you to re-read the temporary file later to get the data back.
This is needed less often these days, since common workloads easily fit in a 64-bit address space, and operating systems do this for you automatically by writing data that isn't being used to a swap partition when memory is getting scarce.
2. Using an API that takes a file.
You have data that you want a library function, or another program, to process. However, your data is not in a file. To deal with this, you can create a file, write your data to it, and pass either a file descriptor or a file path to the library function or program.
This is a very common use of temporary files in shell scripts.
3. Saving partial results to a file, and renaming that file to the name you want the final results to be saved under.
Because the rename is atomic, this lets you guarantee that the file at the final name is always complete.
Rather than my usual approach of demonstrating everything with shell scripts, this will be demonstrated with C, since higher level languages' abstractions can hide the important details this article is trying to teach.
Why shouldn't I just roll my own?
Convenience
If you want to make full use of temporary files, you will eventually need all the features provided by your platform's temporary file API anyway, and it tends to hide some of the tricky details that you would otherwise have to learn about and potentially implement yourself.
Security
Symlink attacks allow a local attacker to make you write files to a place you didn't intend. If your temporary file names can be guessed and your code can't handle the file already existing, you are vulnerable to this attack.
On its own it allows a denial of service attack by making you use up your disk quota, or trash a file you didn't intend to, but if a second vulnerability can be found to allow the attacker to choose what data is written there, then they can take over your user account.
If the vulnerable program was run as root, the attacker can control the whole system.
Temporary file locations
/tmp is the traditional and default location for temporary files. Your operating system will take a couple of steps to avoid these files piling up.
Remove the contents of /tmp on start-up.
This has the disadvantage of slowing down boot, and long-running systems can run out of disk space from accumulated temporary files.
Mount a temporary file system at /tmp on start-up.
Temporary file systems keep their files in memory (or write their contents to the swap partition if memory runs out), rather than writing their contents to a disk, so their contents are more likely to be in-memory than on-disk, which is good for small files.
This accounts for storage separately from your main disk, which is both an advantage and a disadvantage: you aren't using your main storage for temporary files, but you're more likely to run out of space with large temporary files.
There are a couple of ways to deal with this last problem.
/var/tmp is an oxymoron, since /var is for persistent state and tmp is for temporary files, but it's conventionally used for large temporary files.
How you make your program create temporary files there depends on the API you are using. The useful ones let you pass the directory as a parameter; where that isn't possible, setting the TMPDIR environment variable usually works.
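As a rough sketch of honouring TMPDIR yourself, the helper below builds a mkstemp(3) template from the environment variable and falls back to /tmp. The name make_temp_in_tmpdir and the "example" prefix are made up for illustration; they are not part of any library.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file in $TMPDIR, falling back
   to /tmp when the variable is unset or empty.
   On success returns the fd and leaves the file's path in path_out.
   On failure returns -1 with errno set. */
int make_temp_in_tmpdir(char *path_out, size_t path_len){
    const char *dir = getenv("TMPDIR");
    if (dir == NULL || dir[0] == '\0'){
        dir = "/tmp";
    }
    int n = snprintf(path_out, path_len, "%s/exampleXXXXXX", dir);
    if (n < 0 || (size_t)n >= path_len){
        errno = ENAMETOOLONG;
        return -1;
    }
    /* mkstemp replaces the XXXXXX in place and creates the file */
    return mkstemp(path_out);
}
```

A program built around this would put its large temporary files in /var/tmp when run as TMPDIR=/var/tmp ./program, without any change to the code.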
C library API
file API
tmpfile(3) returns a FILE* for a file that will be removed when the FILE* is fclose(3)d or the program exits. This is very useful for use-case 1.
$ cat >tmpfile-example.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv){
FILE *fh = tmpfile();
if (fh == NULL){
perror("Create temporary file");
return 1;
}
for (int i = 1; i < argc; i++){
fprintf(fh, "Arg %d: %s\n", i, argv[i]);
}
rewind(fh);
{
char *buf = NULL;
size_t memlen = 0;
ssize_t bytes_read;
while ((bytes_read = getline(&buf, &memlen, fh)) != -1){
fwrite(buf, sizeof(buf[0]), bytes_read, stdout);
}
}
}
EOF
$ make CFLAGS="-std=c99 -D_POSIX_C_SOURCE=200809L" tmpfile-example
$ ./tmpfile-example hello world
Arg 1: hello
Arg 2: world
Secure temporary file creation
tmpfile(3) is perfect for use-case 1, but is not suitable for use-cases 2 or 3, which require the API to also give you a file descriptor or file path.
mkstemp(3) stands for "make secure temporary file"; mkdtemp(3) stands for "make temporary directory".
mkstemp(3) creates a temporary file based on the string template given, modifies it in-place so you can get the file path afterwards, and returns a file descriptor to the newly created and opened file.
$ cat >copy-to-temp-file.c <<'EOF'
#define _GNU_SOURCE /* for asprintf(3) and splice(2) */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h> /* for splice */
/* Wrap mkstemp so directory and prefix can be passed
Returns -errno and leaves *filename untouched on failure
assigns *filename to malloced string and returns fd on success
*/
int mkstemp_with_prefix(char const *dir, char const *prefix, char **filename){
int fd = -1;
char *template = NULL;
if (asprintf(&template, "%s/%sXXXXXX", dir, prefix) == -1){
int err = -errno;
perror("Alloc mkstemp template string");
return err;
}
fd = mkstemp(template);
if (fd == -1){
int err = -errno;
perror("Make temporary file");
free(template);
return err;
}
*filename = template;
return fd;
}
int main(int argc, char **argv){
char *dir = ".";
char *prefix = "tmp";
char *filename;
int fd;
int ret = 0;
switch(argc){
case 3:
prefix = argv[2];
case 2:
dir = argv[1];
case 1:
break;
default:
fprintf(stderr, "Usage: %s [DIR [PREFIX]]\n", argv[0]);
return 1;
}
fd = mkstemp_with_prefix(dir, prefix, &filename);
if (fd < 0){
return 2;
}
while ((ret = splice(0, NULL, fd, NULL, 4096, 0)) > 0){
/*no op*/;
}
if (ret == -1){
perror("Copy file");
return 3;
}
printf("%s\n", filename);
return 0;
}
EOF
$ make copy-to-temp-file
$ tempfile=$(echo "Hello World" | ./copy-to-temp-file)
$ cat "$tempfile"
Hello World
$ rm "$tempfile"
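For use-case 3, mkstemp(3) can be paired with rename(2). The following is only a sketch: write_file_atomically is a made-up name, the temporary file is placed next to the target on the assumption that both are then on the same file system, and the result keeps the 0600 mode that mkstemp(3) gives it.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace `path` with `len` bytes of `data`.
   The temporary file is created beside the target, because rename(2)
   cannot move a file between file systems.
   Returns 0 on success, -1 on failure with errno set. */
int write_file_atomically(const char *path, const void *data, size_t len){
    char tmp[4096];
    const char *p = data;
    size_t left = len;
    int saved;
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp){
        errno = ENAMETOOLONG;
        return -1;
    }
    int fd = mkstemp(tmp);
    if (fd == -1){
        return -1;
    }
    while (left > 0){
        ssize_t written = write(fd, p, left);
        if (written == -1){
            if (errno == EINTR){
                continue;
            }
            goto fail;
        }
        p += written;
        left -= (size_t)written;
    }
    /* Flush to disk before the rename makes the file visible */
    if (fsync(fd) == -1){
        goto fail;
    }
    if (close(fd) == -1){
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, path) == -1){
        goto fail;
    }
    return 0;
fail:
    /* Clean up the partial file, preserving the original error */
    saved = errno;
    if (fd != -1){
        close(fd);
    }
    unlink(tmp);
    errno = saved;
    return -1;
}
```

Anyone opening the target sees either the old contents or the new contents, never a half-written file.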
mkdtemp(3) has a similar API to mkstemp(3), in that it takes a mutable string template and modifies it, but it does not return a file descriptor.
$ cat >split-file.c <<'EOF'
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
/* Wrap mkdtemp so directory and prefix can be passed
Returns -errno and leaves *filename untouched on failure
assigns *filename to malloced string and returns 0 on success
*/
int mkdtemp_with_prefix(char const *dir, char const *prefix, char **filename){
int ret = 0;
char *template = NULL;
if (asprintf(&template, "%s/%sXXXXXX", dir, prefix) == -1){
ret = -errno;
perror("Alloc mkdtemp template string");
return ret;
}
if (!mkdtemp(template)){
ret = -errno;
perror("Make temporary directory");
free(template);
return ret;
}
*filename = template;
return ret;
}
int write_n_lines(FILE *input, int n, FILE *output){
int ret = 0;
size_t n_alloced = 0;
char *buffer = NULL;
for (int i = 0; i < n; i++){
ssize_t n_read;
n_read = getline(&buffer, &n_alloced, input);
if (n_read == -1){
if (!feof(input)){
perror("Read line");
ret = 1;
}
break;
}
if (fputs(buffer, output) == EOF){
perror("Write line");
ret = 2;
break;
}
}
free(buffer);
return ret;
}
int main(int argc, char **argv){
char *dir = ".";
char *prefix = "tmp";
char *tempdir;
int lines_per_file;
switch(argc){
case 4:
prefix = argv[3];
case 3:
dir = argv[2];
case 2:
lines_per_file = atoi(argv[1]);
break;
default:
fprintf(stderr, "Usage: %s LINES_PER_FILE [DIR [PREFIX]]\n",
argv[0]);
return 1;
}
if (lines_per_file <= 0){
fprintf(stderr, "Lines per file must be a positive integer\n");
return 2;
}
if (mkdtemp_with_prefix(dir, prefix, &tempdir)){
return 3;
}
printf("%s\n", tempdir);
for (int i = 0; 1; i++){
char *filename = NULL;
FILE* fileobj;
if (asprintf(&filename, "%s/%03d", tempdir, i) == -1){
perror("Formatting output file name");
return 4;
}
fileobj = fopen(filename, "wx");
if (fileobj == NULL){
perror("Opening output file");
free(filename);
return 4;
}
if (write_n_lines(stdin, lines_per_file, fileobj)){
fclose(fileobj);
free(filename);
return 5;
}
fclose(fileobj);
free(filename);
if (feof(stdin)){
break;
}
}
return 0;
}
EOF
$ make CFLAGS="-std=c99 -D_GNU_SOURCE" split-file
$ tempdir=$(seq 9 | ./split-file 3)
$ (cd "$tempdir" && ls)
000 001 002 003
$ find "$tempdir" -delete
It is safe to create files with fixed names in a temporary directory, since it is created with mode 700, which means only you are able to create files in there.
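A small sketch of that pattern follows. The helper name create_fixed_name_in_tempdir and the file name "data" are invented for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: make a private directory from `template`
   (which must end in XXXXXX and is modified in place), then create a
   file with the fixed name "data" inside it.
   Returns the fd of the new file, or -1 with errno set. The directory
   is left in place on failure of open(2) so the caller can inspect it. */
int create_fixed_name_in_tempdir(char *template, char *file_out, size_t file_len){
    if (mkdtemp(template) == NULL){
        return -1;
    }
    int n = snprintf(file_out, file_len, "%s/data", template);
    if (n < 0 || (size_t)n >= file_len){
        rmdir(template);
        errno = ENAMETOOLONG;
        return -1;
    }
    /* O_EXCL is belt and braces: nobody else can write to this directory */
    return open(file_out, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```

The caller is responsible for removing the file and then the directory once it is finished with them.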
Temporary name generation
WARNING: You probably don't want to use these functions, since they are a security risk if used improperly, and the functions for making temporary files handle this complexity for you.
mktemp(3) and tempnam(3) return a file path that could, in theory, be used for a temporary file.
The former fills in a template in the same way as mkstemp(3), but without creating the file, while the latter lets you specify a directory and a prefix, and prefers the directory named in the TMPDIR environment variable if it is set.
The manual pages for both explicitly say that you shouldn't use them, and that you should use mkstemp(3) or mkdtemp(3) instead.
They are only safe if the call that creates the directory entry fails when the name already exists, and you retry with a fresh name when it does.
$ cat >atomic-link-replace.c <<'EOF'
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h> /* for symlink */
#include <libgen.h> /* for dirname */
/* If you use `ln -sf` to replace a symbolic link, it unlinks then creates
the symlink. This can be avoided by creating the symlink at a temporary
location first, then renaming it over the top of the old one. */
int mkltemp(char const *dir, char const *prefix, char const *target, char **out){
while (1){
char *path = tempnam(dir, prefix);
if (path == NULL){
return errno;
}
if (symlink(target, path) == -1){
int err = errno;
free(path);
if (err == EEXIST){
continue;
} else {
return err;
}
}
*out = path;
return 0;
}
}
int main(int argc, char **argv){
if (argc != 3) {
printf("Usage: %s LINK_TARGET LINK_NAME\n",
argv[0]);
return 1;
}
{
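char *link_target = argv[1];
char *link_name = argv[2];
char dir_buf[4096];
char *link_dir;
char *tmp_link;
/* dirname(3) may modify its argument, so work on a copy */
if (snprintf(dir_buf, sizeof dir_buf, "%s", link_name) >= (int)sizeof dir_buf){
fprintf(stderr, "Link name too long\n");
return 1;
}
link_dir = dirname(dir_buf);
if (mkltemp(link_dir, "tmpl.", link_target, &tmp_link)){
perror("Creating temporary symlink");
return 2;
}
if (rename(tmp_link, link_name) == -1){
perror("Renaming temporary symlink into place");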
return 3;
}
return 0;
}
}
EOF
$ make CFLAGS=-D_DEFAULT_SOURCE atomic-link-replace
$ ln -sf "old link destination" link
$ readlink link
old link destination
$ ./atomic-link-replace "new link destination" link
$ readlink link
new link destination
Relevant system calls
The mkltemp function we defined in atomic-link-replace.c shows how functions like mkstemp(3) are implemented. The key feature is that the system call for creating the temporary file has to fail and set errno(3) to EEXIST if the target already exists.
The open(2) system call can be made to act this way by setting its flags to O_CREAT|O_EXCL.
Without this, it is not possible to securely create temporary files without first creating a private directory to put them in.
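To make the retry loop concrete for regular files, here is a deliberately simplified stand-in for mkstemp(3). The function name, the name format and the limit of 100 attempts are all assumptions made for the example, and random(3) is used only to keep it short; it is not a suitable source of hard-to-guess names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for mkstemp(3).
   Tries names of the form DIR/PREFIX.NNNNNNNN until open(2) succeeds.
   O_CREAT|O_EXCL makes open fail with EEXIST if anything, including a
   symlink, already has that name, so a guessed name causes a retry
   rather than a write through an attacker's link.
   Returns the fd and leaves the path in path_out, or -1 with errno set. */
int open_exclusive_temp(const char *dir, const char *prefix,
                        char *path_out, size_t path_len){
    for (int attempt = 0; attempt < 100; attempt++){
        unsigned long suffix = (unsigned long)random() & 0xffffffffUL;
        int n = snprintf(path_out, path_len, "%s/%s.%08lx",
                         dir, prefix, suffix);
        if (n < 0 || (size_t)n >= path_len){
            errno = ENAMETOOLONG;
            return -1;
        }
        int fd = open(path_out, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd != -1){
            return fd;
        }
        if (errno != EEXIST){
            return -1; /* a real error, not a name collision */
        }
    }
    errno = EEXIST;
    return -1;
}
```

The structure is the same as mkltemp: generate a name, attempt an exclusive create, and go round again only when the failure was EEXIST.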
New calls designed to help
A relatively recent addition to Linux is the O_TMPFILE flag. Kernel support was added in 3.11.
This changes open(2) to take a directory path, rather than a file name. It will return a file descriptor without creating a directory entry.
This means that the file will be removed when the process exits, like with tmpfile(3). However, unlike an implementation of tmpfile(3) that relies on atexit(3) handlers, the cleanup can't be skipped when the process is terminated abnormally, such as by a signal or by the machine losing power.
Use-case 1 can be handled by using O_TMPFILE|O_RDWR|O_EXCL for the flags to open(2).
O_EXCL prevents the file descriptor being linked into the file system later.
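A minimal sketch of the use-case 1 flags follows, with a made-up helper name. It assumes glibc, where O_TMPFILE is only declared when _GNU_SOURCE is defined, and a file system that supports the flag.

```c
#define _GNU_SOURCE /* O_TMPFILE is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open an unnamed scratch file on the file system
   that holds `dir`. O_EXCL stops the file from ever being given a name.
   Returns the fd, or -errno on failure (for example -EOPNOTSUPP when
   the file system has no O_TMPFILE support). */
int open_unnamed_scratch(const char *dir){
    int fd = open(dir, O_TMPFILE | O_RDWR | O_EXCL, 0600);
    return fd == -1 ? -errno : fd;
}
```

The returned descriptor can be written to, rewound with lseek(2) and read back, and the storage is released as soon as the last descriptor is closed.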
Linking the file in later is exactly what use-case 3 wants. The flags for this are O_TMPFILE|O_WRONLY.
The file descriptor can be linked in later using the linkat(2) system call: linkat(tmp_fd, "", AT_FDCWD, target_path, AT_EMPTY_PATH).
This is perfect for use-case 3, as intermediate results aren't left around in the target directory to be cleaned up later.
It does require special handling. You have to use functions like fchmod(2) instead of chmod(2), or refer to the file through /proc/self/fd/%d.
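The sketch below puts the use-case 3 flags together with linkat(2). It goes through /proc/self/fd/%d with AT_SYMLINK_FOLLOW rather than AT_EMPTY_PATH, since the open(2) manual page notes that AT_EMPTY_PATH needs the CAP_DAC_READ_SEARCH capability; that choice assumes /proc is mounted. The helper name publish_via_tmpfile is made up for the example.

```c
#define _GNU_SOURCE /* O_TMPFILE is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `data` to an unnamed file
   in `dir`, then give it the name `target_path`, which must be on the
   same file system as `dir`.
   Returns 0 on success or -errno on failure. */
int publish_via_tmpfile(const char *dir, const char *target_path,
                        const void *data, size_t len){
    char proc_path[64];
    int err = 0;
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd == -1){
        return -errno;
    }
    /* For brevity a short write is treated as an I/O error */
    ssize_t written = write(fd, data, len);
    if (written == -1){
        err = errno;
    } else if ((size_t)written != len){
        err = EIO;
    }
    if (err == 0){
        snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, proc_path, AT_FDCWD, target_path,
                   AT_SYMLINK_FOLLOW) == -1){
            err = errno;
        }
    }
    /* If nothing was linked, closing the fd discards the data */
    close(fd);
    return -err;
}
```

One difference from the rename approach is worth knowing: linkat(2) fails with EEXIST if target_path already exists, so this publishes new files but does not replace existing ones.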