You should be familiar with the mv(1) command by now, which moves a file from one place to another.

If you're writing shell scripts, or using a library that lets you move files, then you don't need to worry about how it works. If you don't have a library available, though, you might be surprised by the amount of effort required.

It's just a rename(2), right?

If you can guarantee that the destination is on the same filesystem and that you don't care if it replaces some other file, then yes.
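As a minimal sketch of that simple case (the helper name `simple_move` is made up for illustration), a plain rename(2) replaces whatever is already at the destination:

```c
#include <stdio.h>

/* Hypothetical helper: move old_path to new_path on the same file system.
   If new_path already exists it is silently replaced. */
static int simple_move(const char *old_path, const char *new_path) {
    if (rename(old_path, new_path) < 0) {
        perror("rename");
        return -1;
    }
    return 0;
}
```

Calling `simple_move("a.txt", "b.txt")` when both files exist leaves only `b.txt`, holding what used to be in `a.txt`.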

Not clobbering

mv(1) has the -n or --no-clobber option to prevent accidentally overwriting a file.

The naive way to do this would be to check whether the file exists before calling rename(2), but this is a TOCTTOU (time-of-check to time-of-use) bug: if another process or thread creates a file at the destination between the check and the rename, that file is silently overwritten.

To do this safely, use renameat2(2) with the RENAME_NOREPLACE flag, which makes the call fail with EEXIST if the destination already exists.

The destination is on another filesystem

On a modern Linux distribution your files are usually spread across multiple file systems: persistent files live on a file system mounted from local storage, while the operating system puts temporary files on a different file system so that they are removed when your computer shuts down.

Unfortunately, the rename(2) system call does not work if the destination is on a different file system.

Checking ahead of time whether a path is on a different file system is traditionally handled by calling stat(2) on the source and on the destination directory and checking whether the st_dev fields differ, but this is another TOCTTOU bug waiting to happen. It is also unnecessary: rename(2) fails with errno(3) set to EXDEV when the destination is on another file system, so you find out from the system call you would have made anyway.

If you care about still being able to move the file when its destination is on a different file system then you need a fallback when this happens.
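A minimal sketch of that "try it and look at errno" approach follows. The `move_result` codes and the `try_rename` helper are names invented for this example.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum move_result {
    MOVE_DONE = 0,        /* rename(2) succeeded */
    MOVE_NEEDS_COPY = 1,  /* EXDEV: target is on another file system */
    MOVE_FAILED = -1      /* any other error, errno is left set */
};

/* Try the rename first and let the kernel report a cross-device move,
   rather than comparing st_dev values beforehand. */
static enum move_result try_rename(const char *source, const char *target) {
    if (rename(source, target) == 0)
        return MOVE_DONE;
    if (errno == EXDEV)
        return MOVE_NEEDS_COPY;
    return MOVE_FAILED;
}
```

A caller would only start copying when it gets `MOVE_NEEDS_COPY` back; every other failure is reported as it is.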

So we fall back to copying the file and removing the old one?

In principle, yes, though actually implementing this is surprisingly difficult.

Handling the fallback logic itself is not straightforward, so we'll get that out of the way first.

We can fall back to rename(2) if renameat2(2) is not implemented, but only if we don't need to handle not clobbering the target.

If renameat2(2) is not implemented and we must not clobber the target, we need to fall back to the copy, which can use O_EXCL with O_CREAT to only write to the file if it didn't already exist.
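The O_EXCL behaviour can be sketched in a few lines. `create_if_absent` is a hypothetical helper written for this example, not part of my-mv.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: atomically create target only if nothing is there.
   Returns 1 if we created it, 0 if it already existed, -1 on other errors. */
static int create_if_absent(const char *target) {
    int fd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    /* With O_CREAT|O_EXCL the kernel reports an existing file as EEXIST */
    return errno == EEXIST ? 0 : -1;
}
```

Because the kernel does the existence check and the creation in one step, two processes racing to create the same path cannot both succeed.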

If unlinking the source file fails because the file doesn't exist, then that means that we were able to copy its contents while something else removed it.

Given that the file was written to its destination and no longer exists where it used to, it can be argued that the operation as a whole was successful.

/* my-mv.c */
#include <stdbool.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/syscall.h>

#if !HAVE_DECL_RENAMEAT2
static inline int renameat2(int oldfd, const char *oldname, int newfd, const char *newname, unsigned flags) {
    return syscall(__NR_renameat2, oldfd, oldname, newfd, newname, flags);
}
#endif

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1<<0)
#endif

int copy_file(const char *source, const char *target, bool no_clobber);

int move_file(const char *source, const char *target, bool no_clobber) {
    int ret;
    ret = renameat2(AT_FDCWD, source, AT_FDCWD, target, no_clobber ? RENAME_NOREPLACE : 0);
    if (ret == 0)
        return ret;
    if (errno == EXDEV)
        goto xdev;
    if (errno != ENOSYS) {
        perror("renaming file");
        return ret;
    }
    /* Have to skip to copy if unimplemented since rename can't detect EEXIST */
    if (no_clobber)
        goto xdev;
    /* renameat2 is unimplemented but clobbering is allowed, so plain rename will do */
    ret = rename(source, target);
    if (ret == 0)
        return ret;
    if (errno == EXDEV)
        goto xdev;
    perror("renaming file");
    return ret;
xdev:
    ret = copy_file(source, target, no_clobber);
    if (ret < 0)
        return ret;
    if (unlink(source) < 0 && errno != ENOENT) {
        perror("unlinking source file");
        return -1;
    }
    return ret;
}

So we open both files, and loop reading data then writing it?

This will produce a file that, when read, yields the same stream of bytes as the original.

You could use stdio(3) to copy the contents, but that will have to be left as an exercise for the reader, since I don't like its record-based interface, I prefer to deal with file descriptors over FILE* handles, and the buffering makes error handling more… interesting.

So, broadly, the idea is to read into a buffer, then write from the buffer to the target file.

However, EINTR is a problem: many system calls can be interrupted before they do anything, and read(2) and write(2) may transfer less than you asked for.

Glibc has a handy TEMP_FAILURE_RETRY macro for handling EINTR, but to handle the short reads and writes, you need to always work in a loop.

int naive_contents_copy(int srcfd, int tgtfd) {
    /* 1MB buffer, too small makes it slow,
       shrink this if you feel memory pressure on an embedded device */
    char buf[1 * 1024 * 1024];
    for (;;) {
        char *p = buf;
        ssize_t n_read = TEMP_FAILURE_RETRY(read(srcfd, buf, sizeof(buf)));
        if (n_read < 0) {
            perror("Reading from source");
            return -1;
        }

        /* Reached the end of the file */
        if (n_read == 0)
            return 0;

        /* write(2) may write less than requested, so continue from
           where the last write stopped until the buffer is drained */
        while (n_read > 0) {
            ssize_t n_written = TEMP_FAILURE_RETRY(write(tgtfd, p, n_read));
            if (n_written < 0) {
                perror("Writing to target");
                return -1;
            }

            p += n_written;
            n_read -= n_written;
        }
    }
}

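int copy_file(const char *source, const char *target, bool no_clobber) {
    int ret = -1;
    int srcfd = -1;
    int tgtfd = -1;
    srcfd = open(source, O_RDONLY);
    if (srcfd == -1) {
        perror("Opening source file");
        return -1;
    }
    /* O_EXCL refuses to reuse an existing target; otherwise O_TRUNC
       discards the old contents so a shorter source leaves no stale tail */
    tgtfd = open(target, O_WRONLY|O_CREAT|(no_clobber ? O_EXCL : O_TRUNC), 0600);
    if (tgtfd == -1) {
        perror("Opening target file");
        goto cleanup;
    }
    ret = naive_contents_copy(srcfd, tgtfd);
    /* close(2) can report deferred write errors, so check it */
    if (close(tgtfd) < 0 && ret == 0) {
        perror("Closing target file");
        ret = -1;
    }
cleanup:
    close(srcfd);
    return ret;
}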
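The short-write handling can also be isolated in a small helper. This is a sketch under assumptions: `write_all` is a hypothetical name, and it retries EINTR with an explicit check rather than glibc's TEMP_FAILURE_RETRY, so it does not need _GNU_SOURCE.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd.
   Returns 0 on success, -1 with errno set on failure. */
static int write_all(int fd, const char *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            return -1;
        }
        done += (size_t)n;  /* short write: carry on from where it stopped */
    }
    return 0;
}
```

The important detail is `buf + done`: after a short write, the next attempt has to start at the first byte that has not been written yet, not at the beginning of the buffer.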

Making use of our new function

So now we have a nice move_file function that will fall back to copying the file if renaming does not work.

But code is of no use in isolation: we need a program for it to live in, and the simplest way to use it is from a command-line program.

#include <getopt.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char *source;
    char *target;
    bool no_clobber = false;

    enum opt {
        OPT_NO_CLOBBER = 'n',
        OPT_CLOBBER = 'N',
    };
    static const struct option opts[] = {
        { .name = "no-clobber", .has_arg = no_argument, .val = OPT_NO_CLOBBER, },
        { .name = "clobber",    .has_arg = no_argument, .val = OPT_CLOBBER, },
        {},
    };

    for (;;) {
        int ret = getopt_long(argc, argv, "nN", opts, NULL);
        if (ret == -1)
            break;
        switch (ret) {
        case '?':
            return 1;
        case OPT_NO_CLOBBER:
        case OPT_CLOBBER:
            no_clobber = (ret == OPT_NO_CLOBBER);
            break;
        }
    }
    if (optind == argc || argc > optind + 2) {
        fprintf(stderr, "1 or 2 positional arguments required\n");
        return 2;
    }
    source = argv[optind];
    if (argc == optind + 2)
        target = argv[optind + 1];
    else
        /* Move into the current directory with the same name */
        target = basename(source);

    if (move_file(source, target, no_clobber) >= 0)
        return 0;
    return 1;   
}

$ if echo 'int main(){(void)renameat2;}' | gcc -D_GNU_SOURCE -include stdio.h -xc - -o/dev/null 2>/dev/null; then
>     HAVE_DECL_RENAMEAT2=1
> else
>     HAVE_DECL_RENAMEAT2=0
> fi
$ make CFLAGS="-D_GNU_SOURCE -DHAVE_DECL_RENAMEAT2=$HAVE_DECL_RENAMEAT2" my-mv
$ ./my-mv
1 or 2 positional arguments required
$ touch test-file
$ ./my-mv test-file clobber-file
$ ls test-file clobber-file
ls: cannot access test-file: No such file or directory
clobber-file
$ ./my-mv -n test-file clobber-file
renaming file: No such file or directory
$ touch test-file
$ ./my-mv --no-clobber test-file clobber-file
renaming file: File exists
$ ./my-mv test-file clobber-file
$ ls test-file clobber-file
ls: cannot access test-file: No such file or directory

So we've got a complete fallback for rename(2) now?

Not quite.

For most purposes this is likely to be sufficient, but there's a lot more to a file than the data you can read out of it, and far more than I can cover in this article. There will be follow-up articles to cover copying other aspects of files, including:

  1. Sparseness
  2. Speed
  3. Metadata
  4. Atomicity
  5. Other types of file

Thanks for this article it's really useful, it would be interesting to know why you're not fond of the stdio interface, and also potentially worth mentioning that EINTR is really only something that needs to be explicitly handled on Linux, from signal(7):

   On Linux, even in the absence of signal handlers, certain blocking interfaces  can
   fail  with the error EINTR after the process is stopped by one of the stop signals
   and then resumed via SIGCONT.  This behavior is not  sanctioned  by  POSIX.1,  and
   doesn't occur on other systems.

This behaviour seems less than helpful to me, it would be really good to know if there's a good reason why GNU/Linux doesn't just restart the call (as the BSDs do)

Also, I had no idea you could make system calls without having a C wrapper for them, so thanks for that as well!

Comment by Gravious Fri Dec 30 20:47:34 2016

Thanks for this article it's really useful, it would be interesting to know why you're not fond of the stdio interface,

I don't like the buffering behaviour, it defers writes late enough that I lose context about what bit of the write failed, so when I go to flush and close I can't say how much was actually written.

and also potentially worth mentioning that EINTR is really only something that needs to be explicitly handled on Linux, from signal(7):

On Linux, even in the absence of signal handlers, certain blocking interfaces  can
fail  with the error EINTR after the process is stopped by one of the stop signals
and then resumed via SIGCONT.  This behavior is not  sanctioned  by  POSIX.1,  and
doesn't occur on other systems.

This behaviour seems less than helpful to me, it would be really good to know if there's a good reason why GNU/Linux doesn't just restart the call (as the BSDs do)

There's also the fact that some signals can be configured to restart, but not others. Signals in general are a bit of a mess, so my way of dealing with it is to wrap everything in a retry and leave signal handling for shut down and use a better form of IPC for everything else.

Also, I had no idea you could make system calls without having a C wrapper for them, so thanks for that as well!

Yep, it's a pain when the wrappers don't exist because you need to handle the error return calling convention yourself, since part of what the libc wrappers do is set errno. I tend to use negative error number returns in my own code rather than errno which makes it actually closer to what I prefer.

Comment by Richard Maw Tue Jan 3 13:20:58 2017