Whether I am writing my own program or choosing between existing solutions, one aspect of the decision-making process which always weighs heavily on my mind is that of the input and output data formats.

I have been spending a lot of my work days recently working on converting data from a proprietary tool's export format into another tool's input format. This has involved a lot of XML diving, a lot more swearing, and a non-trivial amount of pain. This drove home to me once more that the format of input and output of data is such a critical part of software tooling that it must weigh as heavily as, or perhaps even more heavily than, the software's functionality.

As Tanenbaum tells us, the great thing about standards is that there's so many of them to choose from. XKCD tells us how that comes about. Data formats are many and varied, with specifications ranging from things as vague as "plain text" to things as complex as the structure of data stored in custom database formats.

If you find yourself writing software which requires a brand new data format then, while I might caution you to examine carefully if it really does need a new format, you should ensure that you document the format carefully and precisely. Ideally give your format specification to a third party and get them to implement a reader and writer for your format, so that they can check that you've not missed anything. Tests and normative implementations can help prop up such an endeavour admirably.

Be sceptical of data formats which have "implementation specific" areas, or "vendor specific extension" space because this is where everyone will put the most important and useful data. Do not put such beasts into your format design. If you worry that you've made your design too limiting, deal with that after you have implemented your use-cases for the data format. Don't be afraid to version the format and extend later; but always ensure that a given version of the data format is well understood; and document what it means to be presented with data in a format version you do not normally process.

Phew.


Given all that, I exhort you to consider carefully how your projects manage their input and output data, and for these things to be uppermost when you are choosing between different solutions to a problem at hand. Your homework is, as you may have grown to anticipate at this time, to look at your existing projects and check that their input and output data formats are well documented if appropriate.

Posted Wed Jul 5 12:00:06 2017 Tags:

System calls

You can't talk about time without clocks. A standard definition of clocks is "An instrument to measure time".

Strictly speaking the analogy isn't perfect, since these clocks aren't just different ways of measuring time, they measure some different notions of time, but this is a somewhat philosophical point.

The interface for this in Linux is a set of system calls that take a clockid_t parameter, though not all system calls are valid for every clock.

This is not an exhaustive list of system calls, since there are also older system calls for compatibility with POSIX or other UNIX standards, that offer a subset of the functionality of the above ones.

System calls beginning clock_ are for getting the time or waiting, those beginning timer_ are for setting periodic or one-shot timers, and those beginning timerfd_ are for timers that can be put in event loops.

For the sake of not introducing too many concepts at once, we're going to start with the simplest clock and work our way up.

CLOCK_MONOTONIC_RAW

The first clock we care about is CLOCK_MONOTONIC_RAW.

This is the simplest clock.

It is initialised to an arbitrary value on boot and counts up at a rate of one second per second to the best of its ability.

This clock is of limited use on its own, since the only clock-related system calls that work with it are clock_gettime(2) and clock_getres(2). If the time of each event is recorded, it can be used to determine the order of events and the relative time difference between them, since we know that the clock increments at one second per second.

Example

In the program below, we time how long it takes to read 4096 bytes from /dev/zero, and print the result.

#include <stdio.h> /* fopen, fread, printf */
#include <time.h> /* clock_gettime, CLOCK_MONOTONIC_RAW, struct timespec */

int main(void) {
    FILE *devzero;
    int ret;
    struct timespec start, end;
    char buf[4096];

    devzero = fopen("/dev/zero", "r");
    if (!devzero) {
        return 1;
    }

    /* get start time */
    ret = clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    if (ret < 0) {
        return 2;
    }

    if (fread(buf, sizeof *buf, sizeof buf, devzero) != sizeof buf) {
        return 3;
    }

    /* get end time */
    ret = clock_gettime(CLOCK_MONOTONIC_RAW, &end);
    if (ret < 0) {
        return 4;
    }

    /* subtract start from end, borrowing a second if nanoseconds go negative */
    end.tv_nsec -= start.tv_nsec;
    if (end.tv_nsec < 0) {
        end.tv_sec--;
        end.tv_nsec += 1000000000L;
    }
    end.tv_sec -= start.tv_sec;

    printf("Reading %zu bytes took %lld.%09ld seconds\n",
           sizeof buf, (long long)end.tv_sec, (long)end.tv_nsec);

    fclose(devzero);
    return 0;
}

You're possibly curious about the naming of the clock called MONOTONIC_RAW.

In the next article we will talk about CLOCK_MONOTONIC, which may help you understand why it's named the way it is.

I suggested the uses of this clock are for the sequencing of events and for calculating the relative period between them. If you can think of another use please send us a comment.

If you like to get hands-on, you may want to try reimplementing the above program in your preferred programming language, or extending it to time arbitrary events.

Posted Wed Jul 12 12:00:13 2017 Tags:

Normally our Unix systems organise the file system according to the Filesystem Hierarchy Standard (FHS). Installing into an FHS layout has limitations: what would happen if we wanted to install, for example, two different versions of ruby at the same time? Typically this isn't possible without explicitly specifying a separate installation directory. If we just install to the usual place, e.g. /usr/bin, then we will overwrite the previous ruby. So perhaps we would install one ruby into /usr/bin and another into /usr/local/bin. This is fine, but what about dependent libs? Assuming the two versions of ruby require different dependencies, we potentially have the same problem: the dependencies for the first ruby might overwrite the dependencies for the second.

Nix gets around this to some extent by not using the FHS. Instead, nix installs all files into the nix store, which is usually located at /nix/store. All programs in a nix store are identified by their store path, which is uniquely generated for each distinct nix package. As a result, different versions of the same ruby no longer conflict, because they are each assigned their own location within the nix store.

To enable use of programs within the store, nix maintains an environment which is basically a mapping of FHS path -> nix store path, where the -> is a symlink. So for example, let's first install ruby 2.0 into our environment

nix@salo:~$ nix-env -f nixpkgs -iA pkgs.ruby_2_0
installing ‘ruby-2.0.0-p648’
these paths will be fetched (3.43 MiB download, 19.35 MiB unpacked):
  /nix/store/bxm4s71qdyh071ap5ywxc63aja62cbyc-gdbm-1.13
  /nix/store/d2ccapssrq683rj0fr7d7nb3ichxvlsy-ruby-2.0.0-p648
  /nix/store/h85k47l9zpwwxdsn9kkjmqw8pnfnrwmm-libffi-3.2.1
  /nix/store/zj8cjx71sqvv46sxfggjpdzqz6nss047-libyaml-0.1.7
fetching path ‘/nix/store/bxm4s71qdyh071ap5ywxc63aja62cbyc-gdbm-1.13’...
....
building path(s) ‘/nix/store/j649f78ha04mi1vykz601b00ml3qlr9q-user-environment’
created 419 symlinks in user environment

we can see the symlink that was just created to our ruby2.0 in the store,

nix@salo:~$ ls -l $(which irb)
lrwxrwxrwx 1 nix nix 67 Jan  1  1970 /home/nix/.nix-profile/bin/irb -> /nix/store/d2ccapssrq683rj0fr7d7nb3ichxvlsy-ruby-2.0.0-p648/bin/irb

nix@salo:~$ irb
irb(main):001:0> puts RUBY_VERSION
2.0.0

as you can see we're only able to execute the interactive ruby prompt irb because it's symlinked into our environment which is, of course, on the $PATH,

nix@salo:~$ echo $PATH
/home/nix/.nix-profile/bin:/home/nix/.nix-profile/sbin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games

to prove the point about multiple versions on the same system let's swap ruby 2.0 for ruby 2.4

nix@salo:~$ nix-env -f nixpkgs -iA pkgs.ruby_2_4
replacing old ‘ruby-2.0.0-p648’
installing ‘ruby-2.4.1’
these paths will be fetched (3.13 MiB download, 15.32 MiB unpacked):
  /nix/store/48xrfkanmx5sshqj1364k2dw25xr4znj-ruby-2.4.1
fetching path ‘/nix/store/48xrfkanmx5sshqj1364k2dw25xr4znj-ruby-2.4.1’...
...
*** Downloading ‘https://cache.nixos.org/nar/00hh9w9nvlbinya1i9j0v7v89pw3zzlrfqps72441k7p2n8zq7d3.nar.xz’ to ‘/nix/store/48xrfkanmx5sshqj1364k2dw25xr4znj-ruby-2.4.1’...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3205k  100 3205k    0     0   114k      0  0:00:27  0:00:27 --:--:--  125k

building path(s) ‘/nix/store/7b2mmk2ffmy1c2bxq7r6y9cn6r0nwn8s-user-environment’
created 415 symlinks in user environment

nix@salo:~$ ls -l $(which irb)
lrwxrwxrwx 1 nix nix 62 Jan  1  1970 /home/nix/.nix-profile/bin/irb -> /nix/store/48xrfkanmx5sshqj1364k2dw25xr4znj-ruby-2.4.1/bin/irb

nix@salo:~$ irb
irb(main):001:0> puts RUBY_VERSION
2.4.1

You may be wondering whether this is really an improvement, since although we have multiple versions of the same package installed on our system, we can only have one ruby in the environment at any one time. To deal with this nix provides the nix-shell utility which constructs an environment on demand and runs a new shell based on that environment.

nix@salo:~/nixpkgs$ nix-shell -p ruby_2_0            
these paths will be fetched (4.44 MiB download, 22.57 MiB unpacked):
  /nix/store/2l8irkrhvdqmd1h96pcnwv0832p9r901-libffi-3.2.1
  /nix/store/945sd3dbynzpkqdd71cqqpsl8gwi9zsq-ruby-2.0.0-p647
  /nix/store/m74m7c4qbzml7ipfxzlpxddcn9ah8jrs-gdbm-1.12
  /nix/store/zbjyc3ylb9bj3057rk5payv3sr0gnmkc-openssl-1.0.2l
  /nix/store/zsgmhsc8pjx9cisbjdk06qqjm8h89lmp-libyaml-0.1.7
fetching path ‘/nix/store/m74m7c4qbzml7ipfxzlpxddcn9ah8jrs-gdbm-1.12’...
...
[nix-shell:~/nixpkgs]$ which irb
/nix/store/945sd3dbynzpkqdd71cqqpsl8gwi9zsq-ruby-2.0.0-p647/bin/irb

[nix-shell:~/nixpkgs]$ irb
irb(main):001:0> puts RUBY_VERSION
2.0.0
=> nil
irb(main):002:0>

nix@salo:~/nixpkgs$ nix-shell -p ruby_2_4
these paths will be fetched (3.13 MiB download, 15.30 MiB unpacked):
  /nix/store/wly748apb5r37byvvgq85hshgzcahv0y-ruby-2.4.0
fetching path ‘/nix/store/wly748apb5r37byvvgq85hshgzcahv0y-ruby-2.4.0’...
...
[nix-shell:~/nixpkgs]$ which irb
/nix/store/wly748apb5r37byvvgq85hshgzcahv0y-ruby-2.4.0/bin/irb

[nix-shell:~/nixpkgs]$ irb
irb(main):001:0> puts RUBY_VERSION
2.4.0
=> nil
irb(main):002:0>

We haven't even started to scratch the surface in this intro. There's lots of really exciting stuff I've not even mentioned, like how you can always roll back to an earlier state of the environment: every mutation to the environment is recorded, so every time you install or uninstall a nixpkg a new "generation" of the environment is created, and it's always possible to immediately roll back to some earlier generation. NixOS itself takes all these super exciting ideas and applies them to an entire operating system, where each user has their own environment, so ruby for one user might mean ruby2.0 and ruby for another might mean ruby2.4. Hopefully it's clear now how these different versions of the same package can live in harmony under NixOS.

I hope I've managed to convey some of nix's coolness in this short space. If I have, then you should definitely check lethalman's "nix-pills" [1] series for a really deep explanation of how nix works internally and how to create nixpkgs from scratch. There's also of course the NixOS website [2] and #nixos on irc.freenode.net, which is probably one of the friendliest communities out there.

Posted Wed Jul 19 12:00:07 2017
Richard Maw Time - Adjustment

CLOCK_MONOTONIC_RAW reflects the underlying hardware for measuring time.

In the short term this is mostly correct, but hardware isn't perfect, so it can, over long periods, drift and no longer be synchronised with what the rest of the world considers the time to be.

If you know what the time is meant to be elsewhere in the world, then you can adjust your clock to correct it.

Typically your computer will do this with the Network Time Protocol, or NTP, by asking trusted computers on the internet what the time is.

Actually correcting the time works by using the adjtimex(2) or clock_adjtime(2) system calls.

CLOCK_MONOTONIC_RAW is not subject to these corrections, but CLOCK_MONOTONIC is.

In addition to being correctable, CLOCK_MONOTONIC can be used with clock_nanosleep(2).

This will allow you to sleep for at least the period of time specified, though the sleep could be interrupted when a signal is delivered (which could happen with timer_create(2)).

Similar clocks

If it's more important to get the time quickly than to get a more precise time, such as when you're profiling real-time software and don't want to slow it down, then you can use CLOCK_MONOTONIC_COARSE.

CLOCK_MONOTONIC can be used to time events, but works by counting seconds while the computer is running. This could be a problem if your computer suspends, since then it would stop counting.

The solution to this is the CLOCK_BOOTTIME clock, which will include seconds spent suspended, so could be used to time external events.


So far everything discussed has been somewhat abstract, divorced from what we commonly understand to be time.

This will be rectified in the next article in the time series, where we will be covering "real time".

Posted Wed Jul 26 12:00:07 2017 Tags: