Linux systems are made up of many different components, all worked on by different teams with different goals.

Linux distributions exist so that end users don't have to experience all the pain involved in integrating all these different parts.

Different Linux distributions have different goals and ideals, and most components can be built in a variety of ways, so two distributions shipping the same versions of every component can still produce systems that behave differently.

So the first thing a Linux distribution does is decide how everything is put together.

The second thing a Linux distribution does is provide a support channel. The level of support varies between distributions, but they're generally who you should go to before the authors of the software itself, since the upstream authors aren't necessarily working on the same code as the version your distribution built.

This is because of the third responsibility of distributions - to provide appropriate levels of compatibility while keeping the software working and secure.

Upstream projects are generally interested in adding new features, potentially at the cost of changing how users interact with the software. For some distributions this isn't appropriate, so they "back-port" bug fixes on top of the older version, providing longer-term stability.

The last responsibility I'm going to consider is how they make their software selection and configuration available.

The larger distributions tend to make binary packages available, where the distribution has already compiled the software for you. This is not universal: there are source-based distributions like Gentoo, where end users are expected to compile the software on their own machines, but the distribution provides tools to make this easy.

Posted Wed Feb 4 12:00:09 2015
Richard Maw Software Updates

Software bugs, and security bugs in particular, are both inevitable and important to fix. Fortunately your distribution has a way of making fixed versions available.

There are three steps involved:

Getting updated software onto your machine

This is traditionally accomplished by downloading packages over the internet; since most systems already have an internet connection, this is usually the most convenient way to get security updates.

To prevent downloaded updates becoming a security risk, packages need to be signed to prove that they have legitimately come from your distribution.

Alternatively, packages can be updated by putting the new versions on some form of removable media. This reduces the need to sign the updates, since you can assert some level of trust about where the updates came from, but it doesn't remove the need entirely, as USB devices can be compromised.

Updating the versions of software in your filesystem

Packages serve as both a lump of data containing the software to install, and metadata describing how it should be installed.

This is normally accomplished by replacing files on the filesystem in a specified order. The naive approach of opening the file to be replaced and rewriting its contents with the version from the package is problematic: if you can't write the whole file out in one write, the file is left in a non-viable state.

For example, suppose you are replacing the C library. Most programs rely on it, so if the C library is in a partially written state, attempts to start new processes will fail. This is even more of a problem if the update is interrupted and the machine crashes: after a reboot it crashes immediately, since the init process, which is responsible for starting every other process, cannot be run.

Fortunately there is a better approach: we can atomically update a file as follows.

  1. Write to a temporary file on the same filesystem as the file you want to replace.
  2. Call fsync(2) to ensure the file is written to disk.
  3. Call rename(2) to replace the old version of the file with the new version.

This means that processes either see the old version of the file or the new version, since the rename atomically replaces the old version with the new version.
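
For concreteness, here's a minimal sketch of that procedure in C; the file names are just placeholders and error handling is kept to the bare minimum.

    /* Minimal sketch of the write-fsync-rename procedure above.
     * "example.conf" and its temporary name are placeholders. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    int replace_file(const char *path, const char *tmppath,
                     const void *data, size_t len)
    {
        /* 1. Write the new contents to a temporary file on the same filesystem. */
        int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, len) != (ssize_t)len)
            goto fail;

        /* 2. fsync(2) so the new contents are on disk before the rename. */
        if (fsync(fd) < 0)
            goto fail;
        close(fd);

        /* 3. rename(2) atomically replaces the old version with the new one. */
        return rename(tmppath, path);

    fail:
        close(fd);
        unlink(tmppath);
        return -1;
    }

    int main(void)
    {
        const char msg[] = "new contents\n";
        if (replace_file("example.conf", "example.conf.tmp", msg, sizeof msg - 1) < 0) {
            perror("replace_file");
            return 1;
        }
        return 0;
    }

A fully robust version would also fsync(2) the containing directory afterwards, so that the rename itself survives a crash.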

Updates seldom change just a single file though, so this operation needs to be generalised to multiple files:

  1. Write out all the new files to temporary file paths.
  2. Rename all temporary files into place.
  3. Remove all files that only exist in the old version.

This is atomic at the per-file level, but files are inter-dependent, so there is still a window between the file renames during which a restart at an unfortunate time could result in a broken system. I spoke recently at FOSDEM about the difficulties involved in Live Atomic Updates.
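
As a rough illustration of the generalised procedure, here's a sketch that stages and fsyncs every new file first, then renames them all into place, then removes obsolete files; the file names and contents are made up.

    /* Sketch of the multi-file update: stage everything, then rename,
     * then delete.  Names and contents are illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Write one staged file beside its target and flush it to disk. */
    static int stage(const char *tmppath, const char *data)
    {
        int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, data, strlen(data)) < 0 || fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int main(void)
    {
        const char *tmp[]      = { "prog.new", "libfoo.so.new" };
        const char *target[]   = { "prog",     "libfoo.so" };
        const char *contents[] = { "new program\n", "new library\n" };

        /* 1. Write out all the new files to temporary file paths. */
        for (int i = 0; i < 2; i++)
            if (stage(tmp[i], contents[i]) < 0)
                return 1;

        /* 2. Rename all temporary files into place. */
        for (int i = 0; i < 2; i++)
            if (rename(tmp[i], target[i]) < 0)
                return 1;

        /* 3. Remove files that only exist in the old version. */
        unlink("libold.so");

        return 0;
    }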

Updating running processes

Making new versions available is not the end though: your currently running processes still have the old versions of the code mapped in, even though those files are no longer accessible from the file system.

It is not generally possible for a process to re-load its code during execution, nor is it generally necessary, as restarting the process is usually sufficient.

For the times when it is necessary to keep a process responding to requests, there are techniques to gracefully re-exec without losing state.
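
One such technique, sketched very roughly below, is for the process to re-exec its own binary on request (here on SIGHUP), picking up the new code while keeping the same process ID. This assumes the process was started from an absolute path, so argv[0] names the updated binary on disk; real servers would also serialise their state or pass their listening sockets across the exec, which is omitted here.

    /* Rough sketch: re-exec the (possibly updated) binary on SIGHUP. */
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t reexec_requested;

    static void on_sighup(int sig)
    {
        (void)sig;
        reexec_requested = 1;
    }

    int main(int argc, char *argv[])
    {
        (void)argc;
        signal(SIGHUP, on_sighup);

        for (;;) {
            /* ... serve requests ... */
            sleep(1);

            if (reexec_requested) {
                /* Assumes argv[0] is an absolute path, so exec'ing it
                 * loads whatever version is now installed on disk. */
                execv(argv[0], argv);
                perror("execv");   /* only reached if the exec failed */
                reexec_requested = 0;
            }
        }
    }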

Posted Wed Feb 11 11:00:10 2015
Richard Maw Uses of SSH

OpenSSH is a handy tool for logging into other machines to run a remote shell, but it's handy for other things too, since they benefit from the same authentication mechanisms and encryption.

Passwordless logins

Instead of requiring the entry of a password every time you want to log in, you can use a local password manager, or key based authentication.

I prefer key-based authentication. Keys can be generated by running ssh-keygen, then future ssh connections can be made password-less by running ssh-copy-id USER@HOST.

File transfer

The scp command can be used to securely copy files between two machines, or even within the same machine if it's a shared computing resource and sensitive data needs to be transferred between users.

For a less ad-hoc file server, OpenSSH includes an sftp-server, though you generally don't invoke it directly; instead it's enabled via sshd_config.

It's very flexible: you can convert the SSH service into an SFTP server by adding the following to /etc/ssh/sshd_config:

ForceCommand internal-sftp

You can then view files in your home directory with the sftp client, or mount it with sshfs.

Git server

Git supports the ssh protocol. If you have a machine you can ssh to, you can run git init --bare reponame.git to create a repository, then clone it with git clone ssh://USER@HOST/path/to/reponame.git.

However for a shared git server this is cumbersome, as it requires every git user to have an associated login account.

Instead, git servers like gitolite and gitano use one "git" user, and handle authentication by assigning ssh keys to users.

Port forwarding

ssh's port forwarding options, -L and -R, can be used to layer encryption and authentication on top of an existing protocol that supports neither.

Supposing HOST has a service that isn't secure, it can instead bind to a local-only address, 127.0.0.1, on a port such as 1234.

This port can be made available on your local machine on port 4321 by running ssh -L 127.0.0.1:4321:127.0.0.1:1234 USER@HOST.

This service can then be reached by connecting to the address 127.0.0.1:4321 locally.

Advanced

sslh

Corporate firewalls often block all ports not related to web browsing, which limits traffic to plain HTTP on port 80 and HTTPS on port 443.

One way around this is to use sslh, which lets you run both HTTPS and SSH on the same port. To make use of such services, add -p 443 to your ssh command-line.

If you regularly make use of such connections, it may be worthwhile to add something like the following to your ~/.ssh/config file.

Host HOST
    Port 443

Using different keys for different machines

I mentioned earlier that it is possible to do passwordless logins by creating ssh keys.

By default this results in using your one key for authentication to every host. You can generate extra keys by running ssh-keygen -f KEYFILE, and use one instead of the default key by running ssh -i KEYFILE.

You can tell ssh in your config file to use a different key per host with something like:

Host HOST
    IdentityFile KEYFILE

You might want to do this to avoid a single key being associated with you across services, by using different keys per service, and potentially to mitigate the damage of a key being reverse-engineered in future, as only the service using that key is compromised.

Making local resources available remotely

I'm often annoyed that my local configuration is not available on remote machines, so I wrote a script called homely-ssh, which makes the home directory of my local machine available on the remote machine.

I would not recommend its use on shared machines, as it allows other users to access your local machine.

Posted Wed Feb 18 11:00:12 2015
Richard Maw Everything is a file

You may have heard the phrase "Everything is a file" in relation to how things work in Linux or Unix.

Normal files

What's meant by this is that most things in Linux share something in common with files.

For example, regular files are opened with the open(2) system call, but so are directories with the O_DIRECTORY flag, which is how the standard library's opendir(3) works.

Symbolic links aren't generally openable, but every file path can be opened with the O_PATH flag, and the resulting file descriptors, along with directory file descriptors, can be passed to system calls like fchdir(2) or the *at(2) family of system calls.
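
As a small illustration (the paths are just examples, and this assumes a reasonably recent Linux for O_PATH), a directory can be opened like a file and the resulting descriptor passed to openat(2) and fchdir(2):

    /* Open a directory as a file and use the descriptor with *at() calls. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* An fd referring to the directory itself, not to any of its contents. */
        int dirfd = open("/etc", O_DIRECTORY | O_PATH);
        if (dirfd < 0) {
            perror("open /etc");
            return 1;
        }

        /* Open a file relative to that directory fd. */
        int fd = openat(dirfd, "hostname", O_RDONLY);
        if (fd >= 0) {
            char buf[256];
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0)
                fwrite(buf, 1, (size_t)n, stdout);
            close(fd);
        }

        /* The same descriptor can also change the working directory. */
        fchdir(dirfd);
        close(dirfd);
        return 0;
    }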

Devices

Along with all the normal files, there's also special device files in /dev.

There are block devices, like /dev/sda, which represent your physical disks. These let you store persistent data, usually by mounting them as a file system with the mount(2) system call.

Because block devices are files, they can be copied like any normal file to make backups.

There's also a variety of character devices. You are unlikely to need to know all of them, but there's a handful of useful ones that everyone should be aware of, like /dev/null, which discards everything written to it; it's useful wherever a file is needed but you don't need the output that would be written there.

There's also /dev/full, which can be used to test how a program handles low-space conditions, by always reporting that there's no space left to write to the file.

/dev/zero produces an inexhaustible supply of zero-bytes, which can be convenient for obliterating the contents of another file.

/dev/random and /dev/urandom are the traditional interfaces to the random number generator, though getrandom(2) is a recent addition which doesn't require a file to be opened, so it's available when /dev is not reachable or when you are at your file descriptor limit.
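
Here's a quick sketch exercising two of these: /dev/full to provoke an out-of-space error, and getrandom(2) as the descriptor-free alternative to /dev/urandom. It assumes a glibc new enough to provide the sys/random.h wrapper for getrandom.

    /* Demonstrate /dev/full always failing with ENOSPC, and getrandom(2). */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/random.h>
    #include <unistd.h>

    int main(void)
    {
        /* Writes to /dev/full always fail with "No space left on device",
         * which is handy for testing a program's error handling. */
        int fd = open("/dev/full", O_WRONLY);
        if (fd >= 0) {
            if (write(fd, "x", 1) < 0 && errno == ENOSPC)
                puts("/dev/full reported ENOSPC as expected");
            close(fd);
        }

        /* getrandom(2) gives random bytes without opening /dev/urandom. */
        unsigned char buf[16];
        if (getrandom(buf, sizeof buf, 0) == sizeof buf) {
            for (size_t i = 0; i < sizeof buf; i++)
                printf("%02x", buf[i]);
            putchar('\n');
        }
        return 0;
    }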

There are also many files in /dev that only work with open(2), close(2) and the device-specific ioctl(2) system call, which is not massively like regular files, but there is still value in sharing the same ownership semantics.

Pipes and sockets

There is also the mkfifo(3) library function, which creates a "named pipe": an alternative to the pipe(2) system call that is often easier to use when you want two otherwise unrelated processes to hold either end of a pipe.

Speaking of which, the read end of a pipe can be read like any regular file, and the write end can be written like any regular file. This is how pipelines work.

unix(7) sockets also appear on the file system, and unlike pipes, the resulting file descriptor can be both read from and written to.

There's also a variety of other types of sockets that can't be opened from the file system, typically created with the socket(2) or socketpair(2) system calls.
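
To illustrate the difference, here's a small sketch: pipe(2) gives a one-directional pair of descriptors, while socketpair(2) gives two ends that can each be read from and written to.

    /* Compare pipe(2) (one direction) with socketpair(2) (both directions). */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2], sv[2];
        char buf[32];

        /* pipe: data written to p[1] comes out of p[0], one direction only. */
        if (pipe(p) == 0) {
            write(p[1], "via pipe", 8);
            ssize_t n = read(p[0], buf, sizeof buf);
            printf("pipe:       %.*s\n", (int)n, buf);
        }

        /* socketpair: either end can send to the other. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
            write(sv[0], "via socketpair", 14);
            ssize_t n = read(sv[1], buf, sizeof buf);
            printf("socketpair: %.*s\n", (int)n, buf);

            write(sv[1], "and back again", 14);
            n = read(sv[0], buf, sizeof buf);
            printf("socketpair: %.*s\n", (int)n, buf);
        }
        return 0;
    }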

Other files

There's plenty of other file-like objects that are made available by more esoteric system calls.

There's the memfd_create(2) system call for creating anonymous, memory-backed files, which can later be sealed and passed to another process, as a way of sharing data structures between processes.

Sealing the file is required so that the data can safely be passed around: if the receiver can't trust that the contents won't be modified by the sender while they're being used, then it can't safely use them.
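
Here's a rough sketch of how that might look, assuming a kernel and glibc recent enough to provide memfd_create(2) and file sealing; the name "shared-data" is arbitrary, and actually handing the descriptor to another process (e.g. over a unix(7) socket) is omitted.

    /* Create an anonymous memory-backed file, fill it, then seal it. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = memfd_create("shared-data", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        if (fd < 0) {
            perror("memfd_create");
            return 1;
        }

        const char data[] = "data structure to share";
        write(fd, data, sizeof data);

        /* Once sealed, neither sender nor receiver can shrink, grow or
         * modify the contents, so the receiver can use them safely. */
        if (fcntl(fd, F_ADD_SEALS,
                  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
            perror("fcntl(F_ADD_SEALS)");
            return 1;
        }

        /* The receiving process would mmap(2) the descriptor read-only. */
        char *p = mmap(NULL, sizeof data, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)
            printf("sealed contents: %s\n", p);
        return 0;
    }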

Most system calls that handle resources will reference them with a file descriptor, so an exhaustive list would be both boring and very long, but if you're interested in learning about others, just browse man7.org.

Posted Wed Feb 25 12:00:08 2015