All your personal files go somewhere under your home directory. If you want to find them reasonably efficiently when you need them, you need to put some effort into keeping your files well organised. This sounds like a bother, but most of the work is slight after an initial planning phase. During the planning phase you decide where each type of thing should live; after that, you just need to put things there.
It's not very important how you organise things, as long as you know where to put everything and where to find it again. Being systematic really pays off here.
Here are a few things to consider:
- You may want to separate important data from scratch data, to avoid backing up large files that have little value. If you put the scratch data into `~/scratch` or a similar place, you can just avoid backing that up.
- It may be worthwhile to separate active projects from inactive ones, or from archived files in general. You might have a `~/Archive` directory into which you put files and projects that are no longer actively needed, but that you want to keep around.
- You may also want to keep work and personal files separate, if you use the same computer for both work and personal stuff. You might even want to have separate logins for the two, to maintain a better work/life balance.
- Related files should probably go together. For example, the program source code, text documents, video files, and audio recordings that are part of a game development project should probably all go under `~/my-best-game-ever`, rather than being split up between `~/Videos`, `~/Documents`, and so on.
- Your source code checkout for a project should probably not be the top project directory. For example, `~/my-best-game-ever` should not be a git working directory. Instead, that should go into `~/my-best-game-ever/my-best-game-ever` (or `~/my-best-game-ever/src`, or another name you prefer). Invariably, you end up producing things that are not under source control, such as release tarballs, photographs from release parties, or other such files. Those can then go under `~/my-best-game-ever/Archive` or `~/my-best-game-ever/Releases`. If you build Debian packages, those go into the parent of the source directory, so they'd end up under `~/my-best-game-ever` as well, rather than in your home directory.
Why not take a look at your home directory, think about what sorts of content you have in there and then consider what organisation of files would work best for you? Perhaps you're already well organised, but maybe there's an opportunity for you to improve your computing life just a little bit.
I previously described that most Linux interfaces are manipulated as some form of file. The benefit of this is that the common APIs allow you to perform the same operation on different types of file with different properties.
Any writable file can be used as the standard output of a program, so you can redirect output to `/dev/null` to discard it, to a pipe(7) to feed the output of one command to another, or to a socket(7) to send the output of a command to another computer.
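As a minimal sketch (assuming a POSIX system), the same open(2)/dup2(2) calls work whether the target is a regular file or `/dev/null`:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Open the special file just like any regular file. */
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Make it this process's standard output; anything printed
     * from now on is discarded by the kernel. */
    dup2(fd, STDOUT_FILENO);
    close(fd);

    printf("this text goes nowhere\n");
    return 0;
}
```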
Pipes and sockets may also be read, and there are other special files, such as `/dev/zero`, `/dev/urandom` and `/dev/random`, from which you can read data served by the kernel.
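For example, reading random bytes from `/dev/urandom` looks exactly like reading any other file (a minimal sketch, without thorough error handling):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[16];
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* read(2) works the same way it would on a regular file,
     * but here the kernel supplies the bytes. */
    ssize_t n = read(fd, buf, sizeof(buf));
    close(fd);

    for (ssize_t i = 0; i < n; i++)
        printf("%02x", buf[i]);
    printf("\n");
    return 0;
}
```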
select(2) and friends
Reading and writing are not the only operations you can perform on files.
It is possible to wait for any of a group of them to have data available with the select(2), poll(2) or epoll(7) families of system calls.
These system calls are useful because attempts to read from, or write to, certain special files will suspend process execution until there is data available to read or the data has finished being written.
Blocking like this is a useful property when a process is handling a single stream of data, since it allows other processes to do their work in the meantime; but if a process is supposed to handle connections from multiple different sources, it will result in one connection starving the others, even when the others are ready with data.
To solve this, the `select()` family of system calls waits for one of a group of file descriptors to be ready before resuming process execution.
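As a rough sketch, here is a loop waiting on standard input with a timeout; the same pattern extends to any number of descriptors, such as pipes and sockets:

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        fd_set readable;
        struct timeval timeout = { .tv_sec = 5, .tv_usec = 0 };

        /* Build the set of descriptors we care about on every
         * iteration, because select() modifies it in place. */
        FD_ZERO(&readable);
        FD_SET(STDIN_FILENO, &readable);

        int ready = select(STDIN_FILENO + 1, &readable, NULL, NULL, &timeout);
        if (ready < 0) {
            perror("select");
            return 1;
        }
        if (ready == 0) {
            printf("nothing to read for 5 seconds\n");
            continue;
        }

        char buf[256];
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
        if (n <= 0)
            break;      /* EOF or error: stop the loop. */
        write(STDOUT_FILENO, buf, n);
    }
    return 0;
}
```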
Event file descriptors
Previously we were interested in waiting for file descriptors to be ready for reading or writing, but other kinds of events can be waited for too.
Instead of having a dedicated system call for waiting for each kind of event, it is better to wait for events in a `select()` loop, as that allows multiple events to be waited for together.
The kind of event to wait for depends on the type of file descriptor.
Instead of specifying a timeout in the timeout parameter of the `select()` system calls, you can include in the watched set a file descriptor created with the timerfd_create(2) system call.
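A minimal sketch of that idea: arm a timer file descriptor and wait for it with `select()`; when the timer expires, the descriptor becomes readable and a count of expirations can be read from it.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    /* Create a timer that first fires after 2 seconds and then
     * every 2 seconds after that. */
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec spec = {
        .it_value    = { .tv_sec = 2, .tv_nsec = 0 },
        .it_interval = { .tv_sec = 2, .tv_nsec = 0 },
    };
    timerfd_settime(tfd, 0, &spec, NULL);

    for (int i = 0; i < 3; i++) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(tfd, &readable);

        /* No timeout argument needed: the timer is just another
         * file descriptor in the set. */
        select(tfd + 1, &readable, NULL, NULL, NULL);

        uint64_t expirations;
        read(tfd, &expirations, sizeof(expirations));
        printf("timer fired (%llu expirations)\n",
               (unsigned long long)expirations);
    }

    close(tfd);
    return 0;
}
```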
Instead of registering a signal handler, a signalfd(2) can be created, which lets you handle the signal in the context of the process calling `select()` on the file descriptor.
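A minimal sketch of the pattern: block the signal with sigprocmask(2) so it is no longer delivered asynchronously, then read `struct signalfd_siginfo` records from the descriptor whenever it becomes readable.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/signalfd.h>
#include <unistd.h>

int main(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);

    /* Block normal delivery of SIGINT so it is only reported
     * through the file descriptor. */
    sigprocmask(SIG_BLOCK, &mask, NULL);

    int sfd = signalfd(-1, &mask, 0);

    printf("press Ctrl-C...\n");

    /* This read blocks until a signal arrives; in a real program
     * sfd would sit in a select()/poll() set with other descriptors. */
    struct signalfd_siginfo info;
    read(sfd, &info, sizeof(info));
    printf("got signal %u\n", info.ssi_signo);

    close(sfd);
    return 0;
}
```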
Not every signal can be handled by the signalfd: SIGBUS, for example. However there is, at the time of writing, work in progress to handle this with a userfaultfd(2), which allows you to mmap(2) replacement data into a process when it is not available.
This has to be run in a separate thread, since you can't `select()` on a userfaultfd while also triggering a page fault.
Finally, there's the eventfd(2), which has two main uses:
- Providing a way for a thread/process to send event notifications to another thread/process's event loop.
- Providing a semaphore-like lock.
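Here is a minimal sketch of the notification use (kept in a single process for brevity; in practice the writes would usually come from another thread or process, and the read side would sit in a `select()` loop):

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

int main(void)
{
    /* The eventfd holds a 64-bit counter, initially 0. */
    int efd = eventfd(0, 0);

    /* A writer signals an event by adding to the counter. */
    uint64_t one = 1;
    write(efd, &one, sizeof(one));
    write(efd, &one, sizeof(one));

    /* A reader (normally woken up by select()/poll()) reads the
     * accumulated count and resets it to zero. */
    uint64_t events;
    read(efd, &events, sizeof(events));
    printf("%llu events pending\n", (unsigned long long)events);

    close(efd);
    return 0;
}
```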
Other files that can be waited for
The file descriptor returned by [epoll_create(2)][] can also be polled for events. This is handy, as it allows you to plug one epoll-based event loop into another: an event-driven library can manage multiple file descriptors internally, hand a single epoll file descriptor to the calling program, and let the program plug that into its own event loop.
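As a rough illustration of that nesting (the names here are purely illustrative): a library watches its own descriptors through a private epoll instance, and the application only ever registers that one descriptor with its own epoll instance.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    /* Pretend this is inside a library: it watches its own
     * descriptors through a private epoll instance. */
    int inner = epoll_create1(0);

    int pipefd[2];
    pipe(pipefd);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
    epoll_ctl(inner, EPOLL_CTL_ADD, pipefd[0], &ev);

    /* The application's own event loop only knows about `inner`. */
    int outer = epoll_create1(0);
    struct epoll_event lib_ev = { .events = EPOLLIN, .data.fd = inner };
    epoll_ctl(outer, EPOLL_CTL_ADD, inner, &lib_ev);

    /* Make something happen on the library's descriptor... */
    write(pipefd[1], "x", 1);

    /* ...and the outer loop sees the inner epoll fd become readable. */
    struct epoll_event got;
    int n = epoll_wait(outer, &got, 1, 1000);
    printf("outer epoll reported %d ready descriptor(s)\n", n);

    return 0;
}
```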
Most of the physical devices attached to a system are visible in `/dev`; network devices are unfortunately missing from there. Instead, notification and control of network devices is possible with special netlink(7) sockets.
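A rough sketch (subscribing to link notifications only; real programs need to parse the messages properly): open a NETLINK_ROUTE socket, bind it to the RTMGRP_LINK multicast group, and read messages as interfaces change state.

```c
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Netlink sockets look like ordinary sockets, but they talk
     * to the kernel rather than to another host. */
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Subscribe to notifications about network interfaces. */
    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK;
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Each message describes an event such as an interface going
     * up or down; decoding it properly takes more code than this. */
    char buf[4096];
    ssize_t len = recv(fd, buf, sizeof(buf), 0);
    printf("received a %zd byte netlink message\n", len);

    close(fd);
    return 0;
}
```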
Storage of data under Linux (and most any other operating system) typically happens on what are known as block devices. Underlying the storage is very likely a common hard disc or SSD. However, very little software actually wants to store data directly onto block devices. Instead we have a special class of software built into the kernel called file systems (we've spoken about these before), which layer an organisational structure on top of the block device and give us the interface we typically understand to be how a computer organises its storage, namely the sorts of file paths (e.g. `/etc/passwd` or `/home/myuser/Documents`) with which we typically interact.
If only life were simple, this article could end right here, but of course it is not. Hard discs are typically much larger than the computer's volatile memory and as such there are a few standards for splitting up the disc into smaller partitions. This allows the user of the disc to choose how much space to allocate to different filesystems which can then be joined together at run-time.
If data were only ever strictly smaller than the biggest disc one could buy, our journey could stop here; and if every disc were perfectly reliable, we could stop without worrying. Neither, as you'd expect, is the case.
There exists a range of storage technologies known as [RAID][]. RAID comes in many flavours; it joins together multiple block devices (discs or partitions) and covers a multitude of properties, such as creating a block device larger than any of its components, offering resilience in the face of failing devices, or offering different performance characteristics by providing multiple paths to your data. Under Linux, the common RAID solution is MD.
So now we have large block devices, ways of splitting them up, ways of joining them together, but ultimately these are static in the sense that once partitions or RAID shapes are settled, they are rarely changed. But data storage requirements, even for a fixed-purpose system, change over time and so there's one more technology I'd like to mention.
Discs, partitions and the like are sometimes referred to as 'volumes', and Linux has a mechanism known as the Logical Volume Manager (LVM). It too allows the joining together of multiple block devices, into what LVM calls 'volume groups', but critically it then allows volume groups to be split into 'logical volumes', which can be created and destroyed, resized, moved, and altered at runtime.
Of course, file systems can be built on top of raw discs, partitions, RAID devices, or logical volumes. Different file systems have different properties and so work better on different kinds of storage and can be tweaked in different ways for different storage, so even at this point we've not explored it all.
Finally, there's a way to close the circle between block storage and files on your filesystem, by virtue of the loop device, which allows you to take a file on a filesystem and treat it as a block device, and therefore make file systems on it, partition it, or use it as part of an LVM volume group or a RAID system. If you want to play with the loop device, you might find the kpartx program useful (it lets you work with partitions on loop devices more easily).
So, having looked at everything from the real discs in your computer all the way to using files on your filesystem as yet more block devices, your homework is to take a look at how block storage is arranged on your computers -- sometimes Linux installers will use LVM without you knowing, and sometimes they will even set up RAID for you. Take a look at how your discs are partitioned and then consumed by your computer, and gain a little more confidence in your ability to know what's going on.
We previously spoke about permissions from the perspective of files, where the combination of file owner and permission mode bits allows you to refuse certain users access to certain files.
When you attempt to access a file that you aren't permitted to, you get the "Operation not permitted" error message. This is not the only case when you can get that error message though, as some system calls are restricted.
Restricted to the root user
The traditional *nix permission model is that the root user (who has a user-id of 0) can do anything they want, but other users are restricted to a reduced range of system calls.
For example, on a shared machine you don't want every user to be able to call reboot(2); only the root user is allowed to do this, and the system administrator is assumed to be able to run commands as the root user.
Elevating privileges
The naive solution to users needing elevated privileges for system administration is to share the root user's password with responsible users, so they can log in as the root user when necessary.
This is not a good idea, especially if the machine is reachable by the wider internet, as it only requires someone to guess the password for the whole machine to be compromised.
The generally accepted solution is to use a tool like sudo, so that responsible users are allowed to run commands as the root user by entering their own passwords.
The sudo command works by having the setuid bit set on the executable, which means that the command is run as the root user. sudo therefore needs to perform its own permission checks to determine whether this is acceptable.
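To see the mechanism in action, here is a tiny sketch: a program that prints its real and effective user IDs. Run normally, both are your own UID; if the binary were owned by root with the setuid bit set (as sudo's is), the effective UID would be 0 while the real UID stayed yours.

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* The real UID is who ran the program; the effective UID is
     * who the kernel treats it as for permission checks.  A
     * setuid-root binary starts with an effective UID of 0. */
    printf("real uid:      %d\n", (int)getuid());
    printf("effective uid: %d\n", (int)geteuid());
    return 0;
}
```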
Restricted by capabilities
The traditional model is very all-or-nothing. This is fine for letting the system administrator do the work they need to, but you might want to allow a user to start web services on the standard ports, while not allowing them to reboot the machine.
To make this work, the privileges of the root user are split into a set of capabilities(7). A root process usually starts with every capability, so it remains compatible with the traditional access model, but processes can voluntarily drop the capabilities they don't need.
To bind web services to low ports, a process needs the `CAP_NET_BIND_SERVICE` capability, so it is possible to elevate privileges for a user by dropping every capability that isn't needed: the user is able to bind web services, but is missing `CAP_SYS_BOOT`, so they can't reboot the machine.
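A rough illustration of the difference (assuming an otherwise unprivileged user): binding a socket to a port below 1024 fails unless the process is root or holds `CAP_NET_BIND_SERVICE`.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Port 80 is a "low" port: binding to it requires either
     * root or the CAP_NET_BIND_SERVICE capability. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(80);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("bind to port 80");   /* expected for a normal user */
    else
        printf("bound to port 80\n");

    close(fd);
    return 0;
}
```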
Other approaches to permissions
Linux security modules
On top of the more traditional permissions model, there are mandatory access control schemes, which layer extra restrictions over it.
SELinux is often considered to be complicated beyond the understanding of mere mortals, which often means that even though it's enabled by default on Fedora, one of the first recommended debugging steps is to disable SELinux.
SMACK, on the other hand, is designed to be simple, though it suffers from an unfortunate name.
Sandboxing
Instead of running general programs directly, you can have programs communicate in a restricted language with another entity that acts as a proxy and enforces its own logic about which operations are permitted.
Some examples of this are the JVM sandbox, which allows locking down Java programs; supple, which is roughly the same idea but for Lua programs; and the Google Chrome sandbox, which allows JavaScript on web pages to be safely executed.