Storage of data under Linux (and most any other operating system) typically
happens on what are known as block devices. Underlying the storage is very
likely a common hard disc or SSD. However, very little software
actually wants to store data directly onto block devices. Instead we have a
special class of software built into the kernel called file systems (we've
spoken about these before) which layer an organisational
structure on top of the block device and give us the interface we typically
understand to be how a computer organises its storage, namely the sorts of file
paths (e.g. /etc/passwd or /home/myuser/Documents) with which we typically
interact.
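To see which block device sits underneath each file system on a running machine, one option is to read /proc/mounts. The following is only a minimal sketch: the function name parse_mounts is my own invention, and it assumes that anything whose source starts with /dev/ is backed by a block device.

```python
from pathlib import Path


def parse_mounts(text):
    """Return (device, mount_point, fs_type) for mounts backed by a /dev/ node."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        # Pseudo file systems such as proc or sysfs have no /dev/ source,
        # so this (assumed) filter skips them.
        if device.startswith("/dev/"):
            mounts.append((device, mount_point, fs_type))
    return mounts


if __name__ == "__main__":
    mounts_file = Path("/proc/mounts")
    if mounts_file.exists():
        for device, mount_point, fs_type in parse_mounts(mounts_file.read_text()):
            print(f"{device} -> {mount_point} ({fs_type})")
```

Running it prints one line per mounted file system, pairing the block device with the path at which its contents appear.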
If only life were simple, this article could end right here, but of course it is not. Hard discs are typically much larger than any single file system needs to be, and as such there are a few standards (MBR and GPT partition tables being the common ones) for splitting up the disc into smaller partitions. This allows the user of the disc to choose how much space to allocate to different filesystems, which can then be joined together at run-time by mounting them into a single directory tree.
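The kernel lists the discs and partitions it knows about in /proc/partitions, with sizes given in 1 KiB blocks. As a rough sketch (the name parse_partitions is hypothetical, and the parsing assumes the usual four-column layout), that file can be turned into names and sizes in bytes like this:

```python
from pathlib import Path


def parse_partitions(text):
    """Return (name, size_in_bytes) for each entry in /proc/partitions text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and blank lines: data rows are
        # "major minor #blocks name" with a numeric first column.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _major, _minor, blocks, name = fields
        entries.append((name, int(blocks) * 1024))
    return entries


if __name__ == "__main__":
    partitions_file = Path("/proc/partitions")
    if partitions_file.exists():
        for name, size in parse_partitions(partitions_file.read_text()):
            print(f"{name}: {size} bytes")
```

Whole discs and their partitions both appear in the output, so on a typical machine you would expect to see an entry for the disc followed by one for each partition on it.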
If the data we wanted to store were always smaller than the biggest disc one could buy, then our journey would stop here. Likewise, if every disc were perfectly reliable, we could stop without worrying. This is, as you'd expect, not the case.
There exists a range of storage technologies known as [RAID][]. RAID comes in many flavours, all of which join together multiple block devices (discs or partitions). Depending on the flavour, the result can be a block device larger than any of its components, one which offers resilience in the face of failing devices, or one which offers different performance characteristics by providing multiple paths to your data. Under Linux, the common RAID solution is MD, usually managed with the mdadm tool.
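To make the trade-off between size and resilience a little more concrete, here is a small sketch of the textbook capacity arithmetic for four common RAID levels. It is an illustration rather than anything MD itself provides: the function name raid_capacity is made up, all members are assumed to be the same size, the minimum device counts are the conventional ones, and metadata overhead is ignored.

```python
def raid_capacity(level, count, size):
    """Usable capacity of `count` equal-sized members of `size` units each."""
    # Conventional minimum number of devices per level (an assumption here).
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if count < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} devices")
    if level == 0:
        return count * size        # striping: all space, no redundancy
    if level == 1:
        return size                # mirroring: every member holds a full copy
    if level == 5:
        return (count - 1) * size  # one member's worth of parity
    return (count - 2) * size      # RAID 6: two members' worth of parity


if __name__ == "__main__":
    for level in (0, 1, 5, 6):
        usable = raid_capacity(level, 4, 1000)
        print(f"RAID {level}: {usable} GB usable from 4 x 1000 GB")
```

The more of the raw space you give up, the more failed devices the array can survive, which is the essence of choosing a flavour.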
So now we have large block devices, ways of splitting them up, ways of joining them together, but ultimately these are static in the sense that once partitions or RAID shapes are settled, they are rarely changed. But data storage requirements, even for a fixed-purpose system, change over time and so there's one more technology I'd like to mention.
Discs, partitions, etc. are sometimes referred to as 'volumes', and Linux has a mechanism known as the Logical Volume Manager (LVM). Like RAID, LVM allows the joining together of multiple block devices, into what it calls 'volume groups'. Critically, it then allows the splitting of volume groups into 'logical volumes', which can be created, destroyed, resized, moved, and altered at runtime.
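The idea of pooling devices and then carving the pool up again can be modelled in a few lines. What follows is purely a toy, in-memory model to illustrate the concept; the VolumeGroup class and its methods are invented for this example and bear no relation to LVM's real tools or on-disc format.

```python
class VolumeGroup:
    """Toy model: a pool of space built from several devices."""

    def __init__(self, device_sizes):
        # Joining devices together: the pool is simply the sum of its parts.
        self.total = sum(device_sizes)
        self.volumes = {}

    @property
    def free(self):
        return self.total - sum(self.volumes.values())

    def create(self, name, size):
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size > self.free:
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = size

    def resize(self, name, new_size):
        growth = new_size - self.volumes[name]
        if growth > self.free:
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = new_size

    def remove(self, name):
        del self.volumes[name]


if __name__ == "__main__":
    vg = VolumeGroup([100, 200])   # two devices pooled into one group
    vg.create("home", 150)         # a volume bigger than the first device
    vg.resize("home", 250)         # grown later, while "in use"
    print(vg.volumes, "free:", vg.free)
```

Note that the "home" volume is larger than the first device on its own, which is exactly the kind of thing a fixed partition cannot do.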
Of course, file systems can be built on top of raw discs, partitions, RAID devices, or logical volumes. Different file systems have different properties and so work better on different kinds of storage and can be tweaked in different ways for different storage, so even at this point we've not explored it all.
Finally, there's a way to close the circle between block storage and files on your filesystem: the loop device. This allows you to take a file on a filesystem and treat it as a block device, and therefore make file systems on it, or partition it, or use it as part of an LVM volume group or a RAID system. If you want to play with the loop device then you might find the kpartx program useful, since it lets you play with partitions on loop devices more easily.
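If you'd like something to experiment on, the first step is a file to act as the backing store. This sketch creates a sparse image file and prints the losetup command which would attach it to a free loop device; it deliberately does not run that command, since attaching normally needs root. The function names and the image file name are assumptions made for the example.

```python
import os
import tempfile


def make_image(path, size_bytes):
    """Create a sparse file of the given size and return its apparent size."""
    with open(path, "wb") as image:
        # Extending an empty file with truncate() leaves a hole, so the
        # image takes up little real space until it is written to.
        image.truncate(size_bytes)
    return os.path.getsize(path)


def attach_command(path):
    """Command (to be run as root) that attaches the file to a free loop device."""
    return ["losetup", "--find", "--show", path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        image_path = os.path.join(workdir, "disk.img")  # hypothetical name
        size = make_image(image_path, 64 * 1024 * 1024)
        print(f"created {image_path} ({size} bytes)")
        print("to attach it:", " ".join(attach_command(image_path)))
```

Once attached, the loop device which losetup reports can be partitioned or formatted just like any other block device.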
So, having looked at everything from the real discs in your computer all the way to being able to use files on your filesystem as more block devices, your homework is to take a look at how the block storage is arranged on your computers -- sometimes Linux installers will use LVM without you knowing and sometimes they even look at setting up RAIDs for you. Take a look at how your discs are partitioned and then consumed by your computer, and gain a little more confidence in your ability to know what's going on.
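As a starting point for the homework, the kernel exposes its view of block devices under /sys/block, where each partition shows up as a subdirectory containing a file named partition. Here is a minimal sketch which walks that tree; the function name describe_block_devices is hypothetical, and it takes the directory as a parameter so it can be pointed at a copy for experimenting.

```python
from pathlib import Path


def describe_block_devices(sys_block="/sys/block"):
    """Map each block device name to a sorted list of its partition names."""
    root = Path(sys_block)
    layout = {}
    if not root.is_dir():
        return layout
    for device in sorted(root.iterdir()):
        if not device.is_dir():
            continue
        # A partition is a subdirectory which contains a "partition" file.
        partitions = sorted(
            child.name
            for child in device.iterdir()
            if child.is_dir() and (child / "partition").exists()
        )
        layout[device.name] = partitions
    return layout


if __name__ == "__main__":
    for name, partitions in describe_block_devices().items():
        print(name, "->", ", ".join(partitions) or "(no partitions)")
```

Devices with names beginning md or dm- in the output are a hint that RAID or LVM respectively is in use on the machine.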