What is version control?

Version control. Revision control. Software configuration management. These are all names for the same thing: keeping track of all the changes you make to your program's source code.

Because night-time coding

Version control is useful so you can remember what you've changed over time. For example, if you publish version 1.0 of your program, and later on version 2.0, and someone asks you what you changed, version control is what you need to answer that.

Seeing what changed is also important so you can figure out what caused your program to break. You release version 2.0, and now your frobniter no longer cogitates. You can't remember making any change to the cogitation module. Indeed, you could swear you haven't. But looking at the differences reveals that you did, indeed, make a change. What's more, it was 4 am in the night after your birthday party when you did that, which explains why you don't remember doing it.

Version control, when used properly, remembers every change you've made: not just releases, but much finer-grained steps. It can keep a snapshot of your work from every few minutes, whereas archived releases usually happen only fairly rarely.

Collaboration

Imagine a college or university terminal room. There are a few dozen computer terminals, or microcomputers, each one with someone working on something. In one corner, there's a group of students working together on a group project. Every few minutes one of them asks if it's safe to edit such and such a file. Every hour or two, there's a wail of anguish.

What they're doing is working together using a shared directory. Each of them is editing one file in the directory, and asking the others for permission. If two are editing the same file, they'll overwrite each other's changes. Sometimes they make mistakes and forget to ask for permission to edit a file.

Version control tools make collaboration easier. Everyone edits files on their own computer, and the version control tools synchronise the changes mostly automatically. The tools prevent anyone's changes from being overwritten.

Important concepts

There are many version control systems, but they share a few key concepts.

  • A repository is where the version control system stores all the versions of all the files.
  • When you've finished making some set of changes, you tell the version control system you've done that by making a commit.
  • You can create branches, which isolate work. Changes made to one branch don't affect any other branch. This allows you to do some experimental changes in one branch, without ruining the main line of development.
  • Branches can be merged, which means you take all the changes made in one branch and add them to another branch. The result contains everything from both branches. If your experimental changes turn out to be good, you can merge them into the main line of development. If they turn out not to be good, you can just drop the experimental branch, no harm done.
  • You can look at a diff (difference, list of changes) made from one version to another, or between branches. The diff is usually in the form of a unified diff, which looks slightly weird to begin with, but quickly becomes a very efficient way to see what's changed.
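
As a rough sketch of how these concepts fit together, here is what they might look like with git (introduced in more detail later). The temporary directory, the file name notes.txt, the branch name experiment, and the placeholder identity are all assumptions made up for this illustration.

```shell
# Sketch only: work in a throwaway repository so nothing real is touched.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# First commit on the main line of development.
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
main_branch=$(git symbolic-ref --short HEAD)   # name depends on your git setup

# Create an experimental branch and commit a change there.
git checkout -q -b experiment
echo "an experimental idea" >> notes.txt
git commit -q -a -m "Try an idea"

# Back on the main line, notes.txt is unchanged.
# The diff between the two branches shows what the experiment added.
git checkout -q "$main_branch"
git diff "$main_branch" experiment

# The experiment turned out well, so merge it into the main line.
git merge -q experiment
# Had it turned out badly, "git branch -D experiment" would drop it instead.
```

The output of git diff here is a unified diff: lines starting with + were added on the experimental branch.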

Version control systems are broadly classified into centralised and distributed systems. In a centralised one, every commit you make is immediately published to a repository on a server, and everyone collaborating on that project is using the same repository on the same server.

With a distributed system, there can be any number of repositories on any number of servers, and they need to be manually synchronised, using push and pull operations. Push sends your changes to the server, and pull retrieves others' changes from the server.
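
Push and pull can be tried out without a real server. The sketch below, using git, lets a bare repository in a temporary directory stand in for the server; the names server.git, alice, and bob, and the file readme.txt, are invented for the example.

```shell
# Sketch only: everything happens inside a throwaway directory.
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the server.
git init -q --bare server.git

# Alice clones the (still empty) server repository, commits, and pushes.
git clone -q server.git alice
cd alice
git config user.name "Alice"               # placeholder identity
git config user.email "alice@example.com"
echo "first line" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
git push -q origin HEAD
cd ..

# Bob clones the server and gets Alice's first commit.
git clone -q server.git bob

# Alice makes another change and pushes it.
cd alice
echo "second line" >> readme.txt
git commit -q -a -m "Extend readme"
git push -q origin HEAD
cd ..

# Bob's copy is now out of date until he pulls.
cd bob
git pull -q
cat readme.txt
```

Nothing reaches Bob's repository until he asks for it: that is the manual synchronisation step.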

The two important practical differences between centralised and distributed systems are that distributed systems are typically much, much better at merging, and individual developers are not at the mercy of being granted commit access. The latter is very important for free software development.

With centralised systems, every commit requires write access to the repository on the server. For reasons of safety, security, and control, the set of people allowed to commit is usually quite restricted. This means that other developers are at a disadvantage: they can't commit their changes. This makes development awkward.

This is why distributed version control systems are replacing centralised ones, in free software development and also in general.

Popular version control systems

There are many popular version control systems. Here's a short list of the free software ones:

  • git is a distributed version control system originally developed by Linus Torvalds for use with the Linux kernel. It is fairly efficient, and is used by a large number of free software projects now.
  • Mercurial is another distributed version control system. It's not as popular as git, but a number of well-known projects use it, for example Python.
  • Bazaar is also distributed, but failed to become popular outside Canonical and its Ubuntu distribution.
  • Subversion is a centralised system, which is quite popular and has been used by a large number of popular projects. The tide is changing in favour of git, however.
  • CVS is the grand-daddy of version control systems in the modern sense. It is outdated and archaic, but some projects still use it.

There are many more; Wikipedia has a list.

Which one should you learn? All the ones that are used by any of the projects you might want to contribute to.

Which one should you use for new projects? My vote is git, but if I tell you to use git, I'll be flamed by fans of other systems, so I won't do that. Use your own judgement and preferences.

Example, with git

Here's an example of using git, for a project of your own. It doesn't show how to use a server for sharing code with others, only how to use it locally.

To start with, you should create a project directory, and initialise it.

mkdir ~/my-project
cd ~/my-project
git init .

This creates the .git subdirectory, where git keeps its own data about your source code.

After this, you can create some files. You can then add them to version control.

emacs foo.c
vi bar.c
git add foo.c bar.c

You can now commit the files to version control. Git will open an editor for you to write a commit message. The message should describe the changes you are committing.

git commit

You can now make further changes, and then look at what you've changed since the last commit.

emacs bar.c
git diff

When you're ready with a new set of changes, and you've reached a point where you want to commit, you do just that. You can do this by using git add again, or you can simplify this by using the -a option to git commit. You need to git add every new file, but -a will catch changes to files git already knows about.

git commit -a
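
The difference between new files and files git already knows about is easy to see in a scratch repository. This is only a sketch: the temporary directory, the file contents, and the commit messages are made up, and -m is used to give the commit message on the command line instead of opening an editor.

```shell
# Sketch only: a throwaway repository with one tracked file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "int main(void) { return 0; }" > foo.c
git add foo.c
git commit -q -m "Add foo.c"

# Change the tracked file and create a brand new one.
echo "/* a comment */" >> foo.c
echo "void bar(void) {}" > bar.c

# -a picks up the change to foo.c, but bar.c is still untracked.
git commit -q -a -m "Comment foo.c"
git status --short        # prints: ?? bar.c

# New files must be added explicitly before they can be committed.
git add bar.c
git commit -q -m "Add bar.c"
```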

You can then look at all the commits you've made.

git log

With various options to git log you can add more output. For example, the -p option will add a diff of the changes in each commit.

git log -p
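
A few other git log options are worth trying. The sketch below builds a small throwaway repository first so that there is some history to look at; the file names and commit messages are invented for the example.

```shell
# Sketch only: a scratch repository with two commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "int foo;" > foo.c
git add foo.c
git commit -q -m "Add foo.c"
echo "int bar;" > bar.c
git add bar.c
git commit -q -m "Add bar.c"

git log --oneline       # one line per commit: abbreviated hash and subject
git log --stat          # adds a per-file summary of what each commit changed
git log -n 1            # only the most recent commit
git log -p -- bar.c     # only commits that touched bar.c, with their diffs
```

Options can be combined, so git log --oneline -n 1 shows just the latest commit on a single line.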

For more information, see the git tutorial.

What should version control be used for?

Version control is most often used for program source code. However, you can use it for all sorts of things:

See also