pages tagged git

The only proper git workflow, use nothing else

2018-02-21T12:00:17Z

The title of this article is intentionally provocative.

Git is a flexible tool that allows many kinds of workflow. Here is the workflow I favour for teams:

The master branch is meant to be always releasable.
Every commit in master MUST pass the full test suite, though not all commits in merged change sets need to do that.
Changes are done in dedicated branches, which get merged to master frequently - avoid long-lived branches, since they tend to result in much effort having to be spent on resolving merge conflicts.
- If frequent merging is, for some reason, not an option, at least rebase the branch onto current master frequently: at least daily. This keeps conflicts fairly small.
Before merging a branch into master, rebase it onto master and resolve any conflicts - also rebase the branch so it tells a clean story of the change.
- git rebase -i master is a very powerful tool. Learn it.
- A clean story doesn't have commits that fix mistakes earlier in the branch-to-be-merged, and introduces changes within the branch in chunks of a suitable size, and in an order that makes sense to the reader. Clean up "Fix typo in previous commit" type of commits.
Update the NEWS file when merging into master. Also Debian packaging files, if those are included in the source tree.
Tag releases using PGP signed, annotated tags. I use a tool called bumper, which updates NEWS, version.py, debian/changelog, tags a release, and updates the files again with +git appended to the version number.
- Review and update NEWS and debian/changelog before running bumper, to make sure they're up to date.
Name branches and tags with a prefix foo/ where foo is your username, handle, or other identifier.
If master is broken, fixing it has highest priority for the project.
If there is a need for the project to support older releases, create a branch for each such, when needed, starting from the release's tag. Treat release branches as master for that release.
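The rebase-then-merge part of the workflow can be sketched roughly as below. This is a minimal sketch in a throwaway repository, not a definitive recipe: the branch name alice/feature, the file names, and the identity are all made up for illustration, and using --no-ff to keep the merged change set visible as a merge commit is my assumption rather than something the workflow above prescribes.

```shell
set -e
# Throwaway repository, so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the main branch is called master
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "Add a.txt"

# Do the change in a dedicated branch, prefixed with a (made-up) username.
git checkout -q -b alice/feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, master moves on.
git checkout -q master
echo "two" >> a.txt
git commit -q -a -m "Extend a.txt"

# Rebase the branch onto current master before merging.
# In real use, "git rebase -i master" is where the story gets cleaned up.
git checkout -q alice/feature
git rebase -q master

# Merge into master. --no-ff (an assumption) records a merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge alice/feature" alice/feature

git log --oneline --graph
```

With a merge commit, git log --first-parent master shows only the commits made directly on master, which are the ones that must pass the full test suite.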

On making releases

2015-09-23T11:00:13Z

Any software project with users other than the developers should make releases.

A release is the developers saying that the software is in a good shape and ready to be used. This is important for the users, since otherwise they would have to try various version control revisions to find something that looks like it might work. Linux distributions and others who package software want releases for the same reason.
Sysadmins want releases for an additional reason: they want to be able to reproduce an installation, so that they can have many identical machines.
Releases are important for the developers themselves as well. Without releases, it becomes harder to support users, since the first support step is to determine the version the user is running, and that becomes harder if it might be any commit from any branch in the history of the project.

Given that releases are useful, what's a good way to make them? A popular opinion today is that a release is made by tagging a commit in a version control repository. This is the minimum for what constitutes a release, and it's fine for many people. Your author is more picky.

For a proper release, the following is a reasonable checklist:

A tag in version control (in git, it should be OpenPGP signed).
A release tar archive, compressed using a suitable compression tool.
- The archive may contain more files than are in the tagged version control commit. For example, autoconf-generated files for configuring a build.
A detached OpenPGP signature of the compressed tar archive.

An actual tar archive is necessary because it may not be possible to reproduce the release tar archive at a later date. For example, the way git produces a tar archive may change, and the compression tool may also change.

This even has security implications: an attacker who is free to choose the compression program could plausibly exploit a flaw in the decompression software. If a release is made, and the release artifacts (tar archive, detached signature) are verified and tested, it gets much harder to construct such an attack.

A short checklist for making a release:

Ensure the software is in a fit state to be released. It should work correctly; documentation, translations, and release notes should be up to date; and so on.
In particular, ensure the desired version number is correct and consistently applied across the software.
Sign the release tag.
Produce the release tar archive (compressed), including any generated files it should contain.
Produce the detached signature for the tar archive.
Produce any other release artifacts.
Test the release artifacts in some suitable way.
Publish the release artifacts and push the signed tag.
Announce the release in a suitable way: on the project website, blog, mailing list, or using a message in a bottle, depending on the project.
Figure out a way to automate as much as possible of this so it's easier to do the next time.

For most projects, making a release should happen often enough that it pays off to automate most of the process.
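As a starting point for such automation, here is a minimal sketch of the tag, archive, and signature steps, run against a throwaway project. The project name, version number, v-prefixed tag name, and the SIGN_RELEASE variable are all assumptions made up for this example. Signing needs an OpenPGP key, so the sketch only signs when SIGN_RELEASE=yes is set, and otherwise falls back to a plain annotated tag.

```shell
set -e
# Build a tiny throwaway project to release.
work=$(mktemp -d)
cd "$work"
git init -q project
cd project
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

version=1.0                               # hypothetical version number
name="project-$version"
archive="$work/$name.tar.gz"

# Tag the release. SIGN_RELEASE is a made-up variable for this sketch,
# not something git or gpg knows about.
if [ "${SIGN_RELEASE:-no}" = yes ]; then
    git tag -s -m "Release $version" "v$version"   # OpenPGP signed tag
else
    git tag -a -m "Release $version" "v$version"   # annotated, unsigned
fi

# Produce the compressed release tar archive from the tagged commit.
git archive --format=tar --prefix="$name/" "v$version" | gzip -9 > "$archive"

# Produce the detached signature next to the archive.
if [ "${SIGN_RELEASE:-no}" = yes ]; then
    gpg --armor --detach-sign "$archive"           # writes $archive.asc
fi

# Test the artifact in a (very) simple way: list what it contains.
tar -tzf "$archive"
```

A real release script would also add any generated files to the archive, run a proper test of the unpacked tree, and publish the results; those steps depend too much on the project to sketch here.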

Jargon

2015-06-10T11:00:14Z

How often have you found yourself searching for that shell command you have used 755 times before? What makes the incantation so hard to remember? Maybe it involves a string of seemingly random characters, something a lot of older Unix packages are guilty of, or maybe the choice of terminology is poor.

Software, like many geekdoms, is full of pop-culture references, acronyms, made-up words, and puns. This tangled web of jargon can lead to confusion and drive newcomers away. If you are writing a command-line package then its name, subcommands, config, and all other terminology need to be simple and descriptive.

An example of a package which uses poorly chosen terminology is the version control system Git. If you have ever used Git then you will have committed changes. These changes must be staged, which is done by adding the change to the index; to view the index the command git status is used. Stage, add, index and status. Four terms have been introduced to refer to what is in practice a straightforward concept. In the source code these four concepts are probably distinct, but to the user there is no need to introduce so many when fewer, more carefully chosen, terms could have done the job.
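To see how the four terms pile up around one idea, here is a small sketch in a throwaway repository. The file name and identity are made up for illustration. Note that a fifth word, cached, turns up as an option name for looking at the very same thing.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "first" > notes.txt
git add notes.txt             # "add" the file: it is now "staged" in the "index"
git status --short            # "A  notes.txt": a new file, staged
git commit -q -m "Add notes"

echo "second" >> notes.txt
git status --short            # " M notes.txt": modified, but not staged
git add notes.txt
git status --short            # "M  notes.txt": modified and staged

# Show what is staged: the option is called --cached (or --staged).
git diff --cached --stat
```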

When deciding on a naming scheme there are a few things to bear in mind. First and foremost is to make the names easy to understand. Ensure that people have a good idea of how a concept fits into the project from its name alone. A technique commonly used for choosing terms is to pick a metaphor. Metaphors are useful because they can show how terms relate to each other as well as doing a lot of the work for you when you need to find names for new concepts. For example, maybe you are writing an image editor: the metaphor could be painting, the tool to add some pixels might be called the paint brush, the tool to remove pixels could be paint-stripper, and so on.

Choosing a naming scheme is not easy, but thankfully a lot of areas of software have pre-existing naming norms. For example, if you go to a website and it talks about comments then you have a good idea of what it means. If you avoid well-established norms it can even lead to additional confusion.

Another point worth noting is to consider how words are pluralised. There do exist projects where people have been confused by the plural of a word not being obvious. For example, if you have a script which creates pictures of sea-creatures, then to generate an octopus you might run ./sea-creatures --octopus, and to generate a picture of more than one octopus maybe ./sea-creatures --octopodes. Although octopodes is a real word it is not obviously the plural of octopus. Try to choose words which pluralise simply in English (i.e. by appending the letter s).

Next time you write a piece of software, put some thought into the terminology: is it easy to understand or is it jargon? Using puns and pop-culture references rather than a simple metaphor may amuse a minority of people but it is likely to put others off.

Basics of version control systems

2013-11-06T12:00:29Z

What is version control?

Version control. Revision control. Software configuration management. These are all names for the same thing: keeping track of all the changes you make to your program's source code.

Because night-time coding

Version control is useful so you can remember what you've changed over time. For example, if you publish version 1.0 of your program, and later on version 2.0, and someone asks you what you changed, version control is what you need to answer that.

Seeing what changed is also important so you can figure out what caused your program to break. You release version 2.0, and now your frobniter no longer cogitates. You can't remember making any change to the cogitation module. Indeed, you could swear you haven't. But looking at the differences reveals that you did, indeed, make a change. What's more, it was 4 am in the night after your birthday party when you did that, which explains why you don't remember doing it.

Version control, when used properly, remembers every change you've made: not just releases, but at a much finer grain. It can hold a snapshot of your work from every few minutes, whereas archived releases usually happen fairly rarely.

Collaboration

Imagine a college or university terminal room. There are a few dozen computer terminals, or microcomputers, each one with someone working on something. In one corner, there's a group of students working together on a group project. Every few minutes one of them asks if it's safe to edit such and such a file. Every hour or two, there's a wail of anguish.

What they're doing is working together using a shared directory. Each of them is editing one file in the directory, and asking the others for permission. If two are editing the same file, they'll overwrite each other's changes. Sometimes they make mistakes and forget to ask for permission to edit a file.

Version control tools make collaboration easier. Everyone edits files on their own computer, and the version control tools synchronise the changes mostly automatically. The tools prevent anyone's changes from being overwritten.

Important concepts

There are many version control systems, but they share a few key concepts.

A repository is where the version control system stores all the versions of all the files.
When you've finished making some set of changes, you tell the version control system you've done that by making a commit.
You can create branches, which isolate work. Changes made to one branch don't affect any other branch. This allows you to do some experimental changes in one branch, without ruining the main line of development.
Branches can be merged, which means you take all the changes made in one branch and add them to another branch. The result contains everything from both branches. If your experimental changes turn out to be good, you can merge them into the main line of development. If they turn out not to be good, you can just drop the experimental branch, no harm done.
You can look at a diff (difference, list of changes) made from one version to another, or between branches. The diff is usually in the form of a unified diff, which looks slightly weird to begin with, but quickly becomes a very efficient way to see what's changed.
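The concepts above can be sketched with git as follows. This is only an illustration in a throwaway repository: the branch name experiment, the file name, and the identity are made up, and other version control systems spell the same operations differently.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .                             # the repository
git symbolic-ref HEAD refs/heads/master   # make sure the main line is called master
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

printf 'line one\n' > file.txt
git add file.txt
git commit -q -m "First version"          # a commit

# A branch isolates the experimental work.
git checkout -q -b experiment
printf 'line one\nline two\n' > file.txt
git commit -q -a -m "Try an experiment"

# The main line of development is unaffected.
git checkout -q master
cat file.txt                              # still only "line one"

# Look at the difference between the branches, as a unified diff.
git diff master experiment

# The experiment turned out well, so merge it.
git merge -q experiment
# A failed experiment could instead be dropped: git branch -D experiment
```

In the diff output, lines starting with + were added and lines starting with - were removed; the unmarked lines around them are context.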

Version control systems are broadly classified into centralised and distributed systems. In a centralised one, every commit you make is immediately published to a repository on a server, and everyone collaborating on that project is using the same repository on the same server.

With a distributed system, there can be any number of repositories on any number of servers, and they need to be manually synchronised, using push and pull operations. Push sends your changes to the server, and pull retrieves others' changes from the server.
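Push and pull can be tried out without a real server. In the sketch below, a bare repository in a temporary directory stands in for the server, and two clones stand in for two developers. The names alice and bob, the file names, and the identities are made up for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for "the server".
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/master

# Alice clones the (still empty) server repository; git warns about
# the repository being empty, which is harmless here.
git clone -q server.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"      # hypothetical identity
git config user.email "alice@example.com"
echo "from alice" > alice.txt
git add alice.txt
git commit -q -m "Add alice.txt"
git push -q origin master                 # send Alice's commit to the server
cd ..

# Bob clones, commits, and pushes his own change.
git clone -q server.git bob
cd bob
git config user.name "Bob Example"        # hypothetical identity
git config user.email "bob@example.com"
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "Add bob.txt"
git push -q origin master
cd ..

# Alice pulls to retrieve Bob's change from the server.
cd alice
git pull -q --ff-only origin master
ls
```

Nothing reaches the other developer until it has been pushed by one side and pulled by the other; that is the manual synchronisation.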

The two important practical differences between centralised and distributed systems are that distributed systems are typically much, much better at merging, and individual developers are not at the mercy of being granted commit access. The latter is very important for free software development.

With centralised systems, every commit requires write access to the repository on the server. For reasons of safety, security, and control, the set of people allowed to commit is usually quite restricted. This means that other developers are at a disadvantage: they can't commit their changes. This makes development awkward.

This is why distributed version control systems are replacing centralised ones, in free software development, but also in general.

Popular version control systems

There are many popular version control systems. Here's a short list of the free software ones:

git is a distributed version control system originally developed by Linus Torvalds for use with the Linux kernel. It is fairly efficient, and is used by a large number of free software projects now.
Mercurial is another distributed version control system. It's not as popular as git, but a number of well-known projects use it, for example Python.
Bazaar is also distributed, but failed to become popular outside Canonical and its Ubuntu distribution.
Subversion is a centralised system, which is quite popular and has been used by a large number of popular projects. The tide is changing in favour of git, however.
CVS is the grand-daddy of version control systems in the modern sense. It is outdated and archaic, but some projects still use it.

There are many more; Wikipedia has a list.

Which one should you learn? All the ones that are used by any of the projects you might want to contribute to.

Which one should you use for new projects? My vote is git, but if I tell you to use git, I'll be flamed by fans of other systems, so I won't do that. Use your own judgement and preferences.

Example, with git

Here's an example of using git, for a project of your own. It doesn't show how to use a server for sharing code with others, only how to use it locally.

To start with, you should create a project directory, and initialise it.

mkdir ~/my-project
cd ~/my-project
git init .

This creates the .git subdirectory, where git keeps its own data about your source code.

After this, you can create some files. You can then add them to version control.

emacs foo.c
vi bar.c
git add foo.c bar.c

You can now commit the files to version control. Git will open an editor for you to write a commit message. The message should describe the changes you are committing.

git commit

You can now make further changes, and then look at what you've changed since the last commit.

emacs bar.c
git diff

When you're ready with a new set of changes, and you've reached a point where you want to commit, you do just that. You can do this by using git add again, or you can simplify this by using the -a option to git commit. You need to git add every new file, but -a will catch changes to files git already knows about.

git commit -a

You can then look at all the commits you've made.

git log

With various options to git log you can add more output. For example, the -p option will add a diff of the changes in each commit.

git log -p

For more information, see the git tutorial.

What should version control be used for?

Version control is most often used for program source code. However, you can use it for all sorts of things:

system configuration files: etckeeper
personal configuration files: vcshome
web site content: ikiwiki
sharing files between computers: git-annex