<p>Pages tagged git (yakking, http://yakking.branchable.com/tags/git/)</p>
<p><strong>The only proper git workflow, use nothing else</strong><br />
Lars Wirzenius, 2018-02-21, http://yakking.branchable.com/posts/git-workflow/</p>
<p>The title of this article is intentionally provocative.</p>
<p>Git is a flexible tool that allows many kinds of workflow for using
it. Here is the workflow I favour for teams:</p>
<ul>
<li><p>The <code>master</code> branch is meant to be always releasable.</p></li>
<li><p>Every commit in <code>master</code> MUST pass the full test suite, though not
all commits in merged change sets need to do that.</p></li>
<li><p>Changes are done in dedicated branches, which get merged to <code>master</code>
frequently - avoid long-lived branches, since they tend to result in
much effort having to be spent on resolving merge conflicts.</p>
<ul>
<li>If frequent merging is, for some reason, not an option, rebase
the branch onto current master frequently, at least daily.
This keeps conflicts fairly small.</li>
</ul>
</li>
<li><p>Before merging a branch into <code>master</code>, rebase it onto <code>master</code> and
resolve any conflicts - also rebase the branch so it tells a clean
story of the change.</p>
<ul>
<li><p><code>git rebase -i master</code> is a very powerful tool. Learn it.</p></li>
<li><p>A clean story doesn't have commits that fix mistakes earlier in
the branch-to-be-merged, and introduces changes within the
branch in chunks of a suitable size, and in an order that makes
sense to the reader. Clean up "Fix typo in previous commit" type
of commits.</p></li>
</ul>
</li>
<li><p>Update the <code>NEWS</code> file when merging into <code>master</code>. Also update the
Debian packaging files, if those are included in the source tree.</p></li>
<li><p>Tag releases using PGP signed, annotated tags. I use a tool called
<a href="http://git.liw.fi/bumper/">bumper</a>, which updates <code>NEWS</code>, <code>version.py</code>, <code>debian/changelog</code>,
tags a release, and updates the files again with <code>+git</code>
appended to the version number.</p>
<ul>
<li>Review and update <code>NEWS</code> and <code>debian/changelog</code> before running bumper
to make sure they're up to date.</li>
</ul>
</li>
<li><p>Name branches and tags with a prefix <code>foo/</code> where <code>foo</code> is your
username, handle, or other identifier.</p></li>
<li><p>If <code>master</code> is broken, fixing it has highest priority for the
project.</p></li>
<li><p>If there is a need for the project to support older releases, create
a branch for each such, when needed, starting from the release's
tag. Treat release branches as <code>master</code> for that release.</p></li>
</ul>
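<p>The branch, rebase, and merge steps above can be sketched as a shell session in a throwaway repository. This is only an illustration under assumptions: the branch name <code>foo/greeting</code>, the file names, and the placeholder identity are made up, and <code>git commit --fixup</code> with <code>git rebase -i --autosquash</code> is one possible way to fold a typo fix into the commit it corrects, not necessarily what the author uses. Setting <code>GIT_SEQUENCE_EDITOR=true</code> accepts the proposed rebase plan without opening an editor, so the sketch runs unattended.</p>

```shell
#!/bin/sh
# Sketch of the branch, rebase, merge cycle in a throwaway repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Dev"        # placeholder identity (assumption)
git config user.email "dev@example.com"

echo "hello" > README
git add README
git commit -q -m "Add README"

# Do the work in a dedicated branch; "foo/" stands in for your own prefix.
git checkout -q -b foo/greeting
echo "greeting = helo" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
echo "greeting = hello" > greeting.txt
git commit -q -a --fixup HEAD             # a "fix typo in previous commit" commit

# Meanwhile, master moves on.
git checkout -q master
echo "news" > NEWS
git add NEWS
git commit -q -m "Add NEWS"

# Rebase onto master, folding the fixup into the commit it corrects.
git checkout -q foo/greeting
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

# Merge the cleaned-up branch into master with an explicit merge commit.
git checkout -q master
git merge -q --no-ff -m "Merge foo/greeting" foo/greeting
git --no-pager log --oneline --graph
```

<p>In day-to-day use you would run <code>git rebase -i master</code> with a real editor and reorder or squash commits by hand; the unattended variant here only exists so the example can be run as a script.</p>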
<p><strong>On making releases</strong><br />
Lars Wirzenius, 2015-09-23, http://yakking.branchable.com/posts/releases/</p>
<p>Any software project with users other than the developers should make
releases.</p>
<ul>
<li><p>A release is the developers saying that the software is in a good
shape and ready to be used. This is important for the users, since
otherwise they would have to try various version control revisions
to find something that looks like it might work. Linux distributions
and others who package software want releases for the same reason.</p></li>
<li><p>Sysadmins want releases for an additional reason: they want to be
able to reproduce an installation, so that they can have many
identical machines.</p></li>
<li><p>Releases are important for the developers themselves as well.
Without releases, it becomes harder to support users, since the
first support step is to determine the version the user is running,
and that becomes harder if it might be any commit from any branch in
the history of the project.</p></li>
</ul>
<p>Given that releases are useful, what's a good way to make them? A
popular opinion today is that a release is made by <a href="https://en.wikipedia.org/wiki/Revision_tag">tagging</a> a commit in
a version control repository. This is the minimum for what constitutes
a release, and it's fine for many people. Your author is more picky.</p>
<p>For a proper release, the following is a reasonable checklist:</p>
<ul>
<li>A tag in version control (in <a href="http://git-scm.com/">git</a>, it should be <a href="https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work">OpenPGP signed</a>).</li>
<li>A release tar archive, compressed using a suitable compression tool.
<ul>
<li>The archive may contain more files than are in the tagged
version control commit. For example, autoconf-generated files
for configuring a build.</li>
</ul>
</li>
<li>A <a href="https://www.gnupg.org/gph/en/manual/x135.html#AEN160">detached OpenPGP signature</a> of the compressed tar archive.</li>
</ul>
<p>An actual tar archive is necessary because it may not be possible to
reproduce the release tar archive at a later date. For example, the
way git produces a tar archive may change, and the compression tool may
also change.</p>
<p>This even has security implications: it is plausible that an attack
could happen by exploiting a flaw in the decompression software, if
the attacker can use any compression program. If a release is made,
and the release artifacts (tar archive, detached signature) are
verified and tested, it gets much harder to construct an attack.</p>
<hr />
<p>A short checklist for making a release:</p>
<ul>
<li>Ensure the software is in a fit state to be released. It should work
correctly; documentation, translations, and release notes should be
up to date; and so on.</li>
<li>In particular, ensure the desired version number is correct and consistently
applied across the software.</li>
<li>Sign the release tag.</li>
<li>Produce the release tar archive (compressed), including any
generated files it should contain.</li>
<li>Produce the detached signature for the tar archive.</li>
<li>Produce any other release artifacts.</li>
<li>Test the release artifacts in some suitable way.</li>
<li>Publish the release artifacts and push the signed tag.</li>
<li>Announce the release in a suitable way: on the project website,
blog, mailing list, or using a message in a bottle, depending on the
project.</li>
<li>Figure out a way to automate as much as possible of this so it's
easier to do the next time.</li>
</ul>
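<p>As a rough illustration of automating part of the checklist above, the following shell sketch tags a release, builds a compressed tar archive from the tag, and unpacks it again as a basic test. The project name <code>myproject</code>, the version <code>1.0</code>, the file names, and the tag naming scheme are all assumptions made up for this example. The signing steps are shown only as comments, because they need an OpenPGP key to be configured.</p>

```shell
#!/bin/sh
# Sketch of a minimal release script; names and versions are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q myproject
cd myproject
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # placeholder identity (assumption)
git config user.email "dev@example.com"

echo "1.0" > VERSION
echo "Version 1.0: first release" > NEWS
git add VERSION NEWS
git commit -q -m "Prepare release 1.0"

# Tag the release. With an OpenPGP key configured, use "git tag -s"
# instead of "git tag -a" to get a signed tag.
git tag -a -m "Release 1.0" myproject-1.0

# Produce the compressed release tar archive from the tag.
archive="$work/myproject-1.0.tar.gz"
git archive --format=tar --prefix=myproject-1.0/ myproject-1.0 | gzip -9 > "$archive"

# Produce the detached signature (needs a key, so left as a comment):
#   gpg --armor --detach-sign "$archive"

# Test the artifact: unpack it and check that the version is what we expect.
mkdir "$work/unpacked"
tar -xzf "$archive" -C "$work/unpacked"
test "$(cat "$work/unpacked/myproject-1.0/VERSION")" = "1.0"
echo "release archive OK: $archive"
```

<p>A real script would also add generated files to the archive, publish the artifacts, and push the tag; those steps depend too much on the project to sketch here.</p>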
<p><em>For most projects, making a release should happen often enough that it
pays off to automate most of the process.</em></p>
<p><strong>Jargon</strong><br />
Will Holland, 2015-06-10, http://yakking.branchable.com/posts/jargon/</p>
<p>How often have you found yourself searching for that shell command you have
used 755 times before? What makes the incantation so hard to remember? Maybe
it involves a string of seemingly random characters, something a lot of older
Unix packages are guilty of, or maybe the choice of terminology is poor.</p>
<p>Software, like many geekdoms, is full of pop-culture references, acronyms,
made-up words, and puns. This tangled web of jargon can lead to confusion and
drive newcomers away. If you are writing a command-line package then its name,
subcommands, config, and all other terminology need to be simple and
descriptive.</p>
<p>An example of a package which uses poorly chosen terminology is the version
control system <a href="https://git-scm.com/">Git</a>. If you have ever used <a href="https://git-scm.com/">Git</a> then you will have
committed changes. These changes must be <em>staged</em>, which is done by <em>adding</em>
the change to the <em>index</em>; to view the <em>index</em> the command <code>git status</code> is
used. <em>Stage</em>, <em>add</em>, <em>index</em> and <em>status</em>. Four terms have been introduced to
refer to what is in practice a straightforward concept. In the source code
these four concepts are probably distinct but to the user there is no need to
introduce so many when fewer, more carefully chosen, terms could have done the
job.</p>
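<p>To make the overlap concrete, here is a small shell sketch in a throwaway repository. The file name <code>notes.txt</code> is made up for the example. One change is <em>added</em>, which <em>stages</em> it in the <em>index</em>, and the result is inspected with <code>git status</code>; <code>git diff --cached</code> is a second way to look at the same staged content.</p>

```shell
#!/bin/sh
# Four terms, one concept: what will go into the next commit.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "draft" > notes.txt
git add notes.txt                           # "add": stage the change in the index
git status --short                          # "status": lists what is staged
git --no-pager diff --cached --name-only    # the same staged files, seen via diff
```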
<p>When deciding on a naming scheme there are a few things to bear in mind. First
and foremost is to make the names easy to understand. Ensure that people have
a good idea of how a concept fits into the project from its name alone. A
technique commonly used for choosing terms is to pick a <a href="http://en.wiktionary.org/wiki/metaphor">metaphor</a>.
Metaphors are useful because they can show how terms relate to each other as
well as doing a lot of the work for you when you need to find names for new
concepts. For example maybe you are writing an image editor: the metaphor
could be painting, the tool to add some pixels might be called the paint brush,
the tool to remove pixels could be paint-stripper, and so on.</p>
<p>Choosing a naming scheme is not easy, but thankfully a lot of areas of software
have pre-existing naming scheme norms. For example if you go to a website and
it talks about a <em>comment</em> then you have a good idea of what it means. If you
avoid well-established norms, you can even cause additional confusion.</p>
<p>Another point, worth noting, is to consider how words are pluralised. There do
exist projects where people have been confused by the plural of a word not
being obvious. For example if you have a script which creates pictures of
sea-creatures, then to generate an octopus you might run <code>./sea-creatures
--octopus</code>, and to generate a picture of more than one octopus maybe
<code>./sea-creatures --octopodes</code>. Although <em>octopodes</em> is a real word it is not
<strong>obviously</strong> the plural of <em>octopus</em>. Try to choose words which pluralise
simply in English (i.e. by appending the letter s).</p>
<p>Next time you write a piece of software, put some thought into the terminology:
is it easy to understand or is it jargon? Using puns and pop-culture
references rather than a simple metaphor may amuse a minority of people but it
is likely to put others off.</p>
<p><strong>Basics of version control systems</strong><br />
Lars Wirzenius, 2013-11-06, http://yakking.branchable.com/posts/basics-of-cs/</p>
<h1>What is version control?</h1>
<p>Version control. Revision control. Software configuration management.
These are all names for the same thing: keeping track of all the
changes you make to your program's source code.</p>
<h2>Because night-time coding</h2>
<p>Version control is useful so you can remember what you've changed
over time. For example, if you publish version 1.0 of your program,
and later on version 2.0, and someone asks you what you changed,
version control is what you need to answer that.</p>
<p>Seeing what changed is also important so you can figure out what
caused your program to break. You release version 2.0, and now your
frobniter no longer cogitates. You can't remember making any change to
the cogitation module. Indeed, you could swear you haven't. But
looking at the differences reveals that you did, indeed, make a
change. What's more, it was 4 am in the night after your birthday
party when you did that, which explains why you don't remember doing
it.</p>
<p>Version control, when used properly, remembers every change you've
made, not just releases, but much more fine grained. It will remember
a snapshot of your work from every few minutes. Archived releases
usually happen only fairly rarely.</p>
<h2>Collaboration</h2>
<p>Imagine a college or university terminal room. There are a few
dozen computer terminals, or microcomputers, each one with
someone working on something. In one corner, there's a group
of students working together on a group project. Every few
minutes one of them asks if it's safe to edit such and such
a file. Every hour or two, there's a wail of anguish.</p>
<p>What they're doing is working together using a shared directory.
Each of them is editing one file in the directory, and asking
the others for permission. If two are editing the same file,
they'll overwrite each other's changes. Sometimes they make
mistakes and forget to ask for permission to edit a file.</p>
<p>Version control tools make collaboration easier. Everyone
edits files on their own computer, and the version control
tools synchronise the changes mostly automatically. The tools
prevent anyone's changes from being overwritten.</p>
<h1>Important concepts</h1>
<p>There are many version control systems, but they share a
few key concepts.</p>
<ul>
<li>A <strong>repository</strong> is where the version control system
stores all the versions of all the files.</li>
<li>When you've finished making some set of changes, you
tell the version control system you've done that
by making a <strong>commit</strong>.</li>
<li>You can create <strong>branches</strong>, which isolate work.
Changes made to one branch don't affect any other branch.
This allows you to do some experimental changes in
one branch, without ruining the main line of development.</li>
<li>Branches can be <strong>merged</strong>, which means you take all
the changes made in one branch and add them to another
branch. The result contains everything from both branches.
If your experimental changes turn out to be good,
you can merge them into the main line of development.
If they turn out not to be good, you can just drop the
experimental branch, no harm done.</li>
<li>You can look at a <a href="https://en.wikipedia.org/wiki/Diff">diff</a> (difference, list of changes)
made from one version to another, or between branches.
The diff is usually in the form of a <a href="https://en.wikipedia.org/wiki/Diff#Unified_format">unified diff</a>,
which looks slightly weird to begin with, but quickly
becomes a very efficient way to see what's changed.</li>
</ul>
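<p>The concepts above can be tried out in a few commands. The sketch below uses git purely as an example system, in a throwaway repository; the branch name <code>experiment</code>, the file name <code>app.txt</code>, and the placeholder identity are assumptions for illustration.</p>

```shell
#!/bin/sh
# Repository, commit, branch, diff, and merge in one short session.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .                               # create the repository
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"          # placeholder identity (assumption)
git config user.email "dev@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial version"          # first commit on the main line

git checkout -q -b experiment               # a branch isolates the experiment
echo "experimental feature" >> app.txt
git commit -q -a -m "Try an experimental feature"

git --no-pager diff master experiment       # unified diff between the branches

git checkout -q master
git merge -q experiment                     # the experiment was good: merge it
git branch -q -d experiment                 # the branch is no longer needed
```

<p>Had the experiment turned out badly, <code>git branch -D experiment</code> on <code>master</code> would have dropped it without touching the main line.</p>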
<p>Version control systems are broadly classified into
<strong>centralised</strong> and <strong>distributed</strong> systems. In a centralised
one, every commit you make is immediately published to
a repository on a server, and everyone collaborating on
that project is using the same repository on the same server.</p>
<p>With a distributed system, there can be any number of
repositories on any number of servers, and they need to
be manually synchronised, using <strong>push</strong> and <strong>pull</strong>
operations. Push sends your changes to the server,
and pull retrieves others' changes from the server.</p>
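<p>A minimal sketch of push and pull, again using git as the example: a bare repository in a temporary directory stands in for the server, and two clones stand in for two developers. The names <code>server.git</code>, <code>alice</code>, and <code>bob</code> are made up for the illustration, and a real server would be reached over the network rather than through a local path.</p>

```shell
#!/bin/sh
# Two clones synchronising through a shared "server" repository.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare server.git               # stands in for the server
git -C server.git symbolic-ref HEAD refs/heads/master

# Alice clones the (still empty) repository and publishes a first commit.
git clone -q server.git alice 2>/dev/null   # hide the "empty repository" warning
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                # placeholder identity (assumption)
git config user.email "alice@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin master                   # push: send changes to the server
cd ..

# Bob clones and gets Alice's work.
git clone -q server.git bob

# Alice makes another change and pushes it.
cd alice
echo "second draft" > notes.txt
git commit -q -a -m "Revise notes"
git push -q origin master
cd ..

# Bob pulls to retrieve the new change.
git -C bob pull -q --ff-only
cat bob/notes.txt
```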
<p>The two important practical differences between centralised and
distributed systems are that distributed systems are typically
much, much better at merging, and individual developers are not
at the mercy of whoever grants commit access. The latter is very
important for free software development.</p>
<p>With centralised systems, every commit requires write access
to the repository on the server. For reasons of safety, security,
and control, the set of people allowed to commit is usually
quite restricted. This means that other developers are at a
disadvantage: they can't commit their changes. This makes
development awkward.</p>
<p>This is why distributed version control systems are replacing
centralised ones, in free software development, but also in
general.</p>
<h1>Popular version control systems</h1>
<p>There are many popular version control systems. Here's a
short list of the free software ones:</p>
<ul>
<li><a href="http://git-scm.com/">git</a> is a distributed version control system originally
developed by Linus Torvalds for use with the Linux kernel.
It is fairly efficient, and is used by a large number of
free software projects now.</li>
<li><a href="http://mercurial.selenic.com/">Mercurial</a> is another distributed version control
system. It's not as popular as git, but a number of
well-known projects use it, for example <a href="http://hg.python.org/cpython/">Python</a>.</li>
<li><a href="http://bazaar.canonical.com/en/">Bazaar</a> is also distributed, but failed to become
popular outside Canonical and its Ubuntu distribution.</li>
<li><a href="http://subversion.tigris.org/">Subversion</a> is a centralised system, which is quite
popular and has been used by a large number of popular
projects. The tide is changing in favour of git, however.</li>
<li><a href="http://www.nongnu.org/cvs/">CVS</a> is the grand-daddy of version control systems in
the modern sense. It is outdated and archaic, but
some projects still use it.</li>
</ul>
<p>There are many more;
<a href="https://en.wikipedia.org/wiki/List_of_revision_control_software">Wikipedia</a>
has a list.</p>
<p>Which one should you learn? All the ones that are used
by any of the projects you might want to contribute to.</p>
<p>Which one should you use for new projects? My vote is git, but if I
tell you to use git, I'll be flamed by fans of other systems, so I won't
do that. Use your own judgement and preferences.</p>
<h1>Example, with git</h1>
<p>Here's an example of using git, for a project of your own.
It doesn't show how to use a server for sharing code with
others, only how to use it locally.</p>
<p>To start with, you should create a project directory, and
<strong>initialise it</strong>.</p>
<pre><code>mkdir ~/my-project
cd ~/my-project
git init .
</code></pre>
<p>This creates the <code>.git</code> subdirectory, where git keeps
its own data about your source code.</p>
<p>After this, you can create some files. You can then
<strong>add</strong> them to version control.</p>
<pre><code>emacs foo.c
vi bar.c
git add foo.c bar.c
</code></pre>
<p>You can now <strong>commit</strong> the files to version control. Git will
open an editor for you to write a <strong>commit message</strong>. The
message should describe the changes you are committing.</p>
<pre><code>git commit
</code></pre>
<p>You can now make further changes, and then look at what you've
changed since the last commit.</p>
<pre><code>emacs bar.c
git diff
</code></pre>
<p>When you're ready with a new set of changes, and you've reached a
point where you want to commit, you do just that. You can do this by
using <code>git add</code> again, or you can simplify this by using the <code>-a</code>
option to <code>git commit</code>. You need to <code>git add</code> every new file, but <code>-a</code>
will catch changes to files git already knows about.</p>
<pre><code>git commit -a
</code></pre>
<p>You can then look at all the commits you've made.</p>
<pre><code>git log
</code></pre>
<p>With various options to <code>git log</code> you can add more output. For
example, the <code>-p</code> option will add a diff of the changes in each
commit.</p>
<pre><code>git log -p
</code></pre>
<p>For more information, see the <a href="https://www.kernel.org/pub/software/scm/git/docs/gittutorial.html">git tutorial</a>.</p>
<h1>What should version control be used for?</h1>
<p>Version control is most often used for program source code. However,
you can use it for all sorts of things:</p>
<ul>
<li>system configuration files: <a href="http://joeyh.name/code/etckeeper/">etckeeper</a></li>
<li>personal configuration files: <a href="http://vcs-home.branchable.com/">vcshome</a></li>
<li>web site content: <a href="http://ikiwiki.info/">ikiwiki</a></li>
<li>sharing files between computers: <a href="http://git-annex.branchable.com/">git-annex</a></li>
</ul>
<h1>See also</h1>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Version_control">Wikipedia article on version
control</a></li>
</ul>