Before I became a Linux user, I wrote code using Delphi; then, as a software engineer in the world of Windows gaming, Visual C++ 6 was my development environment at work. In contrast, on Linux, where I "played" from pretty much the moment I found it in the early 90s, I used Emacs and Vim.

These days I spend most of my time in Emacs and Vim on Linux because it's my work environment, my personal dev environment and my home play environment. Many people I know who work mostly in Java, C# and Windows look at me aghast, wondering how I can comfortably work outside of what they think of as an Integrated Development Environment.

To explain fully, let's start by saying that an Integrated Development Environment (commonly contracted to IDE) is generally thought of as a single piece of software which provides tools to help a software engineer develop software. Typically this will include an editor, a compiler interface which gathers warnings and errors and links them back to the editor, and a debugger whose interface is also the editor. These days IDEs also include things like code completion, integrated running of test suites, access to revision control, etc.

There are many IDEs out there, such as Microsoft Visual Studio, which I mention because, despite being for Windows, it is free and actually very good; on the free software side there are Eclipse, IntelliJ IDEA and Code::Blocks. Obviously there are more, but those are the ones most commonly thought of when IDEs are mentioned.

What a lot of people don't appreciate, though, is that a properly consistent Linux-based environment can provide an IDE-like interface without you needing to lose your favoured interaction systems. Emacs, a terminal with Zsh, and GDB can easily be as productive as an IDE if you're used to, and competent with, the tools.

There are ways to improve any development environment (e.g. some IDEs support embedding another editor such as Vim or Emacs), but often if you're not using the defaults, some things (such as integrated debugging) simply cease to operate.

If you like GUI editors, integrated debugging, having the compiler run for you and so on, then you will very likely enjoy some of the traditional IDEs out there, and I recommend you have a play with them. If you prefer to have control over every aspect of your development environment, then I expect you'll do better to learn more about the tools available to you, such as DDD.

But, no matter your choice, make the effort to integrate the tools together in your workflows and approaches, if not in code itself.

Posted Wed Dec 3 12:00:08 2014

Digital signatures are crucial for free software. They are needed for ensuring that the source code you find floating around on the Internet is the same source code its developers released.

There are, of course, other reasons to use digital signatures, or encryption, but for this article we'll concentrate on using them to protect against tampering and accidental changes.

Public key cryptography

Digital signatures are implemented using public key cryptography. In this kind of system, everyone has two keys that form a pair: one key is private, the other public. Anything encrypted with the public key of a key pair can only be decrypted by the corresponding private key, and vice versa. In traditional (symmetric) cryptography, by contrast, there is only one key, which must be kept secret.

Having a key pair is what allows digital signatures to work. The developer signs the source code distribution with their private key, and as a result anyone with the public key, which is anyone who cares, can verify the signature. (In practice the signature is made over a cryptographic hash of the data, rather than by encrypting the whole distribution, but the principle is the same.) Assuming the developer really is the only one with the private key, this proves that the source code was released by them.

The key pair works in the opposite direction as well: if you want to send a bug report that only the developer should see, you encrypt it with their public key, and then only they can read it.

Web of trust

How do we know the key belongs to the developer? The two common approaches here are either to trust someone else, or to trust many others.

SSL and TLS ("https") work in the first manner. There are a relatively small number of companies, called certificate authorities, who digitally sign everyone else's keys with their own. Web browsers ship with the public keys of the certificate authorities, and can thus check that the SSL or TLS key is the one a website is meant to use. This works wonderfully, except that it creates a situation where you have to pay money to be able to communicate. It also assumes that all certificate authorities are trustworthy, and that web browsers only keep trustworthy ones on their list.

The other approach is to let everyone vouch for anyone else, but let everyone choose for themselves whom they'll trust as introducers. This is called the web of trust. As an example, let's say I meet Daniel in person, having known him for many years, and we sign each other's keys. Later, Daniel signs Richard's key. If I trust Daniel as an introducer, then I can be confident that Richard's key is really his.

The web of trust is used by the OpenPGP encryption system, of which GnuPG (or gpg) is the most common implementation. As a non-hierarchical, decentralised system, it works much better for free software development than the certificate authority approach of web browsing.

Which software to use?

There is only one commonly used implementation of the OpenPGP encryption system in the free software world: GnuPG. This seems almost incredible, given how common it is for there to be alternative implementations of everything. However, that may be explained by the fact that there are two major branches of GnuPG in use, both supported by its developers.

The 1.4 branch is the older one, and is the default in Debian. The 2.0 branch has various new features, but is quite compatible with the older version. For most people, it doesn't matter which one you use.

Use the gpg your Linux distribution has installed. It almost certainly has one installed by default.

Using GnuPG

Rather than writing a new, short manual for GnuPG, we'll refer to the existing documentation:

  • The Mini-HOWTO is a good way to get started.

  • The website has a whole bunch more documentation, most of which you don't need to read.

Verifying digital signatures

For free software distribution, PGP digital signatures are used primarily in three ways:

  • Signed commits or release tags in version control systems that support those (git does). To verify these, you use the version control tool.

  • Source and binary packages in distributions that are signed either directly or indirectly via a signed listing (a la Debian). To verify these, you use the distribution's tools. It probably happens automatically.

  • Release tarballs with detached signatures. These you need to verify manually. The command is gpg --verify foo.tar.xz.sig; gpg assumes the signed data is in the matching file without the .sig suffix (see documentation for details).

Notice how GnuPG is usually used in the background already. You're a GnuPG user without knowing it.

Making digital signatures

When you want to release your own software, you should make a tarball and provide a "detached signature". This means the signature is in a separate file, not part of the tarball.

gpg --detach-sign foo.tar.xz

Then you publish both foo.tar.xz and foo.tar.xz.sig, and everyone will be happy.

On paranoia

Every time geeks discuss encryption of any form, they become paranoid. It is common for us to start worrying about the way our PGP secret keys are stored. After all, if the secret key is stolen, it can result in all your communications becoming public, and all your software releases becoming untrustworthy.

Such worry is good, to a point. It's good to keep your private keys private. Very, very few people, however, need to worry about targeted attacks: if you think CIA and Mossad agents will be colliding in the night in your living room while looking for the hidden USB stick with your private keys, you should probably think again. Unless you're Edward Snowden. You need to do your own risk analysis and decide what your threats are. Be careful, but sensible.

Posted Wed Dec 10 18:14:30 2014

Iterators

Iterators, in themselves, aren't a revolutionary idea; many programming languages have them. What sets Python apart is how much of its standard library is concerned with producing and consuming iterators.

iter

The iter function turns an iterable into an iterator. This is not normally required since functions that require an iterator also accept iterables.
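
For example, a minimal sketch of stepping through an iterator by hand with the built-in next:

it = iter([1, 2, 3])
assert next(it) == 1
assert next(it) == 2
assert next(it) == 3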

iter can also be used to create an iterator from a function. For example, you can create an iterator that reads lines of input until the first empty line:

import sys

# readline is called repeatedly until it returns the sentinel value:
# a string containing only a newline, i.e. an empty input line.
entries = iter(sys.stdin.readline, '\n')
for line in entries:
    sys.stdout.write(line)

Because iterators are evaluated lazily, this will print lines out as they are entered, rather than waiting until the first empty line.

itertools

There are many useful functions for manipulating iterators in the itertools module.

The ones I use most often are chain.from_iterable and product.

product

product takes any number of iterables, and returns tuples of every combination: what mathematicians call the Cartesian product.

What this means is that the following two examples are functionally equivalent:

for x in [1, 2, 3]:
    for y in ['a', 'b', 'c']:
        print(x, y)

import itertools

for x, y in itertools.product([1, 2, 3], ['a', 'b', 'c']):
    print(x, y)

Except that the latter example needs less indentation, which makes it easier to keep to a maximum code width of 79 columns.

chain.from_iterable

chain takes iterators and returns an iterator that yields the values of each iterator in turn.

This can be used to merge dicts, as the dict constructor can take an iterator that returns pairs of key and value; but I've not found many uses for chain by itself.
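
As a minimal sketch of that dict-merging trick (the defaults and overrides dicts here are invented for illustration):

import itertools

defaults = {'colour': 'red', 'size': 'small'}
overrides = {'size': 'large'}

# dict() consumes the chained (key, value) pairs in order, so pairs
# from overrides win over those from defaults.
merged = dict(itertools.chain(defaults.items(), overrides.items()))
assert merged == {'colour': 'red', 'size': 'large'}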

This example implements a simple env(1) command using chain, takewhile and dropwhile, though.

import itertools
import os
import sys

# Leading KEY=VALUE arguments become environment variables; the rest
# of the command line is the program to execute.
env_args = itertools.takewhile(lambda x: '=' in x, sys.argv[1:])
exec_args = itertools.dropwhile(lambda x: '=' in x, sys.argv[1:])
cmdline_env = []
for arg in env_args:
    key, value = arg.split('=', 1)
    cmdline_env.append((key, value))
# os.execvpe needs a real mapping, so collect the chained pairs into a
# dict; the command-line pairs come second, overriding os.environ.
new_env = dict(itertools.chain(os.environ.items(), cmdline_env))
args = list(exec_args)
os.execvpe(args[0], args, new_env)

chain has an alternative constructor, chain.from_iterable, which takes an iterable of iterables. I find this useful when I have a set of objects that each have an iterable field, and I want the set of all those items.

import itertools

class Foo(object):
    def __init__(self, bars):
        self.bars = set(bars)

foos = set()
foos.add(Foo([1, 2, 3]))
foos.add(Foo([2, 3, 4]))
all_bars = set(itertools.chain.from_iterable(foo.bars for foo in foos))
assert all_bars == set([1, 2, 3, 4])

Generators

You may have noticed that I passed something weird to the call to chain.from_iterable; this is called a generator expression.

Generators are a short-hand for creating certain kinds of iterator.

Indeed, we could have used itertools.imap(lambda foo: foo.bars, foos), but as you can see, the generator expression syntax is shorter and, once you understand its general form, simpler.

You can do both filtering and mapping operations in generator expressions, so the following two expressions are equivalent:

itertools.imap(transform, itertools.ifilter(condition, it))

(transform(x) for x in it if condition(x))
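
As a concrete sketch of that equivalence, using the built-in map and filter in place of their itertools cousins, and doubling the odd numbers:

it = [1, 2, 3, 4]
a = list(map(lambda x: x * 2, filter(lambda x: x % 2, it)))
b = list(x * 2 for x in it if x % 2)
assert a == b == [2, 6]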

However, there are some calculations that aren't easily expressed as a single expression. To handle these, you can write generator functions.

generator functions

Generator functions are a convenient syntax for creating generators from what looks like a normal function.

Rather than creating a container to hold the results of your calculation and returning it at the end, you yield the individual values, and execution resumes from that point the next time you ask the iterator for a value.

They are useful for calculations where the result is not simple, and may even be recursive.

def foo(bar):
    # Yield the object itself, then recurse into each of its children.
    yield bar
    for baz in bar.qux:
        for x in foo(baz):
            yield x
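
To make that concrete, here is a minimal sketch; the Node class is an invented stand-in for whatever kind of object bar is, with qux holding its children:

class Node(object):
    def __init__(self, name, qux=()):
        self.name = name
        self.qux = list(qux)

tree = Node('root', [Node('a', [Node('b')]), Node('c')])
# foo, as defined above, yields nodes in depth-first order.
assert [node.name for node in foo(tree)] == ['root', 'a', 'b', 'c']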

Sub-generators

If you have Python 3.3 or later available, then you can use the yield from statement to delegate to sub-generators.

def foo(bar):
    yield bar
    for baz in bar.qux:
        yield from foo(baz)

In this example, rather than using yield from, you could instead write:

for x in foo(baz):
    yield x

However, this is longer and doesn't handle the potentially interesting corner cases where values can be passed into a generator function, or returned when iteration ends.
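
As a minimal sketch of one such corner case (Python 3.3 or later): a sub-generator's return value becomes the value of the yield from expression, which a manual for loop would silently throw away.

def inner():
    yield 1
    return 'finished'  # delivered as the value of the yield from

def outer():
    result = yield from inner()
    yield result

assert list(outer()) == [1, 'finished']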

Context managers

Context managers are used with the with statement. A context manager can be any object that defines the __enter__ and __exit__ methods.
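
For example, here is a minimal hand-written context manager (the announce class is invented for illustration):

class announce(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('entering', self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print('leaving', self.name)
        return False  # a true return value would suppress an exception

with announce('demo'):
    print('inside')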

You don't need a dedicated object to be a context manager, though: using an open file object as a context manager will have it close the file at the end of the with block. It is common to use open this way:

with open('foo.txt', 'w') as f:
    f.write('bar\n')

You can define the __enter__ and __exit__ methods yourself, but provided you don't need much logic at construction time (you rarely do), and you don't need it to be re-usable, you can define a context manager like this:

import contextlib
import os
@contextlib.contextmanager
def chdir(path):
    pwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(pwd)

This uses a generator function that yields only one value (in this case we implicitly yield None), and exists mostly so that cleanup code can run after the with block has finished and control re-enters your generator function.

The try...finally is necessary because, when you yield in a context manager, the generator is resumed when the with block finishes, which can be because of an exception. If it is, the exception is raised at the point of the yield inside your function, so to ensure that the chdir back is always run, you need to wrap the yield in a try...finally block.
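
As a quick usage sketch of the chdir context manager defined above; the original working directory is restored even if the block raises:

import os

with chdir('/tmp'):
    print(os.getcwd())  # /tmp (or wherever that symlink leads)
print(os.getcwd())  # back where we started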

Posted Wed Dec 17 12:00:06 2014

I no longer live in the same city as my fellow Yakking authors Daniel and Richard, but we communicate over IRC. As I write this, they're in a cafe writing new articles together, and they mentioned that the people at the table next to them were having a fairly common type of discussion:

  • "blah blah men do xxxx"

  • "blah blah women expect yyyy"

Saying that kind of thing tends to indicate that you're not appreciating the diversity of humanity. Indeed, you're de-humanising groups of people by treating them as identical just because of a shared attribute, rather than as unique persons.

"blah blah some people do xxxx or expect yyyy" tends to be less discriminating and more accurate. It makes it evident that you at least try to see people as who they are, rather than judging them based on aspects of themselves they can't do anything about.

This is important when you participate in any kind of free software community as well. Even in a software context, you should treat people as people, and treat them well. Without a diverse set of people, software is worse. More importantly, treating people well is Doing The Right Thing, a bit like keeping one's code clean is.

Posted Mon Dec 22 12:00:08 2014

While we cannot talk with any authority about the wider issues of freedom, we can and will encourage you to spread the gift of Software Freedom in this festive period. There are, of course, many free operating systems available to you, including the various Linux distributions, BSDs, etc. However, since we're a mostly Linux-focussed website, we'll only look at Linux.

As you're all undoubtedly aware by now, I am very heavily biased in favour of Debian as an operating system. But if you are going to be gifting software freedom to someone soon, it may make sense to use one of the Linux distributions geared more towards users who are less familiar with UNIX.

There are a number of supposedly better-for-beginners distributions, but an article about Linux distributions for new users suggests Linux Mint, Ubuntu, and Linux Deepin. Within the Ubuntu fold there are a number of options, but if I were to go Ubuntu I'd probably pick Xubuntu for a beginner.

Whichever option you go for, most distributions provide live "CD"s with installers built in. This allows the recipient of your gift to try it out before installing it onto their system, just in case they're not quite ready for all the power, flexibility and freedom that such a gift brings them.

Once you have a copy of the live image on your computer, it's time to write it to some medium you can give to the unsuspecting receiver of the joy known only to those with liberated computing experiences. There are various options. As mentioned above, these are often referred to as "CD" images, although these days they rarely fit on a CD; more often than not they're DVD-sized, but a DVD writer is a pretty common piece of equipment. If you lack one, or they lack a DVD drive in their computer, you may be better off finding a USB stick (I suggest buying a nice one; after all, packaging up the gift is part of the fun) and installing the image on that.

Whatever distribution you choose, even if you choose a BSD, or simply help someone install some free software on their non-free platform (e.g. installing GIMP on their Windows box), don't forget that the most valuable thing you can give alongside the installer is your time: to encourage, assist, troubleshoot and ultimately free your friend, family member, random stranger or lover.

Posted Wed Dec 24 12:00:07 2014
Richard Maw: Cool bits of C

C is popular, yet old. Its deficiencies led to C++ being written to address them. However, C has not gone away, and has undergone some parallel evolution to fix some of these problems without completely reinventing itself.

Designated initializers

This is a bit of a mouthful, but it's all about how you can initialize data structures.

Normally with C you define a data structure like:

struct foo {
    char *bar;
    char *baz;
    char *qux;
};

You can initialize it at the point of declaration by giving it values in {} braces, as you would pass arguments to a function:

struct foo f = { "bar", NULL, "qux" };

This is annoying, as it means you have to rework all your code whenever the struct changes, and you may not notice the error, since nothing in the initializer says which value belongs to which field.

Alternatively you could initialize the values in code:

struct foo f;
memset(&f, 0, sizeof(f));
f.bar = "bar";
f.qux = "qux";

This would compile to roughly the same code, but it is very verbose.

If you were using C++ you would define a constructor for the struct, and you wouldn't need to rework all the users of the struct when the order of the fields changes, since the constructor would ensure that it filled out the data in the struct in the correct order.

You can simulate this in C with your own initializer function:

void foo_init(struct foo *f, char *bar, char *baz, char *qux)
{
    f->bar = bar;
    f->baz = baz;
    f->qux = qux;
}
...
struct foo f;
foo_init(&f, "bar", NULL, "qux");

This is a lot more boilerplate to maintain though, so the C99 standard added designated initializers as a simpler way to handle this, where you specify the fields by name as follows:

struct foo f = { .bar = "bar", .baz = NULL, .qux = "qux" };

Missing fields default to zero, so you can omit the .baz initializer:

struct foo f = { .bar = "bar", .qux = "qux" };

Duplicated fields are allowed, with the last value winning, so other defaults can be handled by putting a macro in:

#define FOO_DEFAULTS .bar = "bar", .qux = "qux"
...
struct foo f = { FOO_DEFAULTS, .bar = "BAR" };

Compound literals

Now we have a handy way to initialize simple data structures, but we can still do better, since functions can take structs as parameters. Currently we have to define a named variable and pass a reference:

int print_bar(FILE *out, struct foo *f)
{
    return fprintf(out, "%s\n", f->bar);
}
...
struct foo f = { FOO_DEFAULTS, .bar = "BAR" };
print_bar(stdout, &f);

This is a bit verbose, given that constructing the object is trivial. Having a function return a pointer to the object won't work either, as you would have to malloc(3) the structure, which would be a memory leak unless the caller remembers to free(3) the object.

struct foo *new_foo(char *bar, char *baz, char *qux)
{
    struct foo *f = malloc(sizeof(struct foo));
    foo_init(f, bar, baz, qux);
    return f;
}
print_bar(stdout, new_foo("bar", "baz", "qux")); /* memory leak */

In C++ you would have the function take the struct by reference, and pass a constructed object to it without using the new keyword.

C99 added compound literals as a solution to this. They use similar syntax to the initialisation above, while still allocating the object on the stack, so you don't need to worry about using free(3) later.

print_bar(stdout, &(struct foo){FOO_DEFAULTS, .bar = "BAR"});

This looks similar to the initialization syntax, but with the literal prefixed by its type in parentheses, as in a cast.

Cleanup attribute

One of the important things about compound literals is that they are allocated on the stack, so you don't need extra code to clean them up again.

Resource allocation, however, is not always done on the stack. Normally you would require extra code to clean up your resources.

struct foo *f = new_foo("bar", NULL, "qux");
print_foo(stdout, f);
free(f);

With C++ you would use destructors and the RAII idiom to clean up these objects.

GCC has an extension mechanism called attributes, one of which is the Cleanup attribute.

void freep(void *p)
{
    /* p points at the variable that is going out of scope */
    void **m = p;
    if (*m) {
        free(*m);
        *m = NULL;
    }
}
#define cleanup_freep __attribute__((cleanup(freep)))
...
cleanup_freep struct foo *f = new_foo("bar", NULL, "qux");
print_foo(stdout, f);

This allows you to tag your variable declarations so that a function is called on the variable when it goes out of scope.

This is used to great effect in the systemd codebase to avoid boilerplate for resource handling.

Posted Wed Dec 31 12:00:08 2014