Richard Maw D-Bus

D-Bus is an IPC mechanism designed for applications and services, with the aim of providing:

  1. Reliability of message delivery.
  2. Globally consistent message ordering.
  3. Broadcast notifications of changes.
  4. Trust in the identity of the message sender, if you're using kdbus.

D-Bus is not designed for:

  1. Bulk transport of data:

    For that you may want to use a pipe or regular file descriptor.

  2. Turning Linux into a microkernel.

    Microkernels are known for having very fast IPC, and do every operation by sending a message to some other component.

    It is tempting to build microkernel semantics on top of Linux by having services react to messages and perform the required actions, rather than using system calls.

    The overhead involved in using D-Bus is too great for this.

At its core, D-Bus is a message bus, with subscription semantics so you get event notifications, and a serialisation/marshalling mechanism to provide a type system for messages.

Built on top of this is an object model, where applications sit in an event loop, responding to messages to call methods or read properties of virtual objects, in the object oriented sense.

Application frameworks often build on top of this to provide proxy objects, so that you can effectively refer to objects in different processes.

By providing a well known name where a D-Bus object may be found, you can have local services provide an API for performing operations, and local applications instruct these services to perform operations.
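
For example, using the third-party pydbus Python library (a minimal sketch, assuming a session bus with a notification service registered at the well known name org.freedesktop.Notifications), calling a method on a remote object looks much like calling a method on a local proxy object:

from pydbus import SessionBus

bus = SessionBus()
# get a proxy object for the service at the well known name
notifications = bus.get("org.freedesktop.Notifications")
# call a method on it; pydbus marshals the arguments for us
notifications.Notify("example", 0, "", "Hello from D-Bus", "", [], {}, 5000)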

A more detailed description of what D-Bus is and its concepts can be found here. The article is about sd-bus, but the "What is D-Bus again?" and "Introduction to D-Bus Concepts" sections are relevant to understanding D-Bus.

Posted Wed Dec 2 12:00:07 2015
Daniel Silverstone Starting with scripting

Scripting languages are found all over your Linux system. From Python to Shell, from Ruby to Haskell, there are so many flavours of scripting language that to choose just one would be the height of boring distaste. Over the coming few articles, I shall endeavour to show you the basics of a number of scripting languages, but in order to keep things short and sweet, I shall limit myself to three: Lua, Perl, and Python.

Scripting languages differ from compiled languages in a variety of ways which we have discussed before, but the critical aspect we're going to discuss today is called a REPL.

A REPL (Read-Evaluate-Print-Loop) is essentially exactly what the shell is. It is a style of user interface where a program prompts for input, reads that in, evaluates it there and then, prints the result out, and then loops back to prompting for new input. Lua and Python each have a REPL, whereas Perl simply reads a program from standard input when you run the language interpreter with no other arguments.

Lua's REPL looks like this:

Lua 5.3.1  Copyright (C) 1994-2015 Lua.org, PUC-Rio
> 

Python's more like this:

Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

However, since Perl has all the tools known to man, if you install Devel::REPL (the libdevel-repl-perl package on Debian and derivatives) you can simply run re.pl and get the following very perly prompt:

$_

In all three cases, pressing Control+D will exit the REPL (provided you enter it on an otherwise blank input line), but you can also use (for Lua):

> os.exit(0)

for Python:

>>> import sys
>>> sys.exit(0)

for Perl:

$_ exit 0

Now, despite their various roots, scripting languages often share some very simple syntax between them. Though depending on the origin of the language the specifics might vary. For today we're just going to look at the simplest of commands -- writing some output to stdout.

For Lua, there is a function called print which takes any number of arguments and writes them to stdout with a newline at the end:

> print("Hello World", 1+5)
Hello World     6
>

Lua's print function also adds tabs between the values you give it. In Python 2, print is a statement rather than a function (that changes in Python 3), but like Lua it also appends a newline:

>>> print "Hello World", 1+5
Hello World 6
>>>
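
In Python 3, where print is a function, the equivalent (a quick sketch at a Python 3 REPL) is:

>>> print("Hello World", 1+5)
Hello World 6
>>>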

As you can see, Python added a single space rather than a tab. Perl is a little more text-oriented by default, though: its print statement neither appends a newline nor inserts any spaces or tabs, so we need to do:

$_ print "Hello World ", 1+5, "\n"
Hello World 6
$_

For this week, I ask that you pick a scripting language, either one of the above or any other you fancy, and spend some time getting used to its REPL. Practice using the print statement (or its equivalent in your chosen language) with strings and simple expressions. Get comfortable with it, and next time we'll look at some basic data storage.

Posted Wed Dec 9 16:25:04 2015

It is frequently asserted that "the UNIX philosophy" is to build complexity out of simpler, reusable parts.

This has resulted in a rich toolbox to build shell scripts out of.

There is not always an available command to use in a shell script, so you might need to write something yourself to fill the gap; or you might have written a program that does too many things, and you want to split it up into independently reusable parts.

So here are a few recommendations for different styles of program.

Data processing programs

This means programs like grep, sort and cut.

Programs should be able to take multiple records of input and produce multiple records of output.

They should read their input records from the standard input stream and write their output records to the standard output stream, so that they can be put into shell pipelines.
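
A minimal sketch of such a filter in Python (here just upper-casing each line, treating one line as one record) shows the shape:

import sys

for line in sys.stdin:              # read records from standard input
    sys.stdout.write(line.upper())  # write transformed records to standard output

Written this way, it can sit anywhere in a pipeline, just like grep, sort and cut.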

Context managing programs

This means programs like chroot, flock, sudo, systemd-inhibit and unshare.

These do some operation, then run a follow-up command in the new context. With chroot the new command is run rooted in a different subtree; with flock it is run with a lock taken on a file; with sudo it is run as a different user; with systemd-inhibit it is run with an inhibitor lock taken; and with unshare it is run in a different namespace.

As continuations

These programs should take the command to run as a list of arguments rather than as a shell command string. Requiring a shell command means the program has to run a shell, and it takes extra effort to secure the command against command injection; if you do want shell evaluation, you can always get it back by running sh -c "$COMMAND" -.
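
As an illustration, here is a sketch of a hypothetical flock-like wrapper in Python (not one of the tools above): it sets up its context, then simply replaces itself with the command given in the remaining arguments:

import fcntl
import os
import sys

lock = open("/run/lock/example.lock", "w")  # hypothetical lock file path
os.set_inheritable(lock.fileno(), True)     # keep the descriptor (and so the lock) across exec
fcntl.flock(lock, fcntl.LOCK_EX)            # take an exclusive lock, as flock(1) would
os.execvp(sys.argv[1], sys.argv[1:])        # run the remaining arguments as the command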

If your command needs to provide the subcommand with more context, then having the entirety of the remaining arguments be the command makes it difficult to pass that context as an argument.

These commands should clean up any resources provided as the context when the subcommand exits.

For chroot, flock, sudo and unshare the context is tied to the subprocess' lifetime so this is easy.

With other cleanup

Where possible, avoid programs that need you to provide a continuation command, since it is not easy to tell where the command may have failed.

You can instead clean up with a trap, as in the following, which removes a temporary directory when the script exits.

td="$(mktemp -d)"
trap 'rm -rf "$td"' EXIT
some_command "$td"

Since flock's context is tied to a file descriptor, which may be inherited by subprocesses, you can tell it which file descriptor to operate on rather than using the continuation form.

(
    flock 100               # take an exclusive lock on file descriptor 100
    some_command            # runs while the lock is held
) 100<lockfile              # open lockfile on fd 100 for the whole subshell

Passing context to subcommands

For commands whose context is bound to the process, but which need to inform the subprocess of what that context is, you can tell them where to write the information, such as to an environment variable or a file.

I like telling a command to write the information to a FIFO (as I do with ephemeral-launch), since it also allows for some program synchronisation, as your script blocks until it has read the port the command bound to out of the FIFO.
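
The launching side of that pattern is small; a sketch in Python, assuming a hypothetical some-service which writes the port it bound to into the named FIFO:

import os
import subprocess

os.mkfifo("portfile")                               # FIFO the service will write its port into
service = subprocess.Popen(
    ["some-service", "--port-file", "portfile"])    # hypothetical service and option
with open("portfile") as f:                         # opening blocks until the service starts writing
    port = int(f.read().strip())                    # we only proceed once the port is known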

inetd prefers to include the address of the port that was bound in the TCPLOCALPORT environment variable.

Posted Wed Dec 16 12:00:08 2015
Lars Wirzenius Sort out deployment first

It is tempting to start a new project with the interesting bits, but it's often a mistake. One of the first steps in a new project should be to sort out deployment: getting the software installed and configured so it can be used. This is particularly important if it's not a simple command line utility, but something that connects to databases, web servers, or requires authentication or other sensitive operations.

The following has never, ever happened to anyone, but if you're not careful, it might happen to you.

It's Friday, the last day before your team goes on summer holidays. You're the only one left at the office; everyone else has already gone. You can leave too, as soon as you've deployed the web service your team has developed to a server, so that the customer can test it during your holidays.

You add some Debian packaging. You build the package. You install the package on the server. Everything's fine, except the service doesn't actually work.

It turns out that you and your team have been using the built-in HTTP server in bottle.py, which is meant for debugging and development only, not for production deployment. Even if you were happy to just use that, since it's not in real production yet, your software has no way to configure the HTTP server to listen on anything but 127.0.0.1:8080. You need, at minimum, to add a configuration file, and oh yeah, TLS support would be nice, and, um, there needs to be a way to include the configuration in the package, and um. You really don't want to run the debug server, so you need to add integration to a WSGI server. After all, you've done it before, how hard can it be?

Your spouse calls you and wants to know when you should be picked up to leave for the trip you're starting that day. You should leave early to avoid rush hour traffic.

You're in a hurry, and you don't have time to do things right. You log into the server and hack up the code in place. Because you're rushing, it takes you twice as long as it should, but after a couple of hours you're done.

You leave. You go on your trip, and have a lovely holiday. You come back, and new disasters take up your time.

A few months later, someone asks if you can fix the service, since it seems it doesn't work anymore. You say, "um".

Don't go there. Get your deployment done at the beginning, and use it throughout your project, and you'll be able to rely on it when it's time to deploy for real.

Posted Wed Dec 23 12:00:08 2015

Clickbait title aside, I recently had a lot of fun with Advent of Code, and since my weapon of choice is Python I spent a lot of time thinking about how to make it run fast.

Write less code

Every line of code you write is a line that has to be executed at some point.

This is true in every programming language, but more noticeable for interpreted languages like Python.

A "sufficiently smart compiler" may optimise away redundant code, but it is mostly a myth, and while there is an optimising compiler in the form of PyPy, it requires you write in a restricted subset of Python.

Instead of writing your own classes and functions, see if what you need already exists in the Python standard library.

There is some overhead involved in calling functions rather than writing the code in-line, but for large operations it is faster to call out to a function written in C, which much of the Python standard library is.
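
For example, summing a list with the built-in sum pushes the loop down into C, rather than executing a Python-level loop one bytecode at a time (a small illustration rather than a benchmark):

numbers = list(range(1000000))

# the interpreter executes this loop body a million times
total = 0
for n in numbers:
    total += n

# the same work done inside the built-in sum(), whose loop runs in C
total = sum(numbers)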

Use iterators and generators

A lot of the Python standard library functions can be given an iterable, which means they can perform more of the work in optimised C code without having to drop back into interpreting Python code.

So if your code can be translated into plugging various generators together, you can make it faster.

For example Advent of Code puzzle 6 was a problem involving a 1000x1000 grid of lights that were either on or off.

This can be thought of as a set of coordinates of lights that are on.

The instructions are of the form (turn on|turn off|toggle) x0,y0 through x1,y1.

Generating coordinates

If we parse that string and get a pair of coordinates, we can turn that into coordinates of lights as follows:

from itertools import product

def lightsinrange(start, end):
    xaxis = xrange(start[0], end[0]+1)
    yaxis = xrange(start[1], end[1]+1)
    lights = product(xaxis, yaxis)
    return lights

This uses xrange (range in Python 3) to create an iterable which yields every integer in the range.

This gets turned into every coordinate in both dimensions with product.
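
As a quick check at the REPL (assuming the function and the product import above are in scope), a 2x2 range expands to every coordinate within it:

>>> list(lightsinrange((0, 0), (1, 1)))
[(0, 0), (0, 1), (1, 0), (1, 1)]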

If you wanted to support arbitrary dimensions, then you'd instead do:

from itertools import izip, product

def lightsinrange(start, end):
    axis_ranges = (xrange(startcoord, endcoord+1)
                   for (startcoord, endcoord) in izip(start, end))
    lights = product(*axis_ranges)
    return lights

This uses izip (just zip in Python 3) to pair up the start and end coordinates for each dimension, creating a range per dimension, which product then expands into every coordinate.

product(*axis_ranges) uses the * operator, which expands an iterable into function parameters.
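
In other words, the two forms below are equivalent (a tiny illustration at the REPL, again assuming the imports above):

>>> axis_ranges = [xrange(2), xrange(2)]
>>> list(product(*axis_ranges)) == list(product(xrange(2), xrange(2)))
True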

Toggling lights

Given our lightsinrange function and a parse_operations function left to the reader, we can solve this problem as follows:

from sys import stdin

lights = set()
def turn_on(iterable):
    for light in iterable:
        lights.add(light)
def turn_off(iterable):
    for light in iterable:
        if light in lights:
            lights.remove(light)
def toggle(iterable):
    for light in iterable:
        if light in lights:
            lights.remove(light)
        else:
            lights.add(light)
ops = {'turn on': turn_on, 'turn off': turn_off, 'toggle': toggle}
for op, start, end in parse_operations(stdin):
    ops[op](lightsinrange(start, end))

Our implementations for turn_on, turn_off and toggle jump into the implementation of the set type at least once per coordinate.

This is sub-optimal; it would be nicer if we could just hand the whole iterable to the set.

Fortunately the set class has update, difference_update and symmetric_difference_update methods, so our definitions of turn_on, turn_off and toggle can be simplified.

def turn_on(iterable):
    lights.update(iterable)
def turn_off(iterable):
    lights.difference_update(iterable)
def toggle(iterable):
    lights.symmetric_difference_update(iterable)

We're not done yet: these functions each just call a single method on an object, and Python's bound methods can be passed around as values themselves, so we can throw away our wrapper functions and use the bound methods directly.

ops = {'turn on': lights.update, 'turn off': lights.difference_update,
       'toggle': lights.symmetric_difference_update}

So now, we've got a small amount of code mostly throwing iterables at data structures.

Iterate profiling and optimising

If you've followed the previous two tips, you should have a program that is written in a style that works well given the constraints of the platform.

This is the low-hanging fruit of making it faster, since it doesn't require domain-specific knowledge of what the program is doing, or the algorithms used to solve the problem.

Now you need to go into a profiling and optimising loop, where you work out what the slowest part of the program is, and work out how to make it faster.
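
For example, the cProfile and pstats modules in the standard library will show you where the time goes; a minimal sketch, where main() stands in for whatever entry point your program has:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main()              # hypothetical entry point of the program being profiled
profiler.disable()
# print the ten most expensive calls by cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)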

You should be able to use Google to find plenty of profiling guides, but in case your Google bubble is insufficiently developed to find them, the documentation for the standard library's profilers is a good place to start.

Posted Wed Dec 30 12:00:08 2015