Lars Wirzenius Using the Python profiler

I've been doing a fair bit of optimisation lately, for a hobby project of mine written in Python. This means I've been using the Python profiler a lot. Remember: the word for "optimising without measuring" is "pessimisation".

Here's a quick tutorial on the very basics of using the Python profiler.

  1. Import the cProfile module.
  2. Invoke the main function of your program using cProfile.run().

As an example:

import cProfile

class SomeClass(object):

    def run(self):
        self.total = 0
        for i in xrange(10**6):
            self.add(1)
        return self.total

    def numbers(self):
        return xrange(10**6)

    def add(self, amount):
        self.total += amount

def main():
    obj = SomeClass()
    print obj.run()

if False:
    # Run normally.
    main()
else:
    # Run under profile, writing profile data to prof.prof.
    cProfile.run('main()', filename='prof.prof')

If you run the above, the profiler will write a file prof.prof with the profiling data. That data is in a binary form. The following code will display it in text form:

import pstats
import sys

p = pstats.Stats(sys.argv[1])
p.strip_dirs()
p.sort_stats('cumulative')
p.print_stats()
print '-' * 78
p.print_callees()

The profiler documentation explains what the different calls do. I'd like to concentrate on the output. The output from the above actually contains the profiling data in two forms:

  • a list of how much time is spent in each function or method
  • a breakdown for each function/method of how much time is spent in each thing it calls

Both are useful. The latter is especially useful.

Example output:

Fri Oct 30 11:36:05 2015    prof.prof

         1000004 function calls in 0.302 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.302    0.302 <string>:1(<module>)
        1    0.000    0.000    0.302    0.302 prof.py:19(main)
        1    0.175    0.175    0.302    0.302 prof.py:6(run)
  1000000    0.127    0.000    0.127    0.000 prof.py:15(add)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


------------------------------------------------------------------------------
   Ordered by: cumulative time

Function                                          called...
                                                      ncalls  tottime  cumtime
<string>:1(<module>)                              ->       1    0.000    0.302  prof.py:19(main)
prof.py:19(main)                                  ->       1    0.175    0.302  prof.py:6(run)
prof.py:6(run)                                    -> 1000000    0.127    0.127  prof.py:15(add)
prof.py:15(add)                                   ->
{method 'disable' of '_lsprof.Profiler' objects}  ->

From the first part, we see that all of the cumulative time of the program is spent in the run method. From the second, we see that part of that time is spent in the add method. The rest of the time is spent in run itself.
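The relationship between the columns can be checked by hand. Here is a minimal sketch (Python 3, with the figures copied from the example output above; the variable names are mine): the time spent in run itself is its cumulative time minus the cumulative time of the things it calls.

```python
# Figures copied from the example output above, in seconds.
run_cumtime = 0.302   # run, including everything it calls
add_cumtime = 0.127   # all one million calls to add, as called from run

# tottime is the time spent in a function's own code, excluding its callees.
run_tottime = round(run_cumtime - add_cumtime, 3)
add_share = round(add_cumtime / run_cumtime * 100)

print(run_tottime)   # 0.175, the tottime reported for run
print(add_share)     # 42, the percentage of run's time spent in add
```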

This is a trivial example, but it shows the principle. Starting from this, you can drill into where your program spends its time.
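As one possible way of drilling down, the sketch below (Python 3) profiles a single call in-process with a cProfile.Profile object, limits the report to the top few entries, and then asks which functions call add. The helper name profile_report and the smaller loop count are my own choices for illustration, not part of the example above.

```python
import cProfile
import io
import pstats


class SomeClass:
    def run(self):
        self.total = 0
        for _ in range(10**5):
            self.add(1)
        return self.total

    def add(self, amount):
        self.total += amount


def profile_report(func, top=5):
    """Profile func() and return the text report as a string (hypothetical helper)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.strip_dirs().sort_stats('cumulative')
    stats.print_stats(top)        # only the first few entries by cumulative time
    stats.print_callers('add')    # callers of functions whose name matches 'add'
    return out.getvalue()


report = profile_report(SomeClass().run)
print(report)
```

Passing a number to print_stats keeps the listing short on real programs, and print_callers is the mirror image of the print_callees call used earlier.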

Posted Wed Nov 4 12:00:07 2015 Tags:
Richard Maw Networking - DHCP

I hope you've done your homework. In a previous article about networking we set an exercise to configure some machines with static IP addresses.

You'll probably agree that it would be annoying to have to change the address range on every computer where it is configured.

Wouldn't it be better if we could store this information in the network somehow?

Well, guess what? We can! It's called DHCP.

This works by nominating one computer on the network to act as the master of networking configuration.

Computers which want to opt into the network communicate with this server to be told what addresses to use.

Normally your router-modem that connects you to the internet takes this responsibility, but any machine, real or virtual, may do it.

Example with network namespaces

As before, we're going to use network namespaces to simulate multiple machines sharing a network.

For DHCP to work you need a DHCP server on the managing machine, and a DHCP client on every client machine that will join this network. Since we are only virtualising the networking, we need both on the same machine.

You will likely have dnsmasq and dhclient on your system. Although systemd can provide a DHCP server and client through networkd, this is currently mostly used on servers, embedded systems and containers, while NetworkManager uses dnsmasq and dhclient on desktops.

While networkd is capable of being both a DHCP client and server, we can only demonstrate its use as a client here, since running it as a server would require a full container to isolate its configuration from the host.

Configuring the virtual network

We're going to use a network namespace called dhcptest, which will be connected to the host network namespace by a virtual ethernet device.

# ip netns add dhcptest
# ip link add host type veth peer name guest
# ip link set guest netns dhcptest

The subnet we use for the network linking us to dhcptest is 10.0.0.0/24, whose usable host addresses range from 10.0.0.1 to 10.0.0.254 (10.0.0.0 is the network address and 10.0.0.255 is the broadcast address).
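The address arithmetic for this subnet can be checked with Python's standard ipaddress module. A minimal sketch (Python 3):

```python
import ipaddress

net = ipaddress.ip_network('10.0.0.0/24')
hosts = list(net.hosts())   # usable host addresses, excluding network and broadcast

print(net.netmask)             # 255.255.255.0
print(net.num_addresses)       # 256
print(net.network_address)     # 10.0.0.0
print(net.broadcast_address)   # 10.0.0.255
print(hosts[0], hosts[-1])     # 10.0.0.1 10.0.0.254
```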

Since the machine in dhcptest will be the DHCP server, we need to assign its own address statically.

# ip netns exec dhcptest ip addr add 10.0.0.1 dev guest
# ip netns exec dhcptest ip link set guest up
# ip netns exec dhcptest ip route add 10.0.0.0/24 dev guest

Using dnsmasq as the DHCP server

In a terminal, run:

# ip netns exec dhcptest dnsmasq --dhcp-range=10.0.0.2,10.0.0.254,255.255.255.0 --interface=guest --no-daemon
dnsmasq: started, version 2.72 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect
dnsmasq-dhcp: DHCP, IP range 10.0.0.2 -- 10.0.0.254, lease time 1h
dnsmasq: reading /etc/resolv.conf
dnsmasq: using nameserver 127.0.1.1#53
dnsmasq: read /etc/hosts - 1 addresses

This starts dnsmasq on dhcptest, serving the address range 10.0.0.2 to 10.0.0.254, with the subnet 10.0.0.0/24 (this is the 255.255.255.0 netmask), on the guest network interface.

--no-daemon forces dnsmasq to stay running in the foreground, which makes it easier to clean up when finished, since it can be cancelled with ^C.
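To get a feel for the pool given to --dhcp-range, here is a small sketch (Python 3). It only does arithmetic on the addresses from the command line above; the in_pool helper is hypothetical and has nothing to do with how dnsmasq itself allocates leases.

```python
import ipaddress

# The bounds passed to --dhcp-range, and the subnet they live in.
first = ipaddress.ip_address('10.0.0.2')
last = ipaddress.ip_address('10.0.0.254')
net = ipaddress.ip_network('10.0.0.0/24')

pool_size = int(last) - int(first) + 1
print(pool_size)                       # 253


def in_pool(addr):
    """Return True if addr falls inside the DHCP range (hypothetical helper)."""
    return first <= ipaddress.ip_address(addr) <= last


print(in_pool('10.0.0.105'))           # True
print(in_pool('10.0.0.1'))             # False: the server's own static address
print(first in net and last in net)    # True: the range fits the /24
```

Starting the range at 10.0.0.2 keeps the server's statically assigned 10.0.0.1 out of the pool.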

Using dhclient as the DHCP client

In a different terminal to the one that the DHCP server was started in, run:

# dhclient -d host
Internet Systems Consortium DHCP Client 4.3.1
Copyright 2004-2014 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/host/16:c0:28:6b:ea:76
Sending on   LPF/host/16:c0:28:6b:ea:76
Sending on   Socket/fallback
DHCPDISCOVER on host to 255.255.255.255 port 67 interval 3 (xid=0xd6a3b68)
DHCPREQUEST of 10.0.0.105 on host to 255.255.255.255 port 67 (xid=0x683b6a0d)
DHCPOFFER of 10.0.0.105 from 10.0.0.1
DHCPACK of 10.0.0.105 from 10.0.0.1
bound to 10.0.0.105 -- renewal in 1643 seconds.

This shows that it has successfully requested 10.0.0.105 as its address.

The dhclient process will continue to run, because it hasn't been given that address forever. It has been given the address for an hour, after which the lease expires and it must stop using the address. Before then, dhclient may renew the lease so that we can keep using the address.
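The timing can be worked through with a little arithmetic. The sketch below (Python 3) uses the default timer values from RFC 2131, where a client starts renewing at half the lease time (T1) and falls back to rebinding at 7/8 of it (T2). The RFC also says clients should add some random fuzz to these timers, which would be consistent with the 1643 seconds reported above, though I'm not showing how dhclient actually picks its value.

```python
LEASE_SECONDS = 60 * 60   # dnsmasq reported "lease time 1h"

# RFC 2131 defaults, before any client-side randomisation.
t1_renew = LEASE_SECONDS // 2        # start trying to renew with the same server
t2_rebind = LEASE_SECONDS * 7 // 8   # start trying any server

print(t1_renew)    # 1800
print(t2_rebind)   # 3150
```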

Using networkd as the DHCP client

First create the config file:

# install -m 644 /dev/stdin /etc/systemd/network/dhcptest.network <<'EOF'
> [Match]
> Name=host
> [Network]
> DHCP=yes
> EOF

Since networkd currently only reads its config at startup, we must restart it for the new config to take effect. (This also has the effect of starting it for the first time, on distros that don't start networkd by default.)

# systemctl restart systemd-networkd.service

This won't give any immediate feedback on whether it worked, but you can inspect its status by running:

$ networkctl status host
● 6: host
   Link File: /lib/systemd/network/99-default.link
Network File: /etc/systemd/network/dhcptest.network
        Type: ether
       State: routable (configured)
      Driver: veth
  HW Address: 16:c0:28:6b:ea:76
         MTU: 1500
     Address: 10.0.0.105
              fe80::14c0:28ff:fe6b:ea76
     Gateway: 10.0.0.1
         DNS: 10.0.0.1

Note that this has created some persistent configuration, so the next time the host ethernet device is created networkd will DHCP.

If you don't want to do this in future, you can remove the configuration by running rm /etc/systemd/network/dhcptest.network.

DHCPing successfully

In the terminal running dnsmasq you should see:

dnsmasq-dhcp: DHCPDISCOVER(guest) 10.0.0.105 16:c0:28:6b:ea:76 
dnsmasq-dhcp: DHCPOFFER(guest) 10.0.0.105 16:c0:28:6b:ea:76 
dnsmasq-dhcp: DHCPREQUEST(guest) 10.0.0.105 16:c0:28:6b:ea:76 
dnsmasq-dhcp: DHCPACK(guest) 10.0.0.105 16:c0:28:6b:ea:76 HOSTNAME
dnsmasq-dhcp: not giving name HOSTNAME to the DHCP lease of 10.0.0.105 because the name exists in /etc/hosts with address 127.0.1.1
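The first four lines are the usual DHCP exchange: discover, offer, request, acknowledge. As a sketch (Python 3), the log lines above can be picked apart to check the order. The regular expression is only my guess based on this sample output, not a documented dnsmasq log format.

```python
import re

# The first four log lines from the dnsmasq output above.
LOG = """\
dnsmasq-dhcp: DHCPDISCOVER(guest) 10.0.0.105 16:c0:28:6b:ea:76
dnsmasq-dhcp: DHCPOFFER(guest) 10.0.0.105 16:c0:28:6b:ea:76
dnsmasq-dhcp: DHCPREQUEST(guest) 10.0.0.105 16:c0:28:6b:ea:76
dnsmasq-dhcp: DHCPACK(guest) 10.0.0.105 16:c0:28:6b:ea:76 HOSTNAME
"""

# Assumed line shape: message type, interface, IP address, MAC address.
LINE = re.compile(
    r'^dnsmasq-dhcp: DHCP(?P<msg>[A-Z]+)\((?P<iface>\w+)\) '
    r'(?P<ip>\S+) (?P<mac>\S+)'
)

events = [m.groupdict() for m in map(LINE.match, LOG.splitlines()) if m]
messages = [e['msg'] for e in events]
addresses = {e['ip'] for e in events}

print(messages)    # ['DISCOVER', 'OFFER', 'REQUEST', 'ACK']
print(addresses)   # {'10.0.0.105'}
```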

In another terminal you can see that it configured the address by running:

$ ip addr show dev host
6: host: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 16:c0:28:6b:ea:76 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.105/24 brd 10.0.0.255 scope global host
       valid_lft forever preferred_lft forever
    inet6 fe80::14c0:28ff:fe6b:ea76/64 scope link 
       valid_lft forever preferred_lft forever

Testing the link

We can now prove it works with netcat.

Run ip netns exec dhcptest nc -l 1234 in one terminal, and nc 10.0.0.1 1234 in another, and you will be able to see that text entered is repeated in both terminals.
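If you'd rather see what the two netcat processes are doing, here is a rough Python 3 equivalent using plain sockets. To keep it runnable without the namespace set up, it assumes the loopback address and a port chosen by the kernel rather than 10.0.0.1 and 1234, and it exchanges one message in each direction instead of relaying a terminal.

```python
import socket
import threading


def serve_once(server, received):
    """Accept one connection, read a message, and send one back."""
    conn, _ = server.accept()
    with conn:
        received.append(conn.recv(1024))
        conn.sendall(b'hello from the listener\n')


# The listening side, like "nc -l".
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))   # port 0: let the kernel choose a free port
server.listen(1)
port = server.getsockname()[1]

received = []
thread = threading.Thread(target=serve_once, args=(server, received))
thread.start()

# The connecting side, like "nc ADDRESS PORT".
with socket.create_connection(('127.0.0.1', port)) as client:
    client.sendall(b'hello from the client\n')
    reply = client.recv(1024)

thread.join()
server.close()
print(received[0], reply)
```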

Posted Wed Nov 11 12:00:08 2015 Tags:
Daniel Silverstone Domain specific languages

A domain specific language (DSL) is, in brief, any computer language which has been especially targeted at a given task or task domain, for example a configuration language for firewalls, or indeed PostScript. Typically DSLs are small languages, though they can be Turing complete in some sense, even given their narrow purpose.

DSLs are all over the average Linux system; indeed, many of the files in /etc are written in small configuration DSLs. A DSL might be implemented in an entirely custom fashion using lexers and parsers specific to the software hosting the DSL, or it might be a set of semantics and functions built on top of an existing language such as the shell. DSLs don't need to be programming languages of any sort; they can be purely declarative, such as INI files, which are becoming more and more popular. (For example, systemd builds a number of related DSLs for its units on top of the INI format.)

If you're building a more-than-trivial tool, you'll likely encounter a desire to configure it in a file at some point. At that point, you too might encounter the desire to implement a DSL of some kind. When you reach that point, I urge you to look for an existing configuration language of some kind and to build from there. systemd's use of the INI format as a basis for its DSLs is in no way an accident. By building on formats already understood, you'll make it much easier for others to work with your configuration files.
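As an illustration of how little work an existing format can be, here is a sketch (Python 3) that reads a small INI-style file with the standard configparser module. The sample text is the networkd configuration from the DHCP article; treating it as plain INI is a simplification on my part, since systemd's dialect allows things such as repeated keys that configparser rejects by default.

```python
import configparser

# A small INI-style configuration, borrowed from the networkd example.
UNIT = """\
[Match]
Name=host
[Network]
DHCP=yes
"""

parser = configparser.ConfigParser()
parser.optionxform = str   # keep key case; the default lowercases keys
parser.read_string(UNIT)

sections = parser.sections()
name = parser['Match']['Name']
dhcp = parser['Network'].getboolean('DHCP')

print(sections)   # ['Match', 'Network']
print(name)       # host
print(dhcp)       # True
```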

Posted Wed Nov 18 12:00:07 2015 Tags:

I spoke at the first systemd.conf recently.

This was an enjoyable experience, and I can highly recommend attending relevant conferences.

The value in conferences

  1. Put names to faces of people you've never met in person before.

    This adds a personal touch that makes future communication easier, since you get a feel for how approachable someone is, and they know you as someone they can talk to.

    Some people's avatars look rather different to how they do in person, and now I have a good idea of who the people I had been chatting to in #systemd actually are.

    This is easier if the conference has social events arranged, as they often include alcohol, which can loosen the tongue sufficiently to get past the inhibitions and awkwardness of speaking to people you don't know so well.

  2. Participate in the less open discussions that interested subgroups have.

    Some large projects organise themselves into subgroups, aligned partially by interests, and partially by the organisational structures of where they work.

    The kdbus developers don't normally hold the discussions of its development anywhere I frequent, but I maintain an interest in its development, so being able to hear how it was going was nice.

  3. Discover common ground between different developers, to organise subgroups.

    Users normally get systemd from their Linux distribution rather than directly from upstream, and there is a fair amount of work involved for the distributions.

    Because systemd.conf allowed all the maintainers from the distributions to get together in one room, they were able to organise a common working group, so that they could collaborate on downstream packaging, most importantly communal maintenance of a stable release.

  4. Hear the gossip that you don't normally hear.

    I couldn't possibly comment.

  5. Attend or give talks.

    Attending talks gives you the opportunity to understand the perspectives of others, in a manner structured to help you understand.

    Giving talks is both the opportunity to share your perspective, and if, like me, you are uncomfortable with public speaking, the opportunity to step outside your comfort zone and expand it.

Getting involved in conferences

If your project is small, it is possible to organise a dev-room at a community-driven conference like FOSDEM, or at a larger corporate conference where interests overlap, as systemd used to.

Medium-sized projects, like systemd, can arrange to have their own dedicated conference.

Large projects, like the Linux kernel or OpenStack, have many big corporate conferences in far-away, expensive places a few times a year.

There is usually some form of sponsorship available, so that starving hackers are able to attend conferences.

Big projects often have more expensive conferences, but usually balance this by having sponsorship more readily available.

Small projects may find sponsorship more difficult, so attendees will need to fund it themselves, which makes community conferences like FOSDEM more appropriate.

Call to action

If you're looking to attend a conference for some project that interests you, but you don't have a specific project in mind, you could look at the LWN.net Community Calendar.

If you are a member of a project and would like to get together with other developers or users, and there isn't a conference of its own for it, try to find a community conference that is near to most of the developers, and see if you can arrange for sponsorship for any more remote developers, so that you can have a get-together.

Posted Wed Nov 25 12:00:08 2015 Tags: