pages tagged optimisationyakkinghttp://yakking.branchable.com/tags/optimisation/yakkingikiwiki2015-12-30T12:00:13ZThe secret to writing fast python programshttp://yakking.branchable.com/posts/fast-python/Richard Maw2015-12-30T12:00:13Z2015-12-30T12:00:08Z
<p><a href="http://www.commitstrip.com/en/2014/08/07/our-cto-has-discovered-an-incredible-way-of-making-developers-read-his-commit-messages-you-wont-even-believe-how-he-did-it/">Clickbait</a> title aside,
I recently had a lot of fun with <a href="http://adventofcode.com/">Advent of Code</a>,
and since my weapon of choice is <a href="https://www.python.org/">Python</a>
I spent a lot of time thinking about how to make it run fast.</p>
<h1>Write less code</h1>
<p>Every line of code you write is a line that you intended to execute.</p>
<p>This is true in every programming language,
but more noticeable for interpreted languages like <a href="https://www.python.org/">Python</a>.</p>
<p>A "sufficiently smart compiler" may optimise away redundant code,
but it is mostly a myth,
and while there is an optimising compiler in the form of <a href="http://pypy.org/">PyPy</a>,
it requires you write in a restricted subset of <a href="https://www.python.org/">Python</a>.</p>
<p>Instead of writing your own classes and functions,
see if it already exists in the <a href="https://docs.python.org/2/library/">Python standard library</a>.</p>
<p>There is some overhead involved in calling functions over in-line code,
but for big operations it is faster to call outto a function written in C,
which most of the <a href="https://docs.python.org/2/library/">Python standard library</a> is.</p>
<h1>Use iterators and generators</h1>
<p>A lot of the <a href="https://docs.python.org/2/library/">Python standard library</a> functions can be given an iterable,
which means they can perform more operations in optimised C code,
without having to resume into interpreting Python code.</p>
<p>So if your code can be translated into plugging various generators together,
you can make it faster.</p>
<p>For example <a href="http://adventofcode.com/day/6">Advent of Code puzzle 6</a> was a problem involving a 1000x1000 grid
of lights that were either on or off.</p>
<p>This can be thought of as a set of coordinates of lights that are on.</p>
<p>The instructions are of the form
<code>(turn on|turn off|toggle) x0,y0 through x1,y1</code>.</p>
<h2>Generating coordinates</h2>
<p>If we parse that string and get a pair of coordinates,
we can turn that into coordinates of lights as follows:</p>
<pre><code>def lightsinrange(start, end):
xaxis = xrange(start[0], end[0]+1)
yaxis = xrange(start[1], end[1]+1)
lights = product(xaxis, yaxis)
return lights
</code></pre>
<p>This uses <a href="https://docs.python.org/2/library/functions.html#xrange">xrange</a> (<a href="https://docs.python.org/3/library/functions.html#func-range">range</a> in Python 3) to create an iterable
which yields every integer in the range.</p>
<p>This gets turned into every coordinate in both dimensions with <a href="https://docs.python.org/2/library/itertools.html#itertools.product">product</a>.</p>
<p>If you wanted to support arbitrary dimensions, then you'd instead do:</p>
<pre><code>def lightsinrange(start, end):
axis_ranges = (xrange(startcoord, endcoord+1)
for (startcoord, endcoord) in izip(start, end))
lights = product(*axis_ranges)
return lights
</code></pre>
<p>This uses <a href="https://docs.python.org/2/library/itertools.html#itertools.izip">izip</a> (just <a href="https://docs.python.org/2/library/itertools.html#itertools.product">zip</a> in Python 3)
to create ranges for each pair of coordinates per dimension,
and returns an iterable with a range of coordinates per dimension.</p>
<p><code>product(*axis_ranges)</code> uses the <code>*</code> operator,
which expands an iterable into function parameters.</p>
<h2>Toggling lights</h2>
<p>Given our <code>lightsinrange</code> function
and a <code>parse_operations</code> function left to the reader,
we can solve this problem as follows:</p>
<pre><code>lights = set()
def turn_on(iterable):
for light in iterable:
lights.add(light)
def turn_off(iterable):
for light in iterable:
if light in lights:
lights.remove(light)
def toggle(iterable):
for light in iterable:
if light in lights:
lights.remove(light)
else:
lights.add(light)
ops = {'turn on': turn_on, 'turn off': turn_off, 'toggle': toggle}
for op, start, end in parse_operations(stdin):
ops[op](lightsinrange(start, end))
</code></pre>
<p>Our implementations for <code>turn_on</code>, <code>turn_off</code> and <code>toggle</code>
jump into the implementation of the <a href="https://docs.python.org/2/library/stdtypes.html#set">set</a> type at least once per coordinate.</p>
<p>This is sub-optimal,
since it would be nice if we could just pass the iterable to the set.</p>
<p>Fortunately the <a href="https://docs.python.org/2/library/stdtypes.html#set">set</a> class has <a href="https://docs.python.org/2/library/stdtypes.html#set.update">update</a>, <a href="https://docs.python.org/2/library/stdtypes.html#set.difference_update">difference_update</a>
and <a href="https://docs.python.org/2/library/stdtypes.html#set.symmetric_difference_update">symmetric_difference_update</a>,
so our definitions of <code>turn_on</code>, <code>turn_off</code> and <code>toggle</code> can be simplified.</p>
<pre><code>def turn_on(iterable):
lights.update(iterable)
def turn_off(iterable):
lights.difference_update(iterable)
def toggle(iterable):
lights.symmetric_difference_update(iterable)
</code></pre>
<p>We're not done yet, since these functions just call a method on an object,
and <a href="https://docs.python.org/2/tutorial/classes.html#method-objects">Python's magic method binding</a> means we can use the methods themselves,
so we can throw away our functions and use bound methods.</p>
<pre><code>ops = {'turn on': lights.update, 'turn off': lights.difference_update,
'toggle': symmetric_difference_update}
</code></pre>
<p>So now, we've got a small amount of code
mostly throwing iterables at data structures.</p>
<h1>Iterate profiling and optimising</h1>
<p>If you've followed the previous two tips,
you should have a program that is written in a style
that works well given constraints of the platform.</p>
<p>This is the low-hanging fruit of making it faster,
since it doesn't require domain-specific knowledge of what the program is doing,
or the algorithms used to solve the problem.</p>
<p>Now you need to go into a profiling and optimising loop,
where you work out what the slowest part of the program is,
and work out how to make it faster.</p>
<p>You should be able to use Google to find plenty of profiling guides,
but in case your Google bubble is insufficiently developed to find them,
here are some links:</p>
<ul>
<li><a href="https://docs.python.org/2/library/profile.html">The Python Profilers</a></li>
<li><a href="http://lukauskas.co.uk/articles/2014/02/12/how-to-make-python-faster-without-trying-that-much/">How to make Python faster without trying that much</a></li>
<li><a href="http://meetings-archive.debian.net/pub/debian-meetings/2015/mini-debconf-cambridge/webm/making_obnam_faster.webm">Making Obnam faster</a> (a video by <a href="http://liw.fi/">Lars</a>)</li>
<li><a href="https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Profiling_Code">PythonSpeed/PerformanceTips</a></li>
</ul>
Using the Python profilerhttp://yakking.branchable.com/posts/python-profiling/Lars Wirzenius2015-11-04T12:00:13Z2015-11-04T12:00:07Z
<p>I've been doing a fair bit of optimisation lately, for a hobby project
of mine written in Python. This means I've been using the Python
profiler a lot. Remember: the word for "optimising without measuring"
is "pessimisation".</p>
<p>Here's quick tutorial on the very basics of using the Python profiler.</p>
<ol>
<li>Import the <code>cProfile</code> module</li>
<li>Invoke the main function of your program using <code>cProfile.run()</code>.</li>
</ol>
<p>As an example:</p>
<pre><code>import cProfile
class SomeClass(object):
def run(self):
self.total = 0
for i in xrange(10**6):
self.add(1)
return self.total
def numbers(self):
return xrange(10**6)
def add(self, amount):
self.total += amount
def main():
obj = SomeClass()
print obj.run()
if False:
# Run normally.
main()
else:
# Run under profile, writing profile data to prof.prof.
cProfile.run('main()', filename='prof.prof')
</code></pre>
<p>If you run the above, the profiler will write a file <code>prof.prof</code> with
the profiling data. That data is in a binary form. The following code
will display it in text form:</p>
<pre><code>import pstats
import sys
p = pstats.Stats(sys.argv[1])
p.strip_dirs()
p.sort_stats('cumulative')
p.print_stats()
print '-' * 78
p.print_callees()
</code></pre>
<p>The <a href="https://docs.python.org/2/library/profile.html">profiler documentation</a> has details to show what the different
calls do. I'd like to concentrate on the output. The output from the
above actually contains the profiling data in two forms:</p>
<ul>
<li>list of how much time is spent in each function or method</li>
<li>a breakdown for each function/method of how much time is spent
in each thing it calls</li>
</ul>
<p>Both are useful. The latter is especially useful.</p>
<p>Example output:</p>
<pre><code>Fri Oct 30 11:36:05 2015 prof.prof
1000004 function calls in 0.302 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.302 0.302 <string>:1(<module>)
1 0.000 0.000 0.302 0.302 prof.py:19(main)
1 0.175 0.175 0.302 0.302 prof.py:6(run)
1000000 0.127 0.000 0.127 0.000 prof.py:15(add)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
------------------------------------------------------------------------------
Ordered by: cumulative time
Function called...
ncalls tottime cumtime
<string>:1(<module>) -> 1 0.000 0.302 prof.py:19(main)
prof.py:19(main) -> 1 0.175 0.302 prof.py:6(run)
prof.py:6(run) -> 1000000 0.127 0.127 prof.py:15(add)
prof.py:15(add) ->
{method 'disable' of '_lsprof.Profiler' objects} ->
</code></pre>
<p>From the first part, we see that the all cumulative time of the
program is spent in the <code>run</code> method. From the second we see that its
time is partially spent in the <code>add</code> method. The rest of the time is
spent in <code>run</code> itself.</p>
<p>This is a trivial example, but it shows the principle. By starting
from this, you can start drilling into where your program spends its
time.</p>