Iterators

Iterators, in themselves, aren't a revolutionary idea; many programming languages have them. Much of Python's standard library, however, is concerned with producing and consuming iterators.

iter

The iter function turns an iterable into an iterator. You don't normally need to call it yourself, since functions and statements that work with iterators (such as for loops) accept any iterable and call iter for you.
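For example, a minimal sketch of the difference between the two (the names are only illustrative):

numbers = [1, 2, 3]   # a list is iterable, but not itself an iterator
it = iter(numbers)    # iter() returns a list iterator
print(next(it))       # 1
print(next(it))       # 2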

iter can also be used to create an iterator from a function. For example, you can create an iterator that reads lines of input until the first empty line:

import sys
entries = iter(sys.stdin.readline, '\n')
for line in entries:
    sys.stdout.write(line)

Because iterators are evaluated lazily, this will print each line as it is entered, rather than waiting until the first empty line before printing anything.

itertools

There are many useful functions to manipulate iterators in the itertools library.

The ones I use most often are chain.from_iterable and product.

product

product takes any number of iterables and returns an iterator of tuples, one for every combination of an element from each, in what mathematicians call the Cartesian product.

What this means is that the following two examples are functionally equivalent:

for x in [1, 2, 3]:
    for y in ['a', 'b', 'c']:
        print(x, y)

import itertools

for x, y in itertools.product([1, 2, 3], ['a', 'b', 'c']):
    print(x, y)

Except that the latter needs less indentation, which makes it easier to keep to a maximum code width of 79 columns.

chain.from_iterable

chain takes any number of iterables and returns an iterator that yields the values of each in turn.

This can be used to merge dicts, as the dict constructor can take an iterator of key-value pairs; but I've not found many other uses for chain by itself.
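As a rough sketch of that dict-merging trick (the defaults and overrides dicts are purely illustrative; when keys collide, later pairs win):

import itertools

defaults = {'colour': 'red', 'size': 'small'}
overrides = {'size': 'large'}
merged = dict(itertools.chain(defaults.items(), overrides.items()))
assert merged == {'colour': 'red', 'size': 'large'}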

The following example does find a use for it, though: it implements a simple env(1) command using chain, takewhile and dropwhile.

import itertools
import os
import sys

# Leading KEY=VALUE arguments set environment variables; the remaining
# arguments are the command to execute.
env_args = itertools.takewhile(lambda x: '=' in x, sys.argv[1:])
exec_args = itertools.dropwhile(lambda x: '=' in x, sys.argv[1:])
cmdline_env = []
for arg in env_args:
    key, value = arg.split('=', 1)
    cmdline_env.append((key, value))
# Merge the current environment with the command-line overrides;
# later pairs win, so the overrides take precedence.
new_env = dict(itertools.chain(os.environ.items(), cmdline_env))
args = list(exec_args)
os.execvpe(args[0], args, new_env)

chain has an alternative constructor, chain.from_iterable, which takes an iterable of iterables. I find this useful when I have a set of objects that each have an iterable field, and I want the set of all those items.

import itertools
class Foo(object):
    def __init__(self, bars):
        self.bars = set(bars)
foos = set()
foos.add(Foo([1, 2, 3]))
foos.add(Foo([2, 3, 4]))
all_bars = set(itertools.chain.from_iterable(foo.bars for foo in foos))
assert all_bars == set([1, 2, 3, 4])

Generators

You may have noticed that I passed something unusual to the call to chain.from_iterable; this is called a generator expression.

Generators are a short-hand for creating certain kinds of iterator.

Indeed, we could have used itertools.imap(lambda foo: foo.bars, foos) (or the built-in map in Python 3, where it is lazy), but as you can see, the generator expression syntax is shorter and, once you understand its general form, simpler.

You can do both filtering and mapping operations in generator expressions, so the following expressions are equivalent.

itertools.imap(transform, itertools.ifilter(condition, it))

(transform(x) for x in it if condition(x))
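As a concrete instance of the same shape, here is a small sketch that keeps the even numbers and squares them (the variable name is only illustrative):

squares_of_evens = (x * x for x in range(10) if x % 2 == 0)
assert list(squares_of_evens) == [0, 4, 16, 36, 64]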

However, there are some calculations that aren't easily expressed as a single expression. To handle these, you can use generator functions.

Generator functions

Generator functions are a convenient syntax for creating generators from what looks like a normal function.

Rather than creating a container to hold the results of your calculation and returning it at the end, you yield the individual values; the function's execution is suspended at each yield and resumes the next time you ask the iterator for a value.

They are useful for calculations whose results aren't simple to build up in a single expression, and which may even be recursive.

def foo(bar):
    yield bar
    for baz in bar.qux:
        for x in foo(baz):
            yield x
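As a sketch of how this might be used, assuming a hypothetical Node class whose qux attribute holds child nodes:

class Node(object):
    def __init__(self, name, qux=()):
        self.name = name
        self.qux = list(qux)

root = Node('root', [Node('a'), Node('b')])
# foo() yields the node itself, then walks its children recursively.
assert [node.name for node in foo(root)] == ['root', 'a', 'b']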

Sub-generators

If you have Python 3.3 or later available, you can use yield from to delegate to sub-generators.

def foo(bar):
    yield bar
    for baz in bar.qux:
        yield from foo(baz)

In this example, rather than using yield from, you could instead write:

for x in foo(baz):
    yield x

However, this is longer and doesn't handle the interesting corner cases where values are passed into the generator with send() or returned when iteration ends, both of which yield from handles for you.
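For example, here is a small sketch (Python 3.3 or later; the function names are only illustrative) of one of those corner cases: yield from makes the sub-generator's return value available, where the manual loop would silently discard it.

def inner():
    yield 1
    yield 2
    return 'done'   # travels back via StopIteration

def outer():
    result = yield from inner()   # captures inner()'s return value
    yield result

assert list(outer()) == [1, 2, 'done']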

Context managers

Context managers are used with the with statement. A context manager can be any object that defines the __enter__ and __exit__ methods.
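As a minimal sketch of that protocol (the class is purely illustrative; a tidier equivalent built with contextlib appears at the end of this section), here is a hand-written context manager that changes directory on entry and changes back on exit:

import os

class ChdirContext(object):
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.pwd = os.getcwd()
        os.chdir(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        os.chdir(self.pwd)
        return False   # a false return value lets exceptions propagate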

You don't need to write a dedicated class every time you want a context manager, though; the file object returned by open is already one, and using it as a context manager will have it close the file at the end of the with block. It is common to use open this way:

with open('foo.txt', 'w') as f:
    f.write('bar\n')

You can define the __enter__ and __exit__ methods yourself, but provided you don't need much logic at construction time (you rarely do) and you don't need it to be re-usable, you can define a context manager like this:

import contextlib
import os
@contextlib.contextmanager
def chdir(path):
    pwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(pwd)

This uses a generator function that yields only one value (here we implicitly yield None); the yield marks the point where the body of the with block runs, and execution re-enters the generator function once that block has finished so the cleanup code can run.

The try...finally is necessary because when you yield in a context manager, the generator is resumed when the with block finishes, which may be because an exception was raised. If it was, the exception is re-raised at the point of the yield inside the context manager function, so to ensure that the chdir back to the original directory always runs, you need to wrap the yield in a try...finally block.
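The chdir context manager above might be used like this (the path is only for illustration); the original working directory is restored even if the body of the with block raises:

with chdir('/tmp'):
    print(os.getcwd())   # /tmp (or wherever /tmp resolves to)
print(os.getcwd())       # back to the original directory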