Chapter 4. Python Shortcuts

Introduction

Credit: David Ascher, ActiveState, co-author of Learning Python

Programming languages are like natural languages. Each has a set of qualities that polyglots generally agree on as characteristics of the language. Russian and French are often admired for their lyricism, while English is more often cited for its precision and dynamism: unlike the Académie-defined French language, the English language routinely grows words to suit its speakers’ needs, such as “carjacking,” “earwitness,” “snailmail,” “email,” “googlewhacking,” and “blogging.” In the world of computer languages, Perl is well known for its many degrees of freedom: TMTOWTDI (There’s More Than One Way To Do It) is one of the mantras of the Perl programmer. Conciseness is also seen as a strong virtue in the Perl and APL communities. As you’ll see in many of the discussions of recipes throughout this volume, in contrast, Python programmers often express their belief in the value of clarity and elegance. As a well-known Perl hacker once told me, Python’s prettier, but Perl is more fun. I agree with him that Python does have a strong (as in well-defined) aesthetic, while Perl has more of a sense of humor.

The reason I mention these seemingly irrelevant characteristics at the beginning of this chapter is that the recipes you see in this chapter are directly related to Python’s aesthetic and social dynamics. If this book had been about Perl, the recipes in a shortcuts chapter would probably elicit head scratching, contemplation, an “a-ha!” moment, and then a burst of laughter, as the reader grokked the genius behind a particular trick. In contrast, in most of the recipes in this chapter, the author presents a single elegant language feature, but one that he feels is underappreciated. Much like I, a proud resident of Vancouver, will go out of my way to show tourists the really neat things about the city, from the parks to the beaches to the mountains, a Python user will seek out friends and colleagues and say, “You gotta see this!” For me and most of the programmers I know, programming in Python is a shared social pleasure, not a competitive pursuit. There is great pleasure in learning a new feature and appreciating its design, elegance, and judicious use, and there’s a twin pleasure in teaching another or another thousand about that feature.

A word about the history of the chapter: back when we identified the recipe categories for the first edition of this collection, our driving notion was that there would be recipes of various kinds, each with a specific goal—a soufflé, a tart, an osso buco. Those recipes would naturally fall into fairly typical categories, such as desserts, appetizers, and meat dishes, or their perhaps less appetizing, nonmetaphorical equivalents, such as files, algorithms, and so on. So we picked a list of categories, added the categories to the Zope site used to collect recipes, and opened the floodgates.

Soon, it became clear that some submissions were hard to fit into the predetermined categories. There’s a reason for that, and cooking helps explain why. The recipes in this chapter are the Pythonic equivalent of making a roux (a cooked mixture of fat and flour, used in making sauces, for those of you without a classic French cooking background), kneading dough, flouring, separating eggs, flipping a pan’s contents, blanching, and the myriad other tricks that any accomplished cook knows, but that you won’t find in a typical cookbook. Many of these tricks and techniques are used in preparing meals, but it’s hard to pigeonhole them as relevant for a given type of dish. And if you’re a novice cook looking up a fancy recipe, you’re likely to get frustrated quickly because serious cookbook authors assume you know these techniques, and they explain them (with illustrations!) only in books with titles such as Cooking for Divorced Middle-Aged Men. We didn’t want to exclude this precious category of tricks from this book, so a new category was born (sorry, no illustrations).

In the introduction to this chapter in the first edition, I presciently said:

I believe that the recipes in this chapter are among the most time-sensitive of the recipes in this volume. That’s because the aspects of the language that people consider shortcuts or noteworthy techniques seem to be relatively straightforward, idiomatic applications of recent language features.

I can proudly say that I was right. This new edition, significantly focused on the present definition of the language, makes many of the original recipes irrelevant. In the two Python releases since the book’s first edition, Python 2.3 and 2.4, the language has evolved to incorporate the ideas of those recipes into new syntactic features or library functions, just as it had done with every previous major release, making a cleaner, more compact, and yet more powerful language that’s as much fun to use today as it was over ten years ago.

All in all, about half the recipes in this chapter (roughly the same proportion as in the rest of the book) are entirely new ones, while the other half are vastly revised (mostly simplified) versions of recipes that were in the first edition. Thanks to the simplifications, and to the focus on just two language versions (2.3 and 2.4) rather than the whole panoply of older versions that was covered by the first edition, this chapter, as well as the book as a whole, has over one-third more recipes than the first edition did.

It’s worth noting in closing that many of the recipes that are in this newly revised chapter touch on some of the most fundamental, unchanging aspects of the language: the semantics of assignment, binding, copy, and references; sequences; dictionaries. These ideas are all keys to the Pythonic approach to programming, and seeing these recipes live for several years makes me wonder whether Python will evolve in the next few years in related directions.

4.1. Copying an Object

Credit: Anna Martelli Ravenscroft, Peter Cogolo

Problem

You want to copy an object. However, when you assign an object, pass it as an argument, or return it as a result, Python uses a reference to the original object, without making a copy.

Solution

Module copy in the standard Python library offers two functions to create copies. The one you should generally use is the function named copy, which returns a new object containing exactly the same items and attributes as the object you’re copying:

import copy
new_list = copy.copy(existing_list)

On the rare occasions when you also want every item and attribute in the object to be separately copied, recursively, use deepcopy:

import copy
new_list_of_dicts = copy.deepcopy(existing_list_of_dicts)

Discussion

When you assign an object (or pass it as an argument, or return it as a result), Python (like Java) uses a reference to the original object, not a copy. Some other programming languages make copies every time you assign something. Python never makes copies “implicitly” just because you’re assigning: to get a copy, you must specifically request a copy.

Python’s behavior is simple, fast, and uniform. However, if you do need a copy and do not ask for one, you may have problems. For example:

>>> a = [1, 2, 3]
>>> b = a
>>> b.append(5)
>>> print a, b
[1, 2, 3, 5] [1, 2, 3, 5]

Here, the names a and b both refer to the same object (a list), so once we alter the object through one of these names, we later see the altered object no matter which name we use for it. No original, unaltered copy is left lying about anywhere.

Warning

To become an effective Python programmer, it is crucial that you learn to draw the distinction between altering an object and assigning to a name, which previously happened to refer to the object. These two kinds of operations have nothing to do with each other. A statement such as a=[ ] rebinds name a but performs no alteration at all on the object that was previously bound to name a. Therefore, the issue of references versus copies just doesn’t arise in this case: the issue is meaningful only when you alter some object.

If you are about to alter an object, but you want to keep the original object unaltered, you must make a copy. As this recipe’s solution explains, the module copy from the Python Standard Library offers two functions to make copies. Normally, you use copy.copy, which makes a shallow copy—it copies an object, but for each attribute or item of the object, it continues to share references, which is faster and saves memory.

Shallow copying, alas, isn’t sufficient to entirely “decouple” a copied object from the original one, if you propose to alter the items or attributes of either object, not just the object itself:

>>> list_of_lists = [ ['a'], [1, 2], ['z', 23] ]
>>> copy_lol = copy.copy(list_of_lists)
>>> copy_lol[1].append('boo')
>>> print list_of_lists, copy_lol
[['a'], [1, 2, 'boo'], ['z', 23]] [['a'], [1, 2, 'boo'], ['z', 23]]

Here, the names list_of_lists and copy_lol refer to distinct objects (two lists), so we could alter either of them without affecting the other. However, each item of list_of_lists is the same object as the corresponding item of copy_lol, so once we alter an item reached by indexing either of these names, we later see the altered item no matter which object we’re indexing to reach it.

If you do need to copy some container object and also recursively copy all objects it refers to (meaning all items, all attributes, and also items of items, items of attributes, etc.), use copy.deepcopy—such deep copying may cost you substantial amounts of time and memory, but if you gotta, you gotta. For deep copies, copy.deepcopy is the only way to go.

For normal shallow copies, you may have good alternatives to copy.copy, if you know the type of the object you want to copy. To copy a list L, call list( L ); to copy a dict d, call dict(d); to copy a set s (in Python 2.4, which introduces the built-in type set), call set(s). (Since list, dict, and, in 2.4, set, are built-in names, you do not need to perform any “preparation” before you use any of them.) You get the general pattern: to copy a copyable object o, which belongs to some built-in Python type t, you may generally just call t(o). dicts also offer a dedicated method to perform a shallow copy: d.copy( ) and dict(d) do the same thing. Of the two, I suggest you use dict(d): it’s more uniform with respect to other types, and it’s even shorter by one character!
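A quick check of this constructor-based pattern (the sample objects are invented for the demonstration):

```python
# Shallow copies via type constructors: t(o) for a built-in type t
L = [1, [2, 3]]
L2 = list(L)               # a new, equal list object...
assert L2 == L and L2 is not L
assert L2[1] is L[1]       # ...whose items are still shared (shallow copy)

d = {'a': 1}
d2 = dict(d)               # same effect as d.copy()
assert d2 == d and d2 is not d

s = set([1, 2, 3])
s2 = set(s)                # copies a set the same way
assert s2 == s and s2 is not s
```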

To copy instances of arbitrary types or classes, whether you coded them or got them from a library, just use copy.copy. If you code your own classes, it’s generally not worth the bother to define your own copy or clone method. If you want to customize the way instances of your class get (shallowly) copied, your class can supply a special method __copy__ (see Recipe 6.9 for a special technique relating to the implementation of such a method), or special methods __getstate__ and __setstate__. (See Recipe 7.4 for notes on these special methods, which also help with deep copying and serialization—i.e., pickling—of instances of your class.) If you want to customize the way instances of your class get deeply copied, your class can supply a special method __deepcopy__ (see Recipe 6.9).
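As a minimal illustration (class Point here is invented, not from the recipe), a class can customize its own shallow copying by supplying __copy__:

```python
import copy

class Point(object):
    """A tiny hypothetical class, used only to show the __copy__ hook."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __copy__(self):
        # copy.copy calls this method, if present, to make shallow copies
        return Point(self.x, self.y)

p = Point(1, 2)
q = copy.copy(p)           # invokes Point.__copy__
assert (q.x, q.y) == (1, 2)
assert q is not p          # a distinct object, as a copy should be
```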

Note that you do not need to copy immutable objects (strings, numbers, tuples, etc.) because you don’t have to worry about altering them. If you do try to perform such a copy, you’ll just get the original right back; no harm done, but it’s a waste of time and code. For example:

>>> s = 'cat'
>>> t = copy.copy(s)
>>> s is t
True

The is operator checks whether two objects are not merely equal, but in fact the same object (is checks for identity; for checking mere equality, you use the == operator). Checking object identity is not particularly useful for immutable objects (we’re using it here just to show that the call to copy.copy was useless, although innocuous). However, checking object identity can sometimes be quite important for mutable objects. For example, if you’re not sure whether two names a and b refer to separate objects, or whether both refer to the same object, a simple and very fast check a is b lets you know how things stand. That way you know whether you need to copy the object before altering it, in case you want to keep the original object unaltered.
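The identity check is easy to demonstrate (sample values invented):

```python
a = [1, 2]
b = a            # a second name bound to the same list object
c = list(a)      # a distinct, equal copy

assert a is b    # same object: altering via b would alter a
assert a == c    # equal in contents...
assert a is not c  # ...but distinct: c can be altered safely
```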

Warning

Other, inferior ways exist to create copies, namely building your own. Given a list L, both a “whole-object slice” L[:] and a list comprehension [x for x in L] do happen to make a (shallow) copy of L, as do adding an empty list, L+[ ], and multiplying the list by 1, L*1 . . . but each of these constructs is just wasted effort and obfuscation—calling list(L) is clearer and faster. You should, however, be familiar with the L[:] construct because for historical reasons it’s widely used. So, even though you’re best advised not to use it yourself, you’ll see it in Python code written by others.

Note

Similarly, given a dictionary d, you could create a shallow copy named d1 by coding out a loop:

>>> d1 = {  }
>>> for somekey in d:
...    d1[somekey] = d[somekey]

or more concisely by d1 = { }; d1.update(d). However, again, such coding is a waste of time and effort and produces nothing but obfuscated, fatter, and slower code. Use d1=dict(d), be happy!

See Also

Module copy in the Library Reference and Python in a Nutshell.

4.2. Constructing Lists with List Comprehensions

Credit: Luther Blissett

Problem

You want to construct a new list by operating on elements of an existing sequence (or other kind of iterable).

Solution

Say you want to create a new list by adding 23 to each item of some other list. A list comprehension expresses this idea directly:

thenewlist = [x + 23 for x in theoldlist]

Similarly, say you want the new list to comprise all items in the other list that are larger than 5. A list comprehension says exactly that:

thenewlist = [x for x in theoldlist if x > 5]

When you want to combine both ideas, you can perform selection with an if clause, and also use some expression, such as adding 23, on the selected items, in a single pass:

thenewlist = [x + 23 for x in theoldlist if x > 5]

Discussion

Elegance, clarity, and pragmatism are Python’s core values. List comprehensions show how pragmatism can enhance both clarity and elegance. Indeed, list comprehensions are often the best approach even when, instinctively, you’re thinking not of constructing a new list but rather of “altering an existing list”. For example, if your task is to set all items greater than 100 to 100, in an existing list object L, the best solution is:

L[:] = [min(x,100) for x in L]

Assigning to the “whole-list slice” L[:] alters the existing list object in place, rather than just rebinding the name L, as would be the case if you coded L = . . . instead.
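The difference is observable whenever another name is bound to the same list object (values invented for illustration):

```python
L = [50, 150, 99, 300]
alias = L                         # a second name for the same list object
L[:] = [min(x, 100) for x in L]   # in-place: the object itself changes
assert alias == [50, 100, 99, 100]
assert alias is L                 # still one and the same object

M = [50, 150]
alias2 = M
M = [min(x, 100) for x in M]      # rebinding: alias2 still sees the old list
assert alias2 == [50, 150]
assert M == [50, 100]
```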

You should not use a list comprehension when you simply want to perform a loop. When you want a loop, code a loop. For an example of looping over a list, see Recipe 4.4. See Chapter 19 for more information about iteration in Python.

It’s also best not to use a list comprehension when another built-in does what you want even more directly and immediately. For example, to copy a list, use L1 = list(L), not:

L1 = [x for x in L]

Similarly, when the operation you want to perform on each item is to call a function on the item and use the function’s result, use L1 = map(f, L) rather than L1 = [f(x) for x in L]. But in most cases, a list comprehension is just right.

In Python 2.4, you should consider using a generator expression, rather than a list comprehension, when the sequence may be long and you only need one item at a time. The syntax of generator expressions is just the same as for list comprehensions, except that generator expressions are surrounded by parentheses, ( and ), not brackets, [ and ]. For example, say that we only need the summation of the list computed in this recipe’s Solution, not each item of the list. In Python 2.3, we would code:

total = sum([x + 23 for x in theoldlist if x > 5])

In Python 2.4, we can code more naturally, omitting the brackets (no need to add additional parentheses—the parentheses already needed to call the built-in sum suffice):

total = sum(x + 23 for x in theoldlist if x > 5)

Besides being a little bit cleaner, this method avoids materializing the list as a whole in memory and thus may be slightly faster when the list is extremely long.
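Concretely, with some invented sample values, the summation works out as follows:

```python
theoldlist = [1, 6, 30, 2, 10]    # hypothetical data
total = sum(x + 23 for x in theoldlist if x > 5)
# items > 5 are 6, 30, and 10, giving 29 + 53 + 33
assert total == 115
```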

See Also

The Reference Manual section on list displays (another name for list comprehensions) and Python 2.4 generator expressions; Chapter 19; the Library Reference and Python in a Nutshell docs on the itertools module and on the built-in functions map, filter, and sum; Haskell is at http://www.haskell.org.

Warning

Python borrowed list comprehensions from the functional language Haskell (http://www.haskell.org), changing the syntax to use keywords rather than punctuation. If you do know Haskell, though, take care! Haskell’s list comprehensions, like the rest of Haskell, use lazy evaluation (also known as normal order or call by need). Each item is computed only when it’s needed. Python, like most other languages, uses (for list comprehensions as well as elsewhere) eager evaluation (also known as applicative order, call by value, or strict evaluation). That is, the entire list is computed when the list comprehension executes, and kept in memory afterwards as long as necessary. If you are translating into Python a Haskell program that uses list comprehensions to represent infinite sequences, or even just long sequences of which only one item at a time must be kept around, Python list comprehensions may not be suitable. Rather, look into Python 2.4’s new generator expressions, whose semantics are closer to the spirit of Haskell’s lazy evaluation—each item gets computed only when needed.

4.3. Returning an Element of a List If It Exists

Credit: Nestor Nissen, A. Bass

Problem

You have a list L and an index i, and you want to get L[i] when i is a valid index into L; otherwise, you want to get a default value v. If L were a dictionary, you’d use L.get(i, v), but lists don’t have a get method.

Solution

Clearly, we need to code a function, and, in this case, the simplest and most direct approach is the best one:

def list_get(L, i, v=None):
    if -len(L) <= i < len(L): return L[i]
    else: return v
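A few illustrative calls (repeating the Solution’s function so the snippet stands alone; the sample list is invented):

```python
def list_get(L, i, v=None):
    if -len(L) <= i < len(L): return L[i]
    else: return v

assert list_get(['a', 'b', 'c'], 1) == 'b'
assert list_get(['a', 'b', 'c'], -1) == 'c'    # negative indices work too
assert list_get(['a', 'b', 'c'], 7) is None    # out of range: the default
assert list_get(['a', 'b', 'c'], 7, 'z') == 'z'
assert list_get([], 0) is None                 # empty list: always the default
```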

Discussion

The function in this recipe just checks whether i is a valid index by applying Python’s indexing rule: valid indices are negative ones down to -len(L) inclusive, and non-negative ones up to len(L) exclusive. If almost all calls to list_get pass a valid index value for i, you might prefer an alternative approach:

def list_get_egfp(L, i, v=None):
    try: return L[i]
    except IndexError: return v

However, unless a vast majority of the calls pass a valid index, this alternative (as some time-measurements show) can be up to four times slower than the list_get function shown in the solution. Therefore, this “easier to get forgiveness than permission” (EGFP) approach, although it is often preferable in Python, cannot be recommended for this specific case.

I’ve also tried quite a few fancy, intricate and obscure approaches, but, besides being hard to explain and to understand, they all end up slower than the plain, simple function list_get. General principle: when you write Python code, prefer clarity and readability to compactness and terseness—choose simplicity over subtlety. You often will be rewarded with code that runs faster, and invariably, you will end up with code that is less prone to bugs and is easier to maintain, which is far more important than minor speed differences in 99.9% of the cases you encounter in the real world.

See Also

Language Reference and Python in a Nutshell documentation on list indexing.

4.4. Looping over Items and Their Indices in a Sequence

Credit: Alex Martelli, Sami Hangaslammi

Problem

You need to loop on a sequence, but at each step you also need to know which index into the sequence you have reached (e.g., because you need to rebind some entries in the sequence), and Python’s preferred approach to looping doesn’t use the indices.

Solution

That’s what built-in function enumerate is for. For example:

for index, item in enumerate(sequence):
    if item > 23:
        sequence[index] = transform(item)

This is cleaner, more readable, and faster than the alternative of looping over indices and accessing items by indexing:

for index in range(len(sequence)):
    if sequence[index] > 23:
        sequence[index] = transform(sequence[index])

Discussion

Looping on a sequence is a very frequent need, and Python strongly encourages you to do just that, looping on the sequence directly. In other words, the Pythonic way to get each item in a sequence is to use:

for item in sequence:
    process(item)

rather than the indirect approach, typical of lower-level languages, of looping over the sequence’s indices and using each index to fetch the corresponding item:

for index in range(len(sequence)):
    process(sequence[index])

Looping directly is cleaner, more readable, faster, and more general (since you can loop on any iterable, by definition, while indexing works only on sequences, such as lists).

However, sometimes you do need to know the index, as well as the corresponding item, within the loop. The most frequent reason for this need is that, in order to rebind an entry in a list, you must assign the new item to thelist[index]. To support this need, Python offers the built-in function enumerate, which takes any iterable argument and returns an iterator yielding all the pairs (two-item tuples) of the form (index, item), one pair at a time. By writing your for loop’s header clause in the form:

for index, item in enumerate(sequence):

both the index and the item are available within the loop’s body.

For help remembering the order of the items in each pair enumerate yields, think of the idiom d=dict(enumerate(L)). This gives a dictionary d that’s equivalent to list L, in the sense that d[i] is L[i] for any valid non-negative index i.
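The idiom itself is easy to check (a minimal sketch with invented values):

```python
L = ['zero', 'one', 'two']
d = dict(enumerate(L))            # pairs are (index, item), in that order
assert d == {0: 'zero', 1: 'one', 2: 'two'}
# d is "equivalent" to L: d[i] == L[i] for every valid non-negative i
for i in range(len(L)):
    assert d[i] == L[i]
```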

See Also

Library Reference and Python in a Nutshell section about enumerate; Chapter 19.

4.5. Creating Lists of Lists Without Sharing References

Credit: David Ascher

Problem

You want to create a multidimensional list but want to avoid implicit reference sharing.

Solution

To build a list and avoid implicit reference sharing, use a list comprehension. For example, to build a 5 x 10 array of zeros:

multilist = [[0 for col in range(5)] for row in range(10)]

Discussion

When a newcomer to Python is first shown that multiplying a list by an integer repeats that list that many times, the newcomer often gets quite excited about it, since it is such an elegant notation. For example:

>>> alist = [0] * 5

is clearly an excellent way to get an array of 5 zeros.

The problem is that one-dimensional tasks often grow a second dimension, so there is a natural progression to:

>>> multi = [[0] * 5] * 3
>>> print multi
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

This appears to work, but the same newcomer is then often puzzled by bugs, which typically can be boiled down to a snippet such as:

>>> multi[0][0] = 'oops!'
>>> print multi
[['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0], ['oops!', 0, 0, 0, 0]]

This issue confuses most programmers at least once, if not a few times (see the FAQ entry at http://www.python.org/doc/FAQ.html#4.50). To understand the issue, it helps to decompose the creation of the multidimensional list into two steps:

>>> row = [0] * 5          # a list with five references to 0
>>> multi = [row] * 3      # a list with three references to the row object

This decomposed snippet produces a multi that’s identical to that given by the more concise snippet [[0]*5]*3 shown earlier, and it has exactly the same problem: if you now assign a value to multi[0][0], you have also changed the value of multi[1][0] and that of multi[2][0] . . . , and, indeed, you have changed the value of row[0], too!

The comments are key to understanding the source of the confusion. Multiplying a sequence by a number creates a new sequence with the specified number of new references to the original contents. In the case of the creation of row, it doesn’t matter whether or not references are being duplicated, since the referent (the object being referred to) is a number, and therefore immutable. In other words, there is no practical difference between an object and a reference to an object if that object is immutable. In the second line, however, we create a new list containing three references to the contents of the [row] list, which holds a single reference to a list. Thus, multi contains three references to a single list object. So, when the first element of the first element of multi is changed, you are actually modifying the first element of the shared list. Hence the surprise.
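The sharing can be verified directly with the is operator (a minimal sketch):

```python
row = [0] * 5
multi = [row] * 3
# all three "rows" are one and the same list object:
assert multi[0] is multi[1]
assert multi[1] is multi[2]

multi[0][0] = 'oops!'
assert row[0] == 'oops!'       # the original row changed as well
assert multi[2][0] == 'oops!'  # ...and so, apparently, did every "row"
```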

List comprehensions, as shown in the “Solution”, avoid the problem. With list comprehensions, no sharing of references occurs—you have a truly nested computation. If you have followed the discussion thoroughly, it may have occurred to you that we don’t really need the inner list comprehension, only the outer one. In other words, couldn’t we get just the same effect with:

multilist = [[0]*5 for row in range(10)]

The answer is that, yes, we could, and in fact using list multiplication for the innermost axis and list comprehension for all outer ones is faster—over twice as fast in this example. So why don’t I recommend this latest solution? Answer: the speed improvement for this example is from 57 down to 24 microseconds in Python 2.3, from 49 to 21 in Python 2.4, on a typical PC of several years ago (AMD Athlon 1.2 GHz CPU, running Linux). Shaving a few tens of microseconds from a list-creation operation makes no real difference to your application’s performance: and you should optimize your code, if at all, only where it matters, where it makes a substantial and important difference to the performance of your application as a whole. Therefore, I prefer the code shown in the recipe’s Solution, simply because using the same construct for both the inner and the outer list creations makes it more conceptually symmetrical and easier to read!

See Also

Documentation for the range built-in function in the Library Reference and Python in a Nutshell.

4.6. Flattening a Nested Sequence

Credit: Luther Blissett, Holger Krekel, Hemanth Sethuram, ParzAspen Aspen

Problem

Some of the items in a sequence may in turn be sub-sequences, and so on, to arbitrary depth of “nesting”. You need to loop over a “flattened” sequence, “expanding” each sub-sequence into a single, flat sequence of scalar items. (A scalar, or atom, is anything that is not a sequence—i.e., a leaf, if you think of the nested sequence as a tree.)

Solution

We need to be able to tell which of the elements we’re handling are “subsequences” to be “expanded” and which are “scalars” to be yielded as is. For generality, we can take an argument that’s a predicate to tell us what items we are to expand. (A predicate is a function that we can call on any element and that returns a truth value: in this case, True if the element is a subsequence we are to expand, False otherwise.) By default, we can arbitrarily say that every list or tuple is to be “expanded”, and nothing else. Then, a recursive generator offers the simplest solution:

def list_or_tuple(x):
    return isinstance(x, (list, tuple))
def flatten(sequence, to_expand=list_or_tuple):
    for item in sequence:
        if to_expand(item):
            for subitem in flatten(item, to_expand):
                yield subitem
        else:
            yield item

Discussion

Flattening a nested sequence, or, equivalently, “walking” sequentially over all the leaves of a “tree”, is a common task in many kinds of applications. You start with a nested structure, with items grouped into sequences and subsequences, and, for some purposes, you don’t care about the structure at all. You just want to deal with the items, one after the other. For example,

for x in flatten([1, 2, [3, [  ], 4, [5, 6], 7, [8,], ], 9]):
    print x,

emits 1 2 3 4 5 6 7 8 9.

The only problem with this common task is that, in the general case, determining what is to be “expanded”, and what is to be yielded as a scalar, is not as obvious as it might seem. So, I ducked that decision, delegating it to a callable predicate argument that the caller can pass to flatten, unless the caller accepts flatten’s somewhat simplistic default behavior of expanding just tuples and lists.

In the same module as flatten, we should also supply another predicate that a caller might well want to use—a predicate that will expand just about any iterable except strings (plain and Unicode). Strings are iterable, but almost invariably applications want to treat them as scalars, not as subsequences.

To identify whether an object is iterable, we just need to try calling the built-in iter on that object: the call raises TypeError if the object is not iterable. To identify whether an object is string-like, we simply check whether the object is an instance of basestring, since isinstance(obj, basestring) is True when obj is an instance of any subclass of basestring—that is, any string-like type. So, the alternative predicate is not hard to code:

def nonstring_iterable(obj):
    try: iter(obj)
    except TypeError: return False
    else: return not isinstance(obj, basestring)

Now the caller may choose to call flatten(seq, nonstring_iterable) when the need is to expand any iterable that is not a string. It is surely better not to make the nonstring_iterable predicate the default for flatten, though: in a simple case, such as the example snippet we showed previously, flatten can be up to three times slower when the predicate is nonstring_iterable rather than list_or_tuple.
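For instance (repeating the recipe’s functions so the snippet stands alone; the string_types shim is our addition so the sketch also runs where basestring is unavailable, and the sample nested data is invented):

```python
def list_or_tuple(x):
    return isinstance(x, (list, tuple))

def flatten(sequence, to_expand=list_or_tuple):
    for item in sequence:
        if to_expand(item):
            for subitem in flatten(item, to_expand):
                yield subitem
        else:
            yield item

try:
    string_types = basestring      # Python 2: covers str and unicode
except NameError:
    string_types = str             # no basestring available: fall back to str

def nonstring_iterable(obj):
    try: iter(obj)
    except TypeError: return False
    else: return not isinstance(obj, string_types)

nested = [1, 'ab', ('c', ['d'])]
assert list(flatten(nested)) == [1, 'ab', 'c', 'd']

# The default predicate leaves a set alone; nonstring_iterable expands it:
mixed = [1, 'ab', set(['c'])]
assert list(flatten(mixed)) == [1, 'ab', set(['c'])]
assert list(flatten(mixed, nonstring_iterable)) == [1, 'ab', 'c']
```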

We can also write a nonrecursive version of generator flatten. Such a version lets you flatten nested sequences with nesting levels higher than Python’s recursion limit, which normally allows no more than a few thousand levels of recursion depth. The main technique for recursion removal is to keep an explicit last-in, first-out (LIFO) stack, which, in this case, we can implement with a list of iterators:

def flatten(sequence, to_expand=list_or_tuple):
    iterators = [ iter(sequence) ]
    while iterators:
        # loop on the currently most-nested (last) iterator
        for item in iterators[-1]:
            if to_expand(item):
                # subsequence found, go loop on iterator on subsequence
                iterators.append(iter(item))
                break
            else:
                yield item
        else:
            # most-nested iterator exhausted, go back, loop on its parent
            iterators.pop( )

The if clause of the if statement executes for any item we are to expand—that is, any subsequence on which we must loop; so in that clause, we push an iterator for the subsequence to the end of the stack, then execute a break to terminate the for, and go back to the outer while, which will in turn execute a new for statement on the iterator we just appended to the stack. The else clause of the if statement executes for any item we don’t expand, and it just yields the item.

The else clause of the for statement executes if no break statement interrupts the for loop—in other words, when the for loop runs to completion, exhausting the currently most-nested iterator. So, in that else clause, we remove the now-exhausted most-nested (last) iterator, and the outer while loop proceeds, either terminating if no iterators are left on the stack, or executing a new for statement that continues the loop on the iterator that’s back at the top of the stack—from wherever that iterator had last left off, intrinsically, because an iterator’s job is exactly to remember iteration state.

The results of this nonrecursive implementation of flatten are identical to those of the simpler recursive version given in this recipe’s Solution. If you think non-recursive implementations are faster than recursive ones, though, you may be disappointed: according to my measurements, the nonrecursive version is about 10% slower than the recursive one, across a range of cases.
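Both claims are easy to check. Here is a quick sanity test of the nonrecursive flatten, with the recipe's list_or_tuple predicate reproduced for completeness, in a spelling that also runs unchanged on Python 3 (no print statements are needed):

```python
import sys

def list_or_tuple(obj):
    # predicate from this recipe's Solution: expand only lists and tuples
    return isinstance(obj, (list, tuple))

def flatten(sequence, to_expand=list_or_tuple):
    iterators = [iter(sequence)]
    while iterators:
        # loop on the currently most-nested (last) iterator
        for item in iterators[-1]:
            if to_expand(item):
                iterators.append(iter(item))
                break
            else:
                yield item
        else:
            # most-nested iterator exhausted, go back to its parent
            iterators.pop()

assert list(flatten([1, [2, (3, [4])], 'abc'])) == [1, 2, 3, 4, 'abc']

# nesting far deeper than the recursion limit poses no problem
deep = [1]
for _ in range(sys.getrecursionlimit() * 3):
    deep = [deep]
assert list(flatten(deep)) == [1]
```

The last two statements exercise exactly the case the recursive version cannot handle: a structure nested several times deeper than the interpreter's recursion limit.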

See Also

Library Reference and Python in a Nutshell sections on sequence types and built-ins iter, isinstance, and basestring.

4.7. Removing or Reordering Columns in a List of Rows

Credit: Jason Whitlark

Problem

You have a list of lists (rows) and need to get another list of the same rows but with some columns removed and/or reordered.

Solution

A list comprehension works well for this task. Say you have:

listOfRows = [ [1,2,3,4], [5,6,7,8], [9,10,11,12] ]

You want a list with the same rows but with the second of the four columns removed and the third and fourth ones interchanged. A simple list comprehension that performs this job is:

newList = [ [row[0], row[3], row[2]] for row in listOfRows ]

An alternative way of coding, that is at least as practical and arguably a bit more elegant, is to use an auxiliary sequence (meaning a list or tuple) that has the column indices you desire in their proper order. Then, you can nest an inner list comprehension that loops on the auxiliary sequence inside the outer list comprehension that loops on listOfRows:

newList = [ [row[ci] for ci in (0, 3, 2)] for row in listOfRows ]

Discussion

I often use lists of lists to represent two-dimensional arrays. I think of such lists as having the “rows” of a “two-dimensional array” as their items. I often perform manipulation on the “columns” of such a “two-dimensional array”, typically reordering some columns, sometimes omitting some of the original columns. It is not obvious (at least, it was not immediately obvious to me) that list comprehensions are just as useful for this purpose as they are for other kinds of sequence-manipulation tasks.

A list comprehension builds a new list, rather than altering an existing one. But even when you do need to alter the existing list in place, the best approach is to write a list comprehension and assign it to the existing list’s contents. For example, if you needed to alter listOfRows in place, for the example given in this recipe’s Solution, you would code:

listOfRows[:] = [ [row[0], row[3], row[2]] for row in listOfRows ]

Do consider, as suggested in the second example in this recipe’s Solution, the possibility of using an auxiliary sequence to hold the column indices you desire, in the order in which you desire them, rather than explicitly hard-coding the list display as we did in the first example. You might feel a little queasy about nesting two list comprehensions into each other in this fashion, but it’s simpler and safer than you might fear. If you adopt this approach, you gain some potential generality, because you can choose to give a name to the auxiliary sequence of indices, use it to reorder several lists of rows in the same fashion, pass it as an argument to a function, whatever:

def pick_and_reorder_columns(listofRows, column_indexes):
    return [ [row[ci] for ci in column_indexes] for row in listofRows ]
columns = 0, 3, 2
newListOfPandas = pick_and_reorder_columns(oldListOfPandas, columns)
newListOfCats = pick_and_reorder_columns(oldListOfCats, columns)

This example performs just the same column reordering and selection as all the other snippets in this recipe, but it performs the operation on two separate “old” lists, obtaining from each the corresponding “new” list. Reaching for excessive generalization is a pernicious temptation, but here, with this pick_and_reorder_columns function, it seems that we are probably getting just the right amount of generality.
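Since the recipe only sketches the panda and cat lists, here is the same function checked on plain numeric rows instead:

```python
def pick_and_reorder_columns(listOfRows, column_indexes):
    return [[row[ci] for ci in column_indexes] for row in listOfRows]

listOfRows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
# drop the second column, interchange the third and fourth ones
assert pick_and_reorder_columns(listOfRows, (0, 3, 2)) == [
    [1, 4, 3], [5, 8, 7], [9, 12, 11]]
```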

One last note: some people prefer a fancier way to express the kinds of list comprehensions that are used as “inner” ones in some of the functions used previously. Instead of coding them straightforwardly, as in:

    [row[ci] for ci in column_indexes]

they prefer to use the built-in function map, and the special method __getitem__ of row used as a bound-method, to perform the indexing subtask, so they code instead:

    map(row.__getitem__, column_indexes)

Depending on the exact version of Python, perhaps this fancy and somewhat obscure way may be slightly faster. Nevertheless, I think the greater simplicity of the list comprehension form means the list comprehension is still the best way.

See Also

List comprehension docs in Language Reference and Python in a Nutshell.

4.8. Transposing Two-Dimensional Arrays

Credit: Steve Holden, Raymond Hettinger, Attila Vàsàrhelyi, Chris Perkins

Problem

You need to transpose a list of lists, turning rows into columns and vice versa.

Solution

You must start with a list whose items are lists all of the same length, such as:

arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

A list comprehension offers a simple, handy way to transpose such a two-dimensional array:

print [[r[col] for r in arr] for col in range(len(arr[0]))]
[[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

A faster though more obscure alternative (with exactly the same output) can be obtained by exploiting built-in function zip in a slightly strange way:

print map(list, zip(*arr))

Discussion

This recipe shows a concise yet clear way to turn rows into columns, and also a faster though more obscure way. List comprehensions work well when you want to be clear yet concise, while the alternative solution exploits the built-in function zip in a way that is definitely not obvious.

Sometimes data just comes at you the wrong way. For instance, if you use Microsoft’s ActiveX Data Objects (ADO) database interface, due to array element-ordering differences between Python and Microsoft’s preferred implementation language (Visual Basic), the GetRows method actually appears to return database columns in Python, despite the method’s name. This recipe’s two solutions to this common kind of problem let you choose between clarity and speed.

In the list comprehension solution, the inner comprehension varies what is selected from (the row), while the outer comprehension varies the selector (the column). This process achieves the required transposition.

In the zip-based solution, we use the *a syntax to pass each item (row) of arr to zip, in order, as a separate positional argument. zip returns a list of tuples, which directly achieves the required transposition; we then apply list to each tuple, via the single call to map, to obtain a list of lists, as required. Since we don’t use zip’s result as a list directly, we could get a further slight improvement in performance by using itertools.izip instead (because izip does not materialize its result as a list in memory, but rather yields it one item at a time):

import itertools
print map(list, itertools.izip(*arr))

but, in this specific case, the slight speed increase is probably not worth the added complexity.
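It is easy to confirm that the two approaches agree; the spelling below also runs unchanged on Python 3, where zip and map return iterators rather than lists:

```python
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# clear: vary the column selector in the outer comprehension
by_comprehension = [[r[col] for r in arr] for col in range(len(arr[0]))]

# fast: pass each row as a separate positional argument to zip
by_zip = list(map(list, zip(*arr)))

assert by_comprehension == by_zip == [
    [1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```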

If you’re transposing large arrays of numbers, consider Numeric Python and other third-party packages. Numeric Python defines transposition and other axis-swinging routines that will make your head spin.

See Also

The Reference Manual and Python in a Nutshell sections on list displays (the other name for list comprehensions) and on the *a and *k notation for positional and named argument passing; built-in functions zip and map; Numeric Python (http://www.pfdubois.com/numpy/).

4.9. Getting a Value from a Dictionary

Credit: Andy McKay

Problem

You need to obtain a value from a dictionary, without having to handle an exception if the key you seek is not in the dictionary.

Solution

That’s what the get method of dictionaries is for. Say you have a dictionary such as d = {'key':'value',}. To get the value corresponding to key in d in an exception-safe way, code:

print d.get('key', 'not found')

If you need to remove the entry after you have obtained the value, call d.pop (which does a get-and-remove) instead of d.get (which just reads d and never changes it).

Discussion

Want to get a value for a key from a dictionary, without getting an exception if the key does not exist in the dictionary? Use the simple and useful get method of the dictionary.

If you try to get a value with the indexing syntax d[x], and the value of x is not a key in dictionary d, your attempt raises a KeyError exception. This is often okay. If you expected the value of x to be a key in d, an exception is just the right way to inform you that you’re mistaken (i.e., that you need to debug your program).

However, you often need to be more tentative about it: as far as you know, the value of x may or may not be a key in d. In this case, don’t start messing with in tests, such as:

if 'key' in d:
    print d['key']
else:
    print 'not found'

or try/except statements, such as:

try:
    print d['key']
except KeyError:
    print 'not found'

Instead, use the get method, as shown in the “Solution”. If you call d.get(x), no exception is thrown: you get d[x] if x is a key in d, and if it’s not, you get None (which you can check for or propagate). If None is not what you want to get when x is not a key of d, call d.get(x, somethingelse ) instead. In this case, if x is not a key, you will get the value of somethingelse.

get is a simple, useful mechanism that is well explained in the Python documentation, but a surprising number of people don’t know about it. Another similar method is pop, which is mostly like get, except that, if the key was in the dictionary, pop also removes it. Just one caveat: get and pop are not exactly parallel. d.pop(x) does raise KeyError if x is not a key in d; to get exactly the same effect as d.get(x), plus the entry removal, call d.pop(x,None) instead.
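The parallel between get and pop, and the caveat just mentioned, can be seen in a few lines:

```python
d = {'key': 'value'}
assert d.get('missing') is None         # get never raises
assert d.pop('missing', None) is None   # pop with a default doesn't either
assert d.pop('key') == 'value'          # pop returns the value and removes the entry
assert d == {}
try:
    d.pop('key')                        # without a default, pop does raise
except KeyError:
    pass
else:
    raise AssertionError('expected KeyError')
```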

See Also

Recipe 4.10; the Library Reference and Python in a Nutshell sections on mapping types.

4.10. Adding an Entry to a Dictionary

Credit: Alex Martelli, Martin Miller, Matthew Shomphe

Problem

Working with a dictionary d, you need to use the entry d[k] when it’s already present, or add a new value as d[k] when k isn’t yet a key in d.

Solution

This is what the setdefault method of dictionaries is for. Say we’re building a word-to-page-numbers index, a dictionary that maps each word to the list of page numbers where it appears. A key piece of code in that application might be:

def addword(theIndex, word, pagenumber):
    theIndex.setdefault(word, [  ]).append(pagenumber)

This code is equivalent to more verbose approaches such as:

def addword(theIndex, word, pagenumber):
    if word in theIndex:
        theIndex[word].append(pagenumber)
    else:
        theIndex[word] = [pagenumber]

and:

def addword(theIndex, word, pagenumber):
    try:
        theIndex[word].append(pagenumber)
    except KeyError:
        theIndex[word] = [pagenumber]

Using method setdefault simplifies this task considerably.

Discussion

For any dictionary d, d.setdefault(k, v) is very similar to d.get(k, v), which was covered previously in Recipe 4.9. The essential difference is that, if k is not a key in the dictionary, the setdefault method assigns d[k]=v as a side effect, in addition to returning v. (get would just return v, without affecting d in any way.) Therefore, consider using setdefault any time you have get-like needs, but also want to produce this side effect on the dictionary.
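The side-effect difference between get and setdefault is easy to demonstrate:

```python
d = {}
# get returns the default but leaves d untouched
assert d.get('k', []) == []
assert 'k' not in d
# setdefault returns the same default, but also binds it as d['k']
assert d.setdefault('k', []) == []
assert d == {'k': []}
# a second call finds the existing entry and returns it, not the default
d.setdefault('k', []).append(1)
assert d == {'k': [1]}
```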

setdefault is particularly useful in a dictionary with values that are lists, as detailed in Recipe 4.15. The most typical usage for setdefault is something like:

somedict.setdefault(somekey, [  ]).append(somevalue)

setdefault is not all that useful for immutable values, such as numbers. If you just want to count words, for example, the right way to code is to use, not setdefault, but rather get:

theIndex[word] = theIndex.get(word, 0) + 1

since you must rebind the dictionary entry at theIndex[word] anyway (because numbers are immutable). But for our word-to-page-numbers example, you definitely do not want to fall into the performance trap that’s hidden in the following approach:

def addword(theIndex, word, pagenumber):
    theIndex[word] = theIndex.get(word, [  ]) + [pagenumber]

This latest version of addword builds three new lists each time you call it: an empty list that’s passed as the second argument to theIndex.get, a one-item list containing just pagenumber, and a list with N+1 items obtained by concatenating these two (where N is the number of times that word was previously found). Building such a huge number of lists is sure to take its toll, in performance terms. For example, on my machine, I timed the task of indexing the same four words occurring once each on each of 1,000 pages. Taking the plain if/in version of addword as a reference point, the version using try/except is about 10% faster, and the one using setdefault is about 20% slower—the kind of performance differences that you should blissfully ignore in just about all cases. This fourth version (using get) is four times slower—the kind of performance difference you just can’t afford to ignore.

See Also

Recipe 4.9; Recipe 4.15; Library Reference and Python in a Nutshell documentation about dict.

4.11. Building a Dictionary Without Excessive Quoting

Credit: Brent Burley, Peter Cogolo

Problem

You want to construct a dictionary whose keys are literal strings, without having to quote each key.

Solution

Once you get into the swing of Python, you’ll find yourself constructing a lot of dictionaries. When the keys are identifiers, you can avoid quoting them by calling dict with named-argument syntax:

data = dict(red=1, green=2, blue=3)

This is neater than the equivalent use of dictionary-display syntax:

data = {'red': 1, 'green': 2, 'blue': 3}

Discussion

One powerful way to build a dictionary is to call the built-in type dict. It’s often a good alternative to the dictionary-display syntax with braces and colons. This recipe shows that, by calling dict, you can avoid having to quote keys, when the keys are literal strings that happen to be syntactically valid for use as Python identifiers. You cannot use this approach for keys such as the literal strings '12ba' or 'for', because '12ba' starts with a digit, and for happens to be a Python keyword, not an identifier.

Also, dictionary-display syntax is the only case in Python where you need to use braces: if you dislike braces, or happen to work on a keyboard that makes braces hard to reach (as all Italian layout keyboards do!), you may be happier, for example, using dict() rather than { } to build an empty dictionary.

Calling dict also gives you other possibilities. dict(d) returns a new dictionary that is an independent copy of existing dictionary d, just like d.copy( )—but dict(d) works even when d is a sequence of pairs (key, value) instead of being a dictionary (when a key occurs more than once in the sequence, the last appearance of the key applies). A common dictionary-building idiom is:

d = dict(zip(the_keys, the_values))

where the_keys is a sequence of keys and the_values a “parallel” sequence of corresponding values. Built-in function zip builds and returns a list of (key, value) pairs, and built-in type dict accepts that list as its argument and constructs a dictionary accordingly. If the sequences are long, it’s faster to use module itertools from the standard Python library:

import itertools
d = dict(itertools.izip(the_keys, the_values))

Built-in function zip constructs the whole list of pairs in memory, while itertools.izip yields only one pair at a time. On my machine, with sequences of 10,000 numbers, the latter idiom is about twice as fast as the one using zip—18 versus 45 milliseconds with Python 2.3, 17 versus 32 with Python 2.4.
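A quick check of the zip idiom, and of the last-appearance rule mentioned earlier for sequences of (key, value) pairs:

```python
the_keys = ['red', 'green', 'blue']
the_values = [1, 2, 3]
assert dict(zip(the_keys, the_values)) == {'red': 1, 'green': 2, 'blue': 3}

# when a key occurs more than once, its last appearance applies:
assert dict([('a', 1), ('b', 2), ('a', 3)]) == {'a': 3, 'b': 2}
```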

You can use both a positional argument and named arguments in the same call to dict (if the named argument clashes with a key specified in the positional argument, the named argument applies). For example, here is a workaround for the previously mentioned issue that Python keywords, and other nonidentifiers, cannot be used as argument names:

d = dict({'12ba':49, 'for': 23}, rof=41, fro=97, orf=42)

If you need to build a dictionary where the same value corresponds to each key, call dict.fromkeys(keys_sequence, value) (if you omit the value, it defaults to None). For example, here is a neat way to initialize a dictionary to be used for counting occurrences of various lowercase ASCII letters:

import string
count_by_letter = dict.fromkeys(string.ascii_lowercase, 0)

See Also

Library Reference and Python in a Nutshell sections on built-ins dict and zip, and on modules itertools and string.

4.12. Building a Dict from a List of Alternating Keys and Values

Credit: Richard Philips, Raymond Hettinger

Problem

You want to build a dict from a list of alternating keys and values.

Solution

The built-in type dict offers many ways to build dictionaries, but not this one, so we need to code a function for the purpose. One way is to use the built-in function zip on extended slices:

def dictFromList(keysAndValues):
    return dict(zip(keysAndValues[::2], keysAndValues[1::2]))

A more general approach, which works for any sequence or other iterable argument and not just for lists, is to “factor out” the task of getting a sequence of pairs from a flat sequence into a separate generator. This approach is not quite as concise as dictFromList, but it’s faster as well as more general:

def pairwise(iterable):
    itnext = iter(iterable).next
    while True:
        yield itnext( ), itnext( )
def dictFromSequence(seq):
    return dict(pairwise(seq))

Defining pairwise also allows updating an existing dictionary with any sequence of alternating keys and values—just code, for example, mydict.update(pairwise(seq)).
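Here is a quick check of dictFromList (dictFromSequence produces the same result for a list argument):

```python
def dictFromList(keysAndValues):
    # even-indexed items become keys, odd-indexed items become values
    return dict(zip(keysAndValues[::2], keysAndValues[1::2]))

flat = ['red', 1, 'green', 2, 'blue', 3]
assert dictFromList(flat) == {'red': 1, 'green': 2, 'blue': 3}
```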

Discussion

Both of the “factory functions” in this recipe use the same underlying way to construct a dictionary: each calls dict with an argument that is a sequence of (key, value) pairs. All the difference is in how the functions build the sequence of pairs to pass to dict.

dictFromList builds a list of such pairs by calling built-in function zip with two extended-form slices of the function’s keysAndValues argument—one that gathers all items with even indices (meaning the items at index 0, 2, 4, . . .), the other that gathers all items with odd indices (starting at 1 and counting by 2 . . .). This approach is fine, but it works only when the argument named keysAndValues is an instance of a type or class that supports extended slicing, such as list, tuple or str. Also, this approach results in constructing several temporary lists in memory: if keysAndValues is a long sequence, all of this list construction activity can cost some performance.

dictFromSequence, on the other hand, delegates the task of building the sequence of pairs to the generator named pairwise. In turn, pairwise is coded to ensure that it can use any iterable at all—not just lists (or other sequences, such as tuples or strings), but also, for example, results of other generators, files, dictionaries, and so on. Moreover, pairwise yields pairs one at a time. It never constructs any long list in memory, an aspect that may improve performance if the input sequence is very long.

The implementation of pairwise is interesting. As its very first statement, pairwise binds local name itnext to the bound-method next of the iterator that it obtains by calling the built-in function iter on the iterable argument. This may seem a bit strange, but it’s a good general technique in Python: if you start with an object, and all you need to do with that object is call one of its methods in a loop, you can extract the bound-method, assign it to a local name, and afterwards just call the local name as if it were a function. pairwise would work just as well if the next method was instead called in a way that may look more normal to programmers who are used to other languages:

def pairwise_slow(iterable):
    it = iter(iterable)
    while True:
        yield it.next( ), it.next( )

However, this pairwise_slow variant isn’t really any simpler than the pairwise generator shown in the Solution (“more familiar to people who don’t know Python” is not a synonym of “simpler”!), and it is about 60% slower. Focusing on simplicity and clarity is one thing, and a very good one—indeed, a core principle of Python. Throwing performance to the winds, without getting any real advantage to compensate, is a completely different proposition and definitely not a practice that can be recommended in any language. So, while it is an excellent idea to focus on writing correct, clear, and simple code, it’s also very advisable to learn and use Python’s idioms that are most appropriate to your needs.

See Also

Recipe 19.7 for more general approaches to looping by sliding windows over an iterable. See the Python Reference Manual for more on extended slicing.

4.13. Extracting a Subset of a Dictionary

Credit: David Benjamin

Problem

You want to extract from a larger dictionary only that subset of it that corresponds to a certain set of keys.

Solution

If you want to leave the original dictionary intact:

def sub_dict(somedict, somekeys, default=None):
    return dict([ (k, somedict.get(k, default)) for k in somekeys ])

If you want to remove from the original the items you’re extracting:

def sub_dict_remove(somedict, somekeys, default=None):
    return dict([ (k, somedict.pop(k, default)) for k in somekeys ])

Two examples of these functions’ use and effects:

>>> d = {'a': 5, 'b': 6, 'c': 7}
>>> print sub_dict(d, 'ab'), d
{'a': 5, 'b': 6} {'a': 5, 'b': 6, 'c': 7}
>>> print sub_dict_remove(d, 'ab'), d
{'a': 5, 'b': 6} {'c': 7}

Discussion

In Python, I use dictionaries for many purposes—database rows, primary and compound keys, variable namespaces for template parsing, and so on. So, I often need to create a dictionary that is based on another, larger dictionary, but only contains the subset of the larger dictionary corresponding to some set of keys. In most use cases, the larger dictionary must remain intact after the extraction; sometimes, however, I need to remove from the larger dictionary the subset that I’m extracting. This recipe’s solution shows both possibilities. The only difference is that you use method get when you want to avoid affecting the dictionary that you are getting data from, method pop when you want to remove the items you’re getting.

If some item k of somekeys is not in fact a key in somedict, this recipe’s functions put k as a key in the result anyway, with a default value (which I pass as an optional argument to either function, with a default value of None). So, the result is not necessarily a subset of somedict. This behavior is the one I’ve found most useful in my applications.

You might prefer to get an exception for “missing keys”—that would help alert you to a bug in your program, in cases in which you know all ks in somekeys should definitely also be keys in somedict. Remember, “errors should never pass silently. Unless explicitly silenced,” to quote The Zen of Python, by Tim Peters (enter the statement import this at an interactive Python prompt to read or re-read this delightful summary of Python’s design principles). So, if a missing key is an error, from the point of view of your application, then you do want to get an exception that alerts you to that error at once, if it ever occurs. If this is what you want, you can get it with minor modifications to this recipe’s functions:

def sub_dict_strict(somedict, somekeys):
    return dict([ (k, somedict[k]) for k in somekeys ])
def sub_dict_remove_strict(somedict, somekeys):
    return dict([ (k, somedict.pop(k)) for k in somekeys ])

As you can see, these strict variants are even simpler than the originals—a good indication that Python likes to raise exceptions when unexpected behavior occurs!

Alternatively, you might prefer missing keys to be simply omitted from the result. This, too, requires just minor modifications:

def sub_dict_select(somedict, somekeys):
    return dict([ (k, somedict[k]) for k in somekeys if k in somedict])
def sub_dict_remove_select(somedict, somekeys):
    return dict([ (k, somedict.pop(k)) for k in somekeys if k in somedict])

The if clause in each list comprehension does all we need to distinguish these _select variants from the _strict ones.

In Python 2.4, you can use generator expressions, instead of list comprehensions, as the arguments to dict in each of the functions shown in this recipe. Just change the syntax of the calls to dict, from dict([. . .]) to dict(. . .) (removing the brackets adjacent to the parentheses) and enjoy the resulting slight simplification and acceleration. However, these variants would not work in Python 2.3, which has list comprehensions but not generator expressions.
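For example, here is the first function recast with a generator expression (this form needs Python 2.4 or later, and also runs on Python 3):

```python
def sub_dict(somedict, somekeys, default=None):
    # same semantics as the list-comprehension version, minus the brackets
    return dict((k, somedict.get(k, default)) for k in somekeys)

d = {'a': 5, 'b': 6, 'c': 7}
assert sub_dict(d, 'ab') == {'a': 5, 'b': 6}
assert d == {'a': 5, 'b': 6, 'c': 7}          # original left intact
assert sub_dict(d, 'az') == {'a': 5, 'z': None}
```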

See Also

Library Reference and Python in a Nutshell documentation on dict.

4.14. Inverting a Dictionary

Credit: Joel Lawhead, Ian Bollinger, Raymond Hettinger

Problem

An existing dict maps keys to unique values, and you want to build the inverse dict, mapping each value to its key.

Solution

You can write a function that passes a list comprehension as dict’s argument to build the new requested dictionary:

def invert_dict(d):
    return dict([ (v, k) for k, v in d.iteritems( ) ])

For large dictionaries, though, it’s faster to use the generator izip from the itertools module in the Python Standard Library:

from itertools import izip
def invert_dict_fast(d):
    return dict(izip(d.itervalues( ), d.iterkeys( )))

Discussion

If the values in dict d are not unique, then d cannot truly be inverted, meaning that there exists no dict id such that for any valid key k, id[d[k]]==k. However, the functions shown in this recipe still construct, even in such cases, a “pseudo-inverse” dict pd such that, for any v that is a value in d, d[pd[v]]==v. Given the original dict d and the dict x returned by either of the functions shown in this recipe, you can easily check whether x is the true inverse of d or just d’s pseudo-inverse: x is the true inverse of d if and only if len(x)==len(d). That’s because, if two different keys have the same value, then, in the result of either of the functions in this recipe, one of the two keys will simply go “poof” into the ether, thus leaving the resulting pseudo-inverse dict shorter than the dict you started with. In any case, quite obviously, the functions shown in this recipe can work only if all values in d are hashable (meaning that they are all usable as keys into a dict): otherwise, the functions raise a TypeError exception.
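The length check is easy to demonstrate; d.items() is used below so the snippet also runs on Python 3, where iteritems is gone:

```python
def invert_dict(d):
    return dict((v, k) for k, v in d.items())

d = {'a': 1, 'b': 2, 'c': 1}        # values are not unique: 1 occurs twice
pd = invert_dict(d)
assert len(pd) < len(d)             # so pd is only a pseudo-inverse of d
# but the pseudo-inverse property still holds for every value of d:
assert all(d[pd[v]] == v for v in d.values())
```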

When we program in Python, we normally “disregard minor optimizations,” as Donald Knuth suggested over thirty years ago: we place a premium on clarity and correctness and care relatively little about speed. However, it can’t hurt to know about faster possibilities: when we decide to code in a certain way because it’s simpler or clearer than another, it’s best if we are taking the decision deliberately, not out of ignorance.

Here, function invert_dict in this recipe’s Solution might perhaps be considered clearer because it shows exactly what it’s doing. Take the pairs k, v of key and value that method iteritems yields, swap them into (value, key) order, and feed the resulting list as the argument of dict, so that dict builds a dictionary where each value v is a key and the corresponding key k becomes that key’s value—just the inverse dict that our problem requires.

However, function invert_dict_fast, also in this recipe’s Solution, isn’t really any more complicated: it just operates more abstractly, by getting all keys and all values as two separate iterators and zipping them up (into an iterator whose items are the needed, swapped (value, key) pairs) via a call to generator izip, supplied by the itertools module of the Python Standard Library. If you get used to such higher abstraction levels, they will soon come to feel simpler than lower-level code!

Thanks to the higher level of abstraction, and to never materializing the whole list of pairs (but rather operating via generators and iterators that yield only one item at a time), function invert_dict_fast can be substantially faster than function invert_dict. For example, on my machine, to invert a 10,000-item dictionary, invert_dict takes about 63 milliseconds, but invert_dict_fast manages the same task in just 20 milliseconds. A speed increase by a factor of three, in general, is not to be sneered at. Such performance gains, when you work on large amounts of data, are the norm, rather than the exception, for coding at higher abstraction levels. This is particularly true when you can use itertools rather than loops or list comprehensions, because you don’t need to materialize some large list in memory at one time. Performance gain is an extra incentive for getting familiar with working at higher abstraction levels, a familiarity that has conceptual and productivity pluses, too.

See Also

Documentation on mapping types and itertools in the Library Reference and Python in a Nutshell; Chapter 19.

4.15. Associating Multiple Values with Each Key in a Dictionary

Credit: Michael Chermside

Problem

You need a dictionary that maps each key to multiple values.

Solution

By nature, a dictionary is a one-to-one mapping, but it’s not hard to make it one-to-many—in other words, to make one key map to multiple values. Your choice of one of two possible approaches depends on how you want to treat duplications in the set of values for a key. The following approach, based on using lists as the dict’s values, allows such duplications:

d1 = {  }
d1.setdefault(key, [  ]).append(value)

while an alternative approach, based on using sub-dicts as the dict’s values, automatically eliminates duplications of values:

d2 = {  }
d2.setdefault(key, {  })[value] = 1

In Python 2.4, the no-duplication approach can equivalently be coded:

d3 = {  }
d3.setdefault(key, set( )).add(value)

Discussion

A normal dictionary performs a simple mapping of each key to one value. This recipe shows three easy, efficient ways to achieve a mapping of each key to multiple values, by holding as the dictionary’s values lists, sub-dicts, or, in Python 2.4, sets. The semantics of the list-based approach differ slightly but importantly from those of the other two in terms of how they deal with duplication. Each approach relies on the setdefault method of a dictionary, covered earlier in Recipe 4.10, to initialize the entry for a key in the dictionary, if needed, and in any case to return said entry.

You need to be able to do more than just add values for a key. With the first approach, which uses lists and allows duplications, here’s how to retrieve the list of values for a key:

list_of_values = d1[key]

Here’s how to remove one value for a key, if you don’t mind leaving empty lists as items of d1 when the last value for a key is removed:

d1[key].remove(value)

Despite the empty lists, it’s still easy to test for the existence of a key with at least one value—just use a function that always returns a list (maybe an empty one), such as:

def get_values_if_any(d, key):
    return d.get(key, [  ])

For example, to check whether 'freep' is among the values (if any) for key 'somekey' in dictionary d1, you can code: if 'freep' in get_values_if_any(d1, 'somekey').

The second approach, which uses sub-dicts and eliminates duplications, can use rather similar idioms. To retrieve the list of values for a key:

list_of_values = list(d2[key])

To remove one value for a key, leaving empty dictionaries as items of d2 when the last value for a key is removed:

del d2[key][value]

In the third approach, showing the Python 2.4-only version d3, which uses sets, this would be:

d3[key].remove(value)

One possibility for the get_values_if_any function in either the second or third (duplication-removing) approaches would be:

def get_values_if_any(d, key):
    return list(d.get(key, ()))

This recipe focuses on how to code the raw functionality, but, to use this functionality in a systematic way, you’ll probably want to wrap up this code into a class. For that purpose, you need to make some of the design decisions that this recipe highlights. Do you want a value to be in the entry for a key multiple times? (Is the entry for each key a bag rather than a set, in mathematical terms?) If so, should remove just reduce the number of occurrences by 1, or should it wipe out all of them? This is just the beginning of the choices you have to make, and the right choices depend on the specifics of your application.

See Also

Recipe 4.10; the Library Reference and Python in a Nutshell sections on mapping types; Recipe 18.8 for an implementation of the bag type.

4.16. Using a Dictionary to Dispatch Methods or Functions

Credit: Dick Wall

Problem

You need to execute different pieces of code depending on the value of some control variable—the kind of problem that in some other languages you might approach with a case statement.

Solution

Object-oriented programming, thanks to its elegant concept of dispatching, does away with many (but not all) needs for case statements. In Python, dictionaries, and the fact that functions are first-class objects (in particular, functions can be values in a dictionary), conspire to make the full problem of “case statements” easier to solve. For example, consider the following snippet of code:

animals = []
number_of_felines = 0
def deal_with_a_cat():
    global number_of_felines
    print "meow"
    animals.append('feline')
    number_of_felines += 1
def deal_with_a_dog():
    print "bark"
    animals.append('canine')
def deal_with_a_bear():
    print "watch out for the *HUG*!"
    animals.append('ursine')
tokenDict = {
    "cat": deal_with_a_cat,
    "dog": deal_with_a_dog,
    "bear": deal_with_a_bear,
    }
# Simulate, say, some words read from a file
words = ["cat", "bear", "cat", "dog"]
for word in words:
    # Look up the function to call for each word, and call it
    tokenDict[word]()
nf = number_of_felines
print 'we met %d feline%s' % (nf, 's'[nf==1:])
print 'the animals we met were:', ' '.join(animals)

Discussion

The key idea in this recipe is to construct a dictionary with string (or other) values as keys, and bound-methods, functions, or other callables as values. At each step of execution, we use the string keys to select which callable to execute and then call it. This approach can be used as a kind of generalized case statement.

It’s embarrassingly simple (really!), but I use this technique often. You can also use bound-methods or other callables instead of functions. If you use unbound methods, you need to pass an appropriate object as the first actual argument when you do call them. More generally, you can store, as the dictionary’s values, tuples including both a callable and arguments to pass to the callable.
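For the variant that stores tuples of a callable and its arguments, here is a minimal sketch (the dispatch keys and messages are invented, and the handlers return strings rather than printing, to keep the example easy to check):

```python
def log(level, message):
    return '%s: %s' % (level, message)

# Dictionary values are (callable, args) pairs; keys select what to run.
dispatch = {
    'warn':  (log, ('WARNING', 'disk almost full')),
    'error': (log, ('ERROR', 'disk full')),
}

def handle(token):
    func, args = dispatch[token]
    return func(*args)

assert handle('warn') == 'WARNING: disk almost full'
```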

I primarily use this technique in places where, in other languages, I might want a case, switch, or select statement. For example, I use it to implement a poor man’s way to parse command files (e.g., an X10 macro control file).

See Also

The Library Reference section on mapping types; the Reference Manual section on bound and unbound methods; Python in a Nutshell about both dictionaries and callables.

4.17. Finding Unions and Intersections of Dictionaries

Credit: Tom Good, Andy McKay, Sami Hangaslammi, Robin Siebler

Problem

Given two dictionaries, you need to find the set of keys that are in both dictionaries (the intersection) or the set of keys that are in either dictionary (the union).

Solution

Sometimes, particularly in Python 2.3, you find yourself using dictionaries as concrete representations of sets. In such cases, you only care about the keys, not the corresponding values, and often you build the dictionaries by calls to dict.fromkeys, such as:

a = dict.fromkeys(xrange(1000))
b = dict.fromkeys(xrange(500, 1500))

The fastest way to compute the dict that is the set-union is:

union = dict(a, **b)

The fastest concise way to compute the dict that is the set-intersection is:

inter = dict.fromkeys([x for x in a if x in b])

If the number of items in dictionaries a and b can be very different, then it can be important for speed considerations to have the shorter one in the for clause, and the longer one in the if clause, of this list comprehension. In such cases, it may be worth sacrificing some conciseness in favor of speed, by coding the intersection computation as follows:

if len(a) < len(b):
    inter = dict.fromkeys([x for x in a if x in b])
else:
    inter = dict.fromkeys([x for x in b if x in a])
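A quick self-contained check of both computations follows. Note that the dict(a, **b) one-liner requires the keys of b to be acceptable as keyword arguments, which some Python versions restrict to strings; the copy-and-update form used below is equivalent and unrestricted:

```python
a = dict.fromkeys(range(1000))
b = dict.fromkeys(range(500, 1500))

# Equivalent to the dict(a, **b) union trick, but works for any keys.
union = dict(a)
union.update(b)

inter = dict.fromkeys([x for x in a if x in b])

assert sorted(union) == list(range(1500))
assert sorted(inter) == list(range(500, 1000))
```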

Python also gives you types to represent sets directly (in standard library module sets, and, in Python 2.4, also as built-ins). Here is a snippet that you can use at the start of a module: the snippet ensures that name set is bound to the best available set type, so that throughout the module, you can then use the same code whether you’re using Python 2.3 or 2.4:

try:
    set
except NameError:
    from sets import Set as set

Having done this, you can now use type set to best effect, gaining clarity and conciseness, and (in Python 2.4) gaining a little speed, too:

a = set(xrange(1000))
b = set(xrange(500, 1500))
union = a | b
inter = a & b

Discussion

In Python 2.3, even though the Python Standard Library module sets offers an elegant data type Set that directly represents a set (with hashable elements), it is still common to use a dict to represent a set, partly for historical reasons. Just in case you want to keep doing it, this recipe shows you how to compute unions and intersections of such sets in the fastest ways, which are not obvious. The code in this recipe, on my machine, takes about 260 microseconds for the union, about 690 for the intersection (with Python 2.3; with Python 2.4, 260 and 600, respectively), while alternatives based on loops or generator expressions are substantially slower.

However, it’s best to use type set instead of representing sets by dictionaries. As the recipe shows, using set makes your code more direct and readable. If you dislike the “or” operator (|) and the “and” operator (&), you can equivalently use a.union(b) and a.intersection(b), respectively. Besides clarity, you also gain speed, particularly in Python 2.4: computing the union still takes about 260 microseconds, but computing the intersection takes only about 210. Even in Python 2.3, this approach is acceptably fast: computing the union takes about 270 microseconds, computing the intersection takes about 650—not quite as fast as Python 2.4 but still quite comparable to what you can get if you represent sets by dictionaries. Last but not least, once you use type set (whether it is the Python 2.4 built-in or class Set from the Python Standard Library module sets, the interface is the same), you gain a wealth of useful set operations. For example, the set of elements that are in either a or b but not both is a^b or, equivalently, a.symmetric_difference(b).

Even if you start with dicts for other reasons, consider using sets anyway if you need to perform set operations. Say, for example, that you have in phones a dictionary that maps names to phone numbers and in addresses one that maps names to addresses. The clearest and simplest way to print all names for which you know both address and phone number, and their associated data, is:

for name in set(phones) & set(addresses):
    print name, phones[name], addresses[name]

This is much terser, and arguably clearer, than something like:

for name in phones:
    if name in addresses:
        print name, phones[name], addresses[name]

Another excellent alternative is:

for name in set(phones).intersection(addresses):
    print name, phones[name], addresses[name]

If you use the named intersection method, rather than the & intersection operator, you don’t need to turn both dicts into sets: just one of them. Then call intersection on the resulting set, and pass the other dict as the argument to the intersection method.
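A concrete sketch of this point (the names and numbers are invented), showing that only one of the two dicts needs converting:

```python
phones = {'alice': '555-0100', 'bob': '555-0111', 'carol': '555-0122'}
addresses = {'bob': '12 Elm St', 'carol': '34 Oak Ave', 'dave': '56 Pine Rd'}

# intersection() accepts any iterable (a dict iterates over its keys),
# so only one of the two dicts needs to become a set.
both = set(phones).intersection(addresses)

assert both == set(['bob', 'carol'])
```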

See Also

The Library Reference and Python in a Nutshell sections on mapping types, module sets, and Python 2.4’s built-in set type.

4.18. Collecting a Bunch of Named Items

Credit: Alex Martelli, Doug Hudgeon

Problem

You want to collect a bunch of items together, naming each item of the bunch, and you find dictionary syntax a bit heavyweight for the purpose.

Solution

Any normal class instance inherently wraps a dictionary, which it uses to hold its state. We can easily take advantage of this handily wrapped dictionary by coding a nearly empty class:

class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

Now, to group a few variables, create a Bunch instance:

point = Bunch(datum=y, squared=y*y, coord=x)

You can now access and rebind the named attributes just created, add others, remove some, and so on. For example:

if point.squared > threshold:
    point.isok = True

Discussion

We often just want to collect a bunch of stuff together, naming each item of the bunch. A dictionary is OK for this purpose, but a small do-nothing class is even handier and prettier to use.

It takes minimal effort to build a little class, as in this recipe, to provide elegant attribute-access syntax. While a dictionary is fine for collecting a few items in which each item has a name (the item’s key in the dictionary can be thought of as the item’s name, in this context), it’s not the best solution when all names are identifiers, to be used just like variables. In class Bunch’s __init__ method, we accept arbitrary named arguments with the **kwds syntax, and we use the kwds dictionary to update the initially empty instance dictionary, so that each named argument gets turned into an attribute of the instance.

Compared to attribute-access syntax, dictionary-indexing syntax is not quite as terse and readable. For example, if point was a dictionary, the little snippet at the end of the “Solution” would have to be coded like:

if point['squared'] > threshold:
    point['isok'] = True

An alternative implementation that’s just as attractive as the one used in this recipe is:

class EvenSimplerBunch(object):
    def __init__(self, **kwds):
        self.__dict__ = kwds

Rebinding an instance’s dictionary may feel risqué, but it’s not actually any pushier than calling that dictionary’s update method. So you might prefer the marginal speed advantage of this alternative implementation of Bunch. Unfortunately, I cannot find anywhere in Python’s documentation an assurance that usage like:

d = {'foo': 'bar'}
x = EvenSimplerBunch(**d)

will forever keep making x.__dict__ an independent copy of d rather than just sharing a reference. It does currently, and has in every version so far, but unless it’s a documented semantic constraint, we cannot be entirely sure that it will keep working forever. So, if you do choose the implementation in EvenSimplerBunch, you might choose to assign a copy (dict(kwds) or kwds.copy()) rather than kwds itself. And, if you do, then the marginal speed advantage disappears. All in all, the Bunch presented in this recipe’s Solution is probably preferable.

A further tempting but not fully sound alternative is to have the Bunch class inherit from dict, and set attribute access special methods equal to the item access special methods, as follows:

class DictBunch(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

One problem with this approach is that, with this definition, an instance x of DictBunch has many attributes it doesn’t really have, because it inherits all the attributes (methods, actually, but there’s no significant difference in this context) of dict. So, you can’t meaningfully check hasattr(x, someattr), as you could with the classes Bunch and EvenSimplerBunch previously shown, unless you can somehow rule out the value of someattr being any of several common words such as 'keys', 'pop', and 'get'.
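The difference is easy to demonstrate. In the sketch below (restating both classes so it stands alone), the Bunch instance has only the attributes it was given, while the DictBunch instance appears to have every dict method as well:

```python
class Bunch(object):
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

class DictBunch(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

b = Bunch(x=1)
db = DictBunch(x=1)

assert not hasattr(b, 'keys')   # Bunch has only what you gave it
assert hasattr(db, 'keys')      # DictBunch inherits every dict method
assert db.x == 1                # attribute access still maps to the item
```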

Python’s distinction between attributes and items is really a wellspring of clarity and simplicity. Unfortunately, many newcomers to Python wrongly believe that it would be better to confuse items with attributes, generally because of previous experience with JavaScript and other such languages, in which attributes and items are regularly confused. But educating newcomers is a much better idea than promoting item/attribute confusion.

See Also

The Python Tutorial section on classes; the Language Reference and Python in a Nutshell coverage of classes; Chapter 6 for more information about object-oriented programming in Python; Recipe 4.18 for more on the **kwds syntax.

4.19. Assigning and Testing with One Statement

Credit: Alex Martelli, Martin Miller

Problem

You are transliterating C or Perl code to Python, and to keep close to the original’s structure, you’d like an expression’s result to be both assigned and tested (as in if ((x=foo())) or while ((x=foo())) in such other languages).

Solution

In Python, you can’t code if x=foo(): .... Assignment is a statement, so it cannot fit into an expression, and you can only use expressions as conditions of if and while statements. This isn’t a problem, it just means you have to structure your code Pythonically! For example, to process a file object f line by line, instead of the following C-like (and syntactically incorrect, in Python) approach:

while (line=f.readline()) != '':
    process(line)

you can code a highly Pythonic (readable, clean, fast) approach:

for line in f:
    process(line)

But sometimes, you’re transliterating from C, Perl, or another language, and you’d like your transliteration to be structurally close to the original. One simple utility class makes it easy:

class DataHolder(object):
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
# optional and strongly discouraged, but nevertheless handy at times:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = data = DataHolder()

With the help of the DataHolder class and its instance data, you can keep your C-like code structure intact in transliteration:

while data.set(f.readline()) != '':
    process(data.get())
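To see the pattern in action without an actual file, here is a self-contained sketch in which an iterator over canned lines stands in for readline (the built-in next, available in Python 2.6 and later, plays the role of the readline call, and '' plays end-of-file):

```python
class DataHolder(object):
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value

# Simulated input: next(lines) stands in for f.readline(), '' means EOF.
lines = iter(['alpha\n', 'beta\n', ''])
data = DataHolder()
processed = []
while data.set(next(lines)) != '':
    processed.append(data.get().strip())

assert processed == ['alpha', 'beta']
```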

Discussion

In Python, assignment is a statement, not an expression. Thus, you cannot assign the result that you are also testing, for example, in the condition of an if, elif, or while statement. This is usually fine: just structure your code to avoid the need to assign while testing (in fact, your code will often become clearer as a result). In particular, whenever you feel the need to assign-and-test within the condition of a while loop, that’s a good hint that your loop’s structure probably wants to be refactored into a generator (or other iterator). Once you have refactored in this way, your loops become plain and simple for statements. The example given in the recipe, looping over each line read from a text file, is one where the refactoring has already been done on your behalf by Python itself, since a file object is an iterator whose items are the file’s lines.

However, sometimes you may be writing Python code that is the transliteration of code originally written in C, Perl, or some other language that supports assignment-as-expression. Such transliterations often occur in the first Python version of an algorithm for which a reference implementation is supplied, an algorithm taken from a book, and so on. In such cases, it’s often preferable to have the structure of your initial transliteration be close to that of the code you’re transcribing. You can refactor later and make your code more Pythonic—clearer, faster, and so on. But first, you want to get working code as soon as possible, and specifically you want code that is easy to check for compliance to the original it has been transliterated from. Fortunately, Python offers enough power to make it quite easy for you to satisfy this requirement.

Python doesn’t let us redefine the meaning of assignment, but we can have a method (or function) that saves its argument somewhere and also returns that argument so it can be tested. That somewhere is most naturally an attribute of an object, so a method is a more natural choice than a function. Of course, we could just retrieve the attribute directly (i.e., the get method is redundant), but it looks nicer to me to have symmetry between data.set and data.get.

data.set(whatever) can be seen as little more than syntactic sugar around data.value=whatever, with the added value of being acceptable as an expression. Therefore, it’s the one obviously right way to satisfy the requirement for a reasonably faithful transliteration. The only difference between the resulting Python code and the original (say) C or Perl code, is at the syntactic sugar level—the overall structure is the same, and that’s the key issue.

Importing _ _builtin_ _ and assigning to its attributes is a trick that basically defines a new built-in object at runtime. You can use that trick in your application’s start-up code, and then all other modules will automatically be able to access your new built-ins without having to do an import. It’s not good Python practice, though; on the contrary, it’s pushing the boundaries of Pythonic good taste, since the readers of all those other modules should not have to know about the strange side effects performed in your application’s startup code. But since this recipe is meant to offer a quick-and-dirty approach for a first transliteration that will soon be refactored to make it better, it may be acceptable in this specific context to cut more corners than one would in production-level code.

On the other hand, one trick you should definitely not use is the following abuse of a currently existing wart in list comprehensions:

while [line for line in [f.readline()] if line != '']:
    process(line)

This trick currently works, since both Python 2.3 and 2.4 still “leak” the list comprehension control variable (here, line) into the surrounding scope. However, besides being obscure and unreadable, this trick is specifically deprecated: list comprehension control variable leakage will be fixed in some future version of Python, and this trick will then stop working at all.

See Also

The Tutorial section on classes; the documentation for the _ _builtin_ _ module in the Library Reference and Python in a Nutshell; Language Reference and Python in a Nutshell documentation on list comprehensions.

4.20. Using printf in Python

Credit: Tobias Klausmann, Andrea Cavalcanti

Problem

You’d like to output something to your program’s standard output with C’s function printf, but Python doesn’t have that function.

Solution

It’s easy to code a printf function in Python:

import sys
def printf(format, *args):
    sys.stdout.write(format % args)

Discussion

Python separates the concepts of output (the print statement) and formatting (the % operator), but if you prefer to have these concepts together, they’re easy to join, as this recipe shows. No more worries about automatic insertion of spaces or newlines, either. Now you need worry only about correctly matching format and arguments!

For example, instead of something like:

print 'Result tuple is: %r' % (result_tuple,),

with its finicky need for commas in unobvious places (i.e., one to make a singleton tuple around result_tuple, one to avoid the newline that print would otherwise insert by default), once you have defined this recipe’s printf function, you can just write:

printf('Result tuple is: %r', result_tuple)
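To check exactly what printf emits, the sketch below temporarily redirects sys.stdout to a StringIO buffer (an illustrative testing device, not part of the recipe itself):

```python
import sys
try:
    from io import StringIO          # Python 3 (also 2.6+)
except ImportError:
    from StringIO import StringIO    # older Python 2

def printf(format, *args):
    sys.stdout.write(format % args)

# Temporarily swap in a StringIO so we can inspect what was written.
old_stdout, sys.stdout = sys.stdout, StringIO()
try:
    printf('Result tuple is: %r\n', (1, 2))
    output = sys.stdout.getvalue()
finally:
    sys.stdout = old_stdout

assert output == 'Result tuple is: (1, 2)\n'
```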

See Also

Library Reference and Python in a Nutshell documentation for module sys and for the string formatting operator %; Recipe 2.13 for a way to implement C++’s <<-style output in Python.

4.21. Randomly Picking Items with Given Probabilities

Credit: Kevin Parks, Peter Cogolo

Problem

You want to pick an item at random from a list, much as random.choice does, but you need to pick the various items with different probabilities given in another list, rather than picking any item with equal probability as random.choice does.

Solution

Module random in the standard Python library offers a wealth of possibilities for generating and using pseudo-random numbers, but it does not offer this specific functionality, so we must code it as a function of our own:

import random
def random_pick(some_list, probabilities):
    x = random.uniform(0, 1)
    cumulative_probability = 0.0
    for item, item_probability in zip(some_list, probabilities):
        cumulative_probability += item_probability
        if x < cumulative_probability: break
    return item

Discussion

Module random in the standard Python library does not have the weighted choice functionality that is sometimes needed in games, simulations, and random tests, so I wrote this recipe to supply this functionality. The recipe uses module random’s function uniform to get a uniformly distributed pseudo-random number between 0.0 and 1.0, then loops in parallel on items and their probabilities, computing the increasing cumulative probability, until the latter becomes greater than the pseudo-random number.

The recipe assumes, but does not check, that probabilities is a sequence with just as many items as some_list, which are probabilities—that is, numbers between 0.0 and 1.0, summing up to 1.0; if these assumptions are violated, you may still get some random picks, but they will not follow the (inconsistent) specifications encoded in the function’s arguments. You may want to add some assert statements at the start of the function to check that the arguments make sense, such as:

    assert len(some_list) == len(probabilities)
    assert 0 <= min(probabilities) and max(probabilities) <= 1
    assert abs(sum(probabilities)-1.0) < 1.0e-5

However, these checks can be quite time consuming, so I don’t normally use them and have not included them in the official Solution.
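To gain confidence in the function, you can seed the generator and check that observed frequencies roughly match the requested probabilities. A self-contained sketch, restating the Solution’s function (the sample items, seed, and tolerance bounds are arbitrary):

```python
import random

def random_pick(some_list, probabilities):
    x = random.uniform(0, 1)
    cumulative_probability = 0.0
    for item, item_probability in zip(some_list, probabilities):
        cumulative_probability += item_probability
        if x < cumulative_probability:
            break
    return item

random.seed(42)
counts = {'a': 0, 'b': 0, 'c': 0}
for _ in range(10000):
    counts[random_pick(['a', 'b', 'c'], [0.5, 0.3, 0.2])] += 1

# With 10000 samples, observed frequencies should be close to the
# requested probabilities (loose statistical bounds, not exact values).
assert 4700 < counts['a'] < 5300
assert 2700 < counts['b'] < 3300
assert 1700 < counts['c'] < 2300
```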

As I already mentioned, the problem solved in this recipe requires items to be associated with probabilities—numbers between 0 and 1, summing up to 1. A related but slightly different task is to get random picks with weighted relative probabilities given by small non-negative integers—odds, rather than probabilities. For this related problem, the best solution is a generator, with an internal structure that is rather different from the function random_pick given in this recipe’s Solution:

import random
def random_picks(sequence, relative_odds):
    table = [ z for x, y in zip(sequence, relative_odds) for z in [x]*y ]
    while True:
        yield random.choice(table)

This generator works by first preparing a table whose total number of items is sum(relative_odds), each item of sequence appearing in the table as many times as the small non-negative integer that is its corresponding item in relative_odds. Once the table is prepared, the generator’s body is tiny and fast, as it simply delegates to random.choice the picking of each random item it yields. Typical uses of this random_picks generator might be:

>>> x = random_picks('ciao', [1, 1, 3, 2])
>>> for two_chars in zip('boo', x): print ''.join(two_chars),
...
bc oa oa
>>> import itertools
>>> print ''.join(itertools.islice(x, 8))
icacaoco
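The table-building step is the heart of random_picks, and it is easy to inspect in isolation; the duplicated entries are what bias random.choice toward the higher-odds items:

```python
sequence = 'ciao'
relative_odds = [1, 1, 3, 2]

# Each item appears in the table as many times as its relative odds:
# 'c' once, 'i' once, 'a' three times, 'o' twice.
table = [z for x, y in zip(sequence, relative_odds) for z in [x] * y]

assert table == ['c', 'i', 'a', 'a', 'a', 'o', 'o']
assert len(table) == sum(relative_odds)
```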

See Also

Module random in the Library Reference and Python in a Nutshell.

4.22. Handling Exceptions Within an Expression

Credit: Chris Perkins, Gregor Rayman, Scott David Daniels

Problem

You want to code an expression, so you can’t directly use the statement try/except, but you still need to handle exceptions that the expression may throw.

Solution

To catch exceptions, try/except is indispensable, and, since try/except is a statement, the only way to use it inside an expression is to code an auxiliary function:

def throws(t, f, *a, **k):
    '''Return True iff f(*a, **k) raises an exception whose type is t
      (or, one of the items of _tuple_ t, if t is a tuple).'''
    try:
        f(*a, **k)
    except t:
        return True
    else:
        return False

For example, suppose you have a text file, which has one number per line, but also extra lines which may be whitespace, comments, or what-have-you. Here is how you can make a list of all the numbers in the file, skipping the lines that aren’t numbers:

data = [float(line) for line in open(some_file)
                    if not throws(ValueError, float, line)]
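For instance, given the throws function just shown, filtering a few canned lines (stand-ins for the file’s contents) works like this:

```python
def throws(t, f, *a, **k):
    '''Return True iff f(*a, **k) raises an exception whose type is t
       (or one of the items of tuple t, if t is a tuple).'''
    try:
        f(*a, **k)
    except t:
        return True
    else:
        return False

# Canned lines standing in for a file: numbers, a comment, a blank line.
lines = ['3.14\n', '# a comment\n', '42\n', '\n']
data = [float(line) for line in lines
        if not throws(ValueError, float, line)]

assert data == [3.14, 42.0]
```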

Discussion

You might prefer to name such a function raises, but I personally prefer throws, which is probably a throwback to C++. By whatever name, the auxiliary function shown in this recipe takes as its arguments, first an exception type (or tuple of exception types) t, then a callable f, and then arbitrary positional and named arguments a and k, which are to be passed on to f. Do not code, for example, if not throws(ValueError, float(line))! When you call a function, Python evaluates the arguments before passing control to the function; if an argument’s evaluation raises an exception, the function never even gets started. I’ve seen this erroneous usage attempted more than once by people who are just starting to use the assertRaises method from the standard Python library’s unittest.TestCase class, for example.

When throws executes, it just calls f within the try clause of a try/except statement, passing on the arbitrary positional and named arguments. If the call to f in the try clause raises an exception whose type is t (or one of the items of t, if t is a tuple of exception types), then control passes to the corresponding except clause, which, in this case, returns True as throws' result. If no exception is raised in the try clause, then control passes to the corresponding else clause (if any), which, in this case, returns False as throws' result.

Note that, if some unexpected exception (one whose type is not in t) gets raised, then function throws does not catch that exception, so that throws terminates and propagates the exception to its caller. This choice is quite a deliberate one. Catching exceptions with a too-wide except clause is a bug-diagnosing headache waiting to happen. If the caller really wants throws to catch just about everything, it can always call throws(Exception, ...) and live with the resulting headaches.

One problem with the throws function is that you end up doing the key operation twice—once just to see if it throws, tossing the result away, then, a second time, to get the result. It would be nicer to get the result, if any, together with an indication of whether an exception has been caught. I first tried something along the lines of:

def throws(t, f, *a, **k):
    """ Return a pair (True, None) if f(*a, **k) raises an exception whose
        type is in t, else a pair (False, x) where x is the result of f(*a, **k). """
    try:
        return False, f(*a, **k)
    except t:
        return True, None

Unfortunately, this version doesn’t fit in well in a list comprehension: there is no elegant way to get and use both the flag and the result. So, I chose a different approach: a function that returns a list in any case—empty if an exception was caught, otherwise with the result as the only item. This approach works fine in a list comprehension, but for clarity, the name of the function needs to be changed:

def returns(t, f, *a, **k):
    " Return [f(*a, **k)] normally, [] if that raises an exception in t. "
    try:
        return [f(*a, **k)]
    except t:
        return []

The resulting list comprehension is even more elegant, in my opinion, than the original one in this recipe’s Solution:

data = [ x for line in open(some_file)
           for x in returns(ValueError, float, line) ]
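With returns, each well-formed line calls float exactly once. A self-contained check with canned lines standing in for the file:

```python
def returns(t, f, *a, **k):
    " Return [f(*a, **k)] normally, [] if that raises an exception in t. "
    try:
        return [f(*a, **k)]
    except t:
        return []

lines = ['3.14\n', 'oops\n', '42\n']
# Good lines contribute a one-item list; bad lines contribute nothing.
data = [x for line in lines for x in returns(ValueError, float, line)]

assert data == [3.14, 42.0]
```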

See Also

Python in a Nutshell’s section on catching and handling exceptions; the sidebar The *args and **kwds Syntax for an explanation of *args and **kwds syntax.

4.23. Ensuring a Name Is Defined in a Given Module

Credit: Steven Cummings

Problem

You want to ensure that a certain name is defined in a given module (e.g., you want to ensure that there is a built-in name set), and, if not, you want to execute some code that sets the definition.

Solution

The solution to this problem is the only good use I’ve yet seen for statement exec. exec lets us execute arbitrary Python code from a string, and thus lets us write a very simple function to deal with this task:

import __builtin__
def ensureDefined(name, defining_code, target=__builtin__):
    if not hasattr(target, name):
        d = {}
        exec defining_code in d
        assert name in d, 'Code %r did not set name %r' % (
            defining_code, name)
        setattr(target, name, d[name])

Discussion

If your code supports several versions of Python (or of some third-party package), then many of your modules must start with code such as the following snippet (which ensures name set is properly set in either Python 2.4, where it’s a built-in, or 2.3, where it must be obtained from the standard library):

try:
    set
except NameError:
    from sets import Set as set

This recipe encapsulates this kind of logic directly, and by default works on module __builtin__, since that’s the typical module for which you need to work around missing names in older Python versions. With this recipe, you could ensure name set is properly defined among the built-ins by running just once, during your program’s initialization, the single call:

ensureDefined('set', 'from sets import Set as set')

The key advantage of this recipe is that you can group all needed calls to ensureDefined in just one place of your application, at initialization time, rather than having several ad hoc try/except statements at the start of various modules. Moreover, ensureDefined may allow more readable code because it does only one specific job, so the purpose of calling it is obvious, while try/except statements could have several purposes, so that more study and reflection might be needed to understand them. Last but not least, using this recipe lets you avoid the warnings that the try/except approach can trigger from such useful checking tools as pychecker, http://pychecker.sourceforge.net/. (If you aren’t using pychecker or something like that, you should!)

The recipe takes care to avoid unintended accidental side effects on target, by using an auxiliary dictionary d as the target for the exec statement and then transferring only the requested name. This way, for example, you can use as target an object that is not a module (a class, say, or even a class instance), without necessarily adding to your target an attribute named __builtins__ that references the dictionary of Python’s built-ins. If you used less care, so that the body of the if statement was only:

        exec defining_code in vars(target)

you would inevitably get such side effects, as documented at http://www.python.org/doc/current/ref/exec.html.
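The same idea can be exercised on a target that is not a module at all. The sketch below restates ensureDefined with exec written in function-call form (required in Python 3, also accepted by Python 2 for this usage) and points it at a plain namespace object; the names are illustrative:

```python
def ensureDefined(name, defining_code, target):
    # Run defining_code in an auxiliary dict, then copy over only the
    # requested name, leaving target otherwise untouched.
    if not hasattr(target, name):
        d = {}
        exec(defining_code, d)
        assert name in d, 'Code %r did not set name %r' % (
            defining_code, name)
        setattr(target, name, d[name])

class Namespace(object):
    pass

ns = Namespace()
ensureDefined('answer', 'answer = 6 * 7', ns)
ensureDefined('answer', 'answer = 0', ns)   # already defined: no effect

assert ns.answer == 42
```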

It’s important to be aware that exec can and does execute any valid string of Python code that you give it. Therefore, make sure that the argument defining_code that you pass to any call of function ensureDefined does not come from an untrusted source, such as a text file that might have been maliciously tampered with.

See Also

The online documentation of the exec statement in the Python Language Reference Manual at http://www.python.org/doc/current/ref/exec.html.
