Chapter 4. Writing Great Code

This chapter focuses on best practices for writing great Python code. We will review coding style conventions that will be used in Chapter 5, and briefly cover logging best practices, plus list a few of the major differences between available open source licenses. All of this is intended to help you write code that is easy for us, your community, to use and extend.

Code Style

Pythonistas (veteran Python developers) celebrate having a language so accessible that people who have never programmed can still understand what a Python program does when they read its source code. Readability is at the heart of Python’s design, following the recognition that code is read much more often than it is written.

One reason Python code can be easily understood is its relatively complete set of code style guidelines (collected in the two Python Enhancement Proposals PEP 20 and PEP 8, described in the next few pages) and “Pythonic” idioms. When a Pythonista points to portions of code and says they are not “Pythonic,” it usually means that those lines of code do not follow the common guidelines and fail to express the intent in what is considered the most readable way. Of course, “a foolish consistency is the hobgoblin of little minds.”1 Pedantic devotion to the letter of the PEP can undermine readability and understandability.

PEP 8

PEP 8 is the de facto code style guide for Python. It covers naming conventions, code layout, whitespace (tabs versus spaces), and other similar style topics.

This is highly recommended reading. The entire Python community does its best to adhere to the guidelines laid out within this document. Some projects may stray from it from time to time, while others (like Requests) may amend its recommendations.

Conforming your Python code to PEP 8 is generally a good idea and helps make code more consistent when working on projects with other developers. The PEP 8 guidelines are explicit enough that they can be programmatically checked. There is a command-line program, pep8, that can check your code for conformity. Install it by running the following command in your terminal:

$ pip3 install pep8

Here’s an example of the kinds of things you might see when you run pep8:

$ pep8 optparse.py
optparse.py:69:11: E401 multiple imports on one line
optparse.py:77:1: E302 expected 2 blank lines, found 1
optparse.py:88:5: E301 expected 1 blank line, found 0
optparse.py:222:34: W602 deprecated form of raising exception
optparse.py:347:31: E211 whitespace before '('
optparse.py:357:17: E201 whitespace after '{'
optparse.py:472:29: E221 multiple spaces before operator
optparse.py:544:21: W601 .has_key() is deprecated, use 'in'

The fixes to most of the complaints are straightforward and stated directly in PEP 8. The code style guide for Requests gives examples of good and bad code and is only slightly modified from the original PEP 8.

The linters referenced in “Text Editors” usually use pep8, so you can also install one of these to run checks within your editor or IDE. Or, the program autopep8 can be used to automatically reformat code in the PEP 8 style. Install the program with:

$ pip3 install autopep8

Use it to format a file in-place (overwriting the original) with:

$ autopep8 --in-place optparse.py

Excluding the --in-place flag will cause the program to output the modified code directly to the console for review (or for piping to another file). The --aggressive flag will perform more substantial changes and can be applied multiple times for greater effect.
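For example, to apply two rounds of the more aggressive fixes to the same file in place (the filename is the one used above):

$ autopep8 --in-place --aggressive --aggressive optparse.py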

PEP 20 (a.k.a. The Zen of Python)

PEP 20, the set of guiding principles for decision making in Python, is always available via import this in a Python shell. Despite its name, PEP 20 only contains 19 aphorisms, not 20 (the last has not been written down…).
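You can see the aphorisms for yourself in any interactive session; here are the first few lines of the output (truncated here for space):

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
...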

The true history of the Zen of Python is immortalized in Barry Warsaw’s blog post “import this and the Zen of Python.”

For an example of each Zen aphorism in action, see Hunter Blanks’ presentation “PEP 20 (The Zen of Python) by Example.” Raymond Hettinger also put these principles to fantastic use in his talk “Beyond PEP 8: Best Practices for Beautiful, Intelligible Code.”

General Advice

This section contains style concepts that are hopefully easy to accept without debate, and often applicable to languages other than Python. Some of them are direct from the Zen of Python, but others are just plain common sense. They reaffirm our preference in Python to select the most obvious way to present code, when multiple options are possible.

Explicit is better than implicit

While any kind of black magic is possible with Python, the simplest, most explicit way to express something is preferred:

Bad:

def make_dict(*args):
    x, y = args
    return dict(**locals())

Good:

def make_dict(x, y):
    return {'x': x, 'y': y}

In the good code, x and y are explicitly received from the caller, and an explicit dictionary is returned. A good rule of thumb is that another developer should be able to read the first and last lines of your function and understand what it does. That’s not the case with the bad example. (Of course, it’s also pretty easy when the function is only two lines long.)

Sparse is better than dense

Make only one statement per line. Some compound statements, such as list comprehensions, are allowed and appreciated for their brevity and their expressiveness, but it is good practice to keep disjoint statements on separate lines of code. It also makes for more understandable diffs3 when revisions to one statement are made:

Bad:

print('one'); print('two')

if x == 1: print('one')

if (<complex comparison> and
    <other complex comparison>):
    # do something

Good:

print('one')
print('two')

if x == 1:
    print('one')

cond1 = <complex comparison>
cond2 = <other complex comparison>
if cond1 and cond2:
    # do something

Gains in readability, to Pythonistas, are more valuable than a few bytes of total code (for the two-prints-on-one-line statement) or a few microseconds of computation time (for the extra-conditionals-on-separate-lines statement). Plus, when a group is contributing to open source, the “good” code’s revision history will be easier to decipher because a change on one line can only affect one thing.

Errors should never pass silently / Unless explicitly silenced

Error handling in Python is done using the try statement. An example from Ben Gleitzman’s HowDoI package (described more in “HowDoI”) shows when silencing an error is OK:

def format_output(code, args):
    if not args['color']:
        return code
    lexer = None

    # try to find a lexer using the Stack Overflow tags
    # or the query arguments
    for keyword in args['query'].split() + args['tags']:
        try:
            lexer = get_lexer_by_name(keyword)
            break
        except ClassNotFound:
            pass

    # no lexer found above, use the guesser
    if not lexer:
        lexer = guess_lexer(code)

    return highlight(code,
                     lexer,
                     TerminalFormatter(bg='dark'))

This is part of a package that provides a command-line script to query the Internet (Stack Overflow, by default) for how to do a particular coding task, and prints it to the screen. The function format_output() applies syntax highlighting: it first searches the question’s tags for a string understood by the lexer (also called a tokenizer; a “python”, “java”, or “bash” tag will identify which lexer to use to split and colorize the code), and if that fails, it tries to infer the language from the code itself. There are three paths the program can follow when it reaches the try statement:

  • Execution enters the try clause (everything between the try and the except), a lexer is successfully found, the loop breaks, and the function returns the code highlighted with the selected lexer.

  • The lexer is not found, the ClassNotFound exception is raised, it’s caught, and nothing is done. The loop continues until it finishes naturally or a lexer is found.

  • Some other exception occurs (like a KeyboardInterrupt) that is not handled, and it is raised up to the top level, stopping execution.

The “should never pass silently” part of the zen aphorism discourages the use of overzealous error trapping. Here’s an example you can try in a separate terminal so that you can kill it more easily once you get the point:

>>> while True:
...     try:
...         print("nyah", end=" ")
...     except:
...         pass

Or don’t try it. The except clause without any specified exception will catch everything, including KeyboardInterrupt (Ctrl+C in a POSIX terminal), and ignore it; so it swallows the dozens of interrupts you try to give it to shut the thing down. It’s not just the interrupt issue—a broad except clause can also hide bugs, leaving them to cause some problem later on, when it will be harder to diagnose. We repeat, don’t let errors pass silently: always explicitly identify by name the exceptions you will catch, and handle only those exceptions. If you simply want to log or otherwise acknowledge the exception and re-raise it, like in the following snippet, that’s OK. Just don’t let the error pass silently (without handling or re-raising it):

>>> while True:
...     try:
...         print("ni", end="-")
...     except:
...         print("An exception happened. Raising.")
...         raise

Function arguments should be intuitive to use

Your choices in API design will determine the downstream developer’s experience when interacting with a function. Arguments can be passed to functions in four different ways:

             1            2         3       4
def func(positional, keyword=value, *args, **kwargs):
    pass
1. Positional arguments are mandatory and have no default values.

2. Keyword arguments are optional and have default values.

3. An arbitrary argument list is optional and has no default values.

4. An arbitrary keyword argument dictionary is optional and has no default values.
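To see all four in one place, here is a minimal sketch (the func signature mirrors the one above; the example values are hypothetical):

>>> def func(positional, keyword="default", *args, **kwargs):
...     print(positional, keyword, args, kwargs)
...
>>> func(1)
1 default () {}
>>> func(1, 2, 3, 4, color="red")
1 2 (3, 4) {'color': 'red'}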

Here are tips for when to use each method of argument passing:

Positional arguments

Use these when there are only a few function arguments, which are fully part of the function’s meaning, with a natural order. For instance, in send(message, recipient) or point(x, y) the user of the function has no difficulty remembering that those two functions require two arguments, and in which order.

Usage antipattern: It is possible to use argument names, and switch the order of arguments when calling functions—for example, calling send(recipient="World", message="The answer is 42.") and point(y=2, x=1). This reduces readability and is unnecessarily verbose. Use the more straightforward calls to send("The answer is 42", "World") and point(1, 2).

Keyword arguments

When a function has more than two or three positional parameters, its signature is more difficult to remember, and using keyword arguments with default values is helpful. For instance, a more complete send function could have the signature send(message, to, cc=None, bcc=None). Here cc and bcc are optional and evaluate to None when they are not passed another value.

Usage antipattern: It is possible to follow the order of arguments in the definition without explicitly naming the arguments, like in send("42", "Frankie", "Benjy", "Trillian"), sending a blind carbon copy to Trillian. It is also possible to name arguments in another order, like in send("42", "Frankie", bcc="Trillian", cc="Benjy"). Unless there’s a strong reason not to, it’s better to use the form that is the closest to the function definition: send("42", "Frankie", cc="Benjy", bcc="Trillian").

Never is often better than right now

It is often harder to remove an optional argument (and its logic inside the function) that was added “just in case” and is seemingly never used, than to add a new optional argument and its logic when needed.

Arbitrary argument list

Defined with the *args construct, it denotes an extensible number of positional arguments. In the function body, args will be a tuple of all the remaining positional arguments. For example, send(message, *args) can also be called with each recipient as an argument: send("42", "Frankie", "Benjy", "Trillian"); and in the function body, args will be equal to ("Frankie", "Benjy", "Trillian"). A good example of when this works is the print function.

Caveat: If a function receives a list of arguments of the same nature, it’s often more clear to use a list or any sequence. Here, if send has multiple recipients, we can define it explicitly: send(message, recipients) and call it with send("42", ["Benjy", "Frankie", "Trillian"]).

Arbitrary keyword argument dictionary

Defined via the **kwargs construct, it passes an undetermined series of named arguments to the function. In the function body, kwargs will be a dictionary of all the passed named arguments that have not been caught by other keyword arguments in the function signature. An example of when this is useful is in logging; formatters at different levels can seamlessly take what information they need without inconveniencing the user.

Caveat: The same caution as in the case of *args is necessary, for similar reasons: these powerful techniques are to be used when there is a proven necessity to use them, and they should not be used if the simpler and clearer construct is sufficient to express the function’s intention.
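For instance, a function can forward arbitrary named details without knowing them in advance. This is a minimal sketch; the report function and its arguments are hypothetical:

def report(message, **details):
    # 'details' is a dict of the named arguments not caught by the signature
    print(message)
    for key, value in sorted(details.items()):
        print("  {}: {}".format(key, value))

report("Job finished", duration="4s", status="OK")
# Job finished
#   duration: 4s
#   status: OK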

Note

The variable names *args and **kwargs can (and should) be replaced with other names, when other names make more sense.

It is up to the programmer writing the function to determine which arguments are positional arguments and which are optional keyword arguments, and to decide whether to use the advanced techniques of arbitrary argument passing. After all, there should be one—and preferably only one—obvious way to do it. Other users will appreciate your effort when your Python functions are:

  • Easy to read (meaning the name and arguments need no explanation)

  • Easy to change (meaning adding a new keyword argument won’t break other parts of the code)

If the implementation is hard to explain, it’s a bad idea

A powerful tool for hackers, Python comes with a very rich set of hooks and tools allowing you to do almost any kind of tricky tricks. For instance, it is possible to:

  • Change how objects are created and instantiated

  • Change how the Python interpreter imports modules

  • Embed C routines in Python

All these options have drawbacks, and it is always better to use the most straightforward way to achieve your goal. The main drawback is that readability suffers when using these constructs, so whatever you gain must be more important than the loss of readability. Many code analysis tools, such as pylint or pyflakes, will be unable to parse this “magic” code.

A Python developer should know about these nearly infinite possibilities, because it instills confidence that no impassable problem will be on the way. However, knowing how and particularly when not to use them is very important.

Like a kung fu master, a Pythonista knows how to kill with a single finger, and never to actually do it.

We are all responsible users

As already demonstrated, Python allows many tricks, and some of them are potentially dangerous. A good example is that any client code can override an object’s properties and methods: there is no “private” keyword in Python. This philosophy is very different from highly defensive languages like Java, which provide a lot of mechanisms to prevent any misuse, and is expressed by the saying: “We are all responsible users.”

This doesn’t mean that, for example, no properties are considered private, and that proper encapsulation is impossible in Python. Rather, instead of relying on concrete walls erected by the developers between their code and others’ code, the Python community prefers to rely on a set of conventions indicating that these elements should not be accessed directly.

The main convention for private properties and implementation details is to prefix all “internals” with an underscore (e.g., sys._getframe). If the client code breaks this rule and accesses these marked elements, any misbehavior or problems encountered if the code is modified are the responsibility of the client code.

Using this convention generously is encouraged: any method or property that is not intended to be used by client code should be prefixed with an underscore. This will guarantee a better separation of duties and easier modification of existing code; it will always be possible to publicize a private property, but making a public property private might be a much harder operation.
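As an illustration, here is a minimal sketch of the convention (the Buffer class is hypothetical):

class Buffer:
    def __init__(self):
        self._data = []  # leading underscore: internal, not part of the API

    def _compact(self):
        # Helper that client code should not call directly.
        self._data = [chunk for chunk in self._data if chunk]

    def write(self, chunk):
        # Public method: part of the intended API.
        self._data.append(chunk)
        self._compact()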

Return values from one place

When a function grows in complexity, it is not uncommon to use multiple return statements inside the function’s body. However, to keep a clear intent and sustain readability, it is best to return meaningful values from as few points in the body as possible.

The two ways to exit from a function are upon error, or with a return value after the function has been processed normally. In cases when the function cannot perform correctly, it can be appropriate to return a None or False value. In this case, it is better to return from the function as soon as the incorrect context has been detected, to flatten the structure of the function: all the code after the return-because-of-failure statement can assume the condition is met to further compute the function’s main result. Having multiple such return statements is often necessary.

Still, when possible, keep a single exit point—it’s difficult to debug functions when you first have to identify which return statement is responsible for your result. Forcing the function to exit in just one place also helps to factor out some code paths, as the multiple exit points probably are a hint that such a refactoring is needed. This example is not bad code, but it could possibly be made more clear, as indicated in the comments:

def select_ad(third_party_ads, user_preferences):
    if not third_party_ads:
        return None  # Raising an exception might be better
    if not user_preferences:
        return None  # Raising an exception might be better
    # Some complex code to pick the best_ad given the
    # available ads and the individual's preferences...
    # Resist the temptation to return best_ad if succeeded...
    if not best_ad:
        ...  # Some Plan-B computation of best_ad
    return best_ad  # A single exit point for the returned value
                    # will help when maintaining the code

Conventions

Conventions make sense to everyone, but may not be the only way to do things. The conventions we show here are the more commonly used choices, and we recommend them as the more readable option.

Alternatives to checking for equality

When you don’t need to explicitly compare a value to True, or None, or 0, you can just add it to the if statement, like in the following examples. (See “Truth Value Testing” for a list of what is considered false).

Bad:

if attr == True:
    print('True!')

if attr == None:
    print('attr is None!')

Good:

# Just check the value
if attr:
    print('attr is truthy!')

# or check for the opposite
if not attr:
    print('attr is falsey!')

# but if you only want 'True'
if attr is True:
    print('attr is True')

# or explicitly check for None
if attr is None:
    print('attr is None!')

Accessing dictionary elements

Use the x in d syntax instead of the dict.has_key method, or pass a default argument to dict.get():

Bad:

>>> d = {'hello': 'world'}
>>>
>>> if d.has_key('hello'):
...     print(d['hello'])  # prints 'world'
... else:
...     print('default_value')
...
world

Good:

>>> d = {'hello': 'world'}
>>>
>>> print(d.get('hello', 'default_value'))
world
>>> print(d.get('howdy', 'default_value'))
default_value
>>>
>>> # Or:
... if 'hello' in d:
...     print(d['hello'])
...
world

Manipulating lists

List comprehensions provide a powerful, concise way to work with lists (for more information, see the entry in The Python Tutorial). Also, the map() and filter() functions can perform operations on lists using a different, more concise syntax:

Standard loop:

# Filter elements greater than 4
a = [3, 4, 5]
b = []
for i in a:
    if i > 4:
        b.append(i)

# Add three to all list members.
a = [3, 4, 5]
for i in range(len(a)):
    a[i] += 3

List comprehension:

# The list comprehension is clearer
a = [3, 4, 5]
b = [i for i in a if i > 4]

# Or (in Python 3, filter returns an iterator, hence the list call):
b = list(filter(lambda x: x > 4, a))

# Also clearer in this case
a = [3, 4, 5]
a = [i + 3 for i in a]

# Or (map also returns an iterator in Python 3):
a = list(map(lambda i: i + 3, a))

Use enumerate() to keep a count of your place in the list. It is more readable than manually creating a counter, and it is better optimized for iterators:

>>> a = ["icky", "icky", "icky", "p-tang"]
>>> for i, item in enumerate(a):
...     print("{i}: {item}".format(i=i, item=item))
...
0: icky
1: icky
2: icky
3: p-tang

Continuing a long line of code

When a logical line of code is longer than the accepted limit,4 you need to split it over multiple physical lines. The Python interpreter will join consecutive lines if the last character of the line is a backslash. This is helpful in some cases, but should usually be avoided because of its fragility: a whitespace character added to the end of the line, after the backslash, will break the code and may have unexpected results.

A better solution is to use parentheses around your elements. If the Python interpreter reaches the end of a line with an unclosed parenthesis, it will join the following lines until the parentheses are closed. The same behavior holds for curly and square brackets:

Bad:

french_insult = \
"Your mother was a hamster, and \
your father smelt of elderberries!"

from some.deep.module.in.a.module \
    import a_nice_function, \
        another_nice_function, \
        yet_another_nice_function

Good:

french_insult = (
    "Your mother was a hamster, and "
    "your father smelt of elderberries!"
)

from some.deep.module.in.a.module import (
    a_nice_function,
    another_nice_function,
    yet_another_nice_function
)

However, more often than not, having to split a long logical line is a sign that you are trying to do too many things at the same time, which may hinder readability.

Idioms

Although there usually is one—and preferably only one—obvious way to do it, the way to write idiomatic (or Pythonic) code can be non-obvious to Python beginners at first (unless they’re Dutch5). So, good idioms must be consciously acquired.

Unpacking

If you know the length of a list or tuple, you can assign names to its elements with unpacking. For example, because it’s possible to specify the number of times to split a string in split() and rsplit(), the righthand side of an assignment can be made to split only once (e.g., into a filename and an extension), and the lefthand side can contain both destinations simultaneously, in the correct order, like this:

>>> filename, ext = "my_photo.orig.png".rsplit(".", 1)
>>> print(filename, "is a", ext, "file.")
my_photo.orig is a png file.

You can use unpacking to swap variables as well:

a, b = b, a

Nested unpacking works, too:

a, (b, c) = 1, (2, 3)

In Python 3, a new method of extended unpacking was introduced by PEP 3132:

a, *rest = [1, 2, 3]
# a = 1, rest = [2, 3]

a, *middle, c = [1, 2, 3, 4]
# a = 1, middle = [2, 3], c = 4

Ignoring a value

If you need to assign something while unpacking, but will not need that variable, use a double underscore (__):

filename = 'foobar.txt'
basename, __, ext = filename.rpartition('.')
Note

Many Python style guides recommend a single underscore (_) for throwaway variables rather than the double underscore (__) recommended here. The issue is that a single underscore is commonly used as an alias for the gettext.gettext() function, and is also used at the interactive prompt to hold the value of the last operation. Using a double underscore instead is just as clear and almost as convenient, and eliminates the risk of accidentally overwriting the single underscore variable, in either of these other use cases.

Creating a length-N list of the same thing

Use the Python list * operator to make a list of the same immutable item:

>>> four_nones = [None] * 4
>>> print(four_nones)
[None, None, None, None]

But be careful with mutable objects: because lists are mutable, the * operator will create a list of N references to the same list, which is not likely what you want. Instead, use a list comprehension:

Bad Good
>>> four_lists = [[]] * 4
>>> four_lists[0].append("Ni")
>>> print(four_lists)
[['Ni'], ['Ni'], ['Ni'], ['Ni']]
>>> four_lists = [[] for __ in range(4)]
>>> four_lists[0].append("Ni")
>>> print(four_lists)
[['Ni'], [], [], []]

A common idiom for creating strings is to use str.join() on an empty string. This idiom can be applied to lists and tuples:

>>> letters = ['s', 'p', 'a', 'm']
>>> word = ''.join(letters)
>>> print(word)
spam

Sometimes we need to search through a collection of things. Let’s look at two options: lists and sets.

Take the following code for example:

>>> x = list(('foo', 'foo', 'bar', 'baz'))
>>> y = set(('foo', 'foo', 'bar', 'baz'))
>>>
>>> print(x)
['foo', 'foo', 'bar', 'baz']
>>> print(y)
{'foo', 'bar', 'baz'}
>>>
>>> 'foo' in x
True
>>> 'foo' in y
True

Even though both boolean tests for list and set membership look identical, the lookup performance differs: foo in y takes advantage of the fact that sets (and dictionaries) in Python are hash tables.6 Python has to step through each item in the list to find a matching case, which is time-consuming (the time difference becomes significant for larger collections), but finding keys in the set can be done quickly using the hash lookup. Also, sets and dictionaries drop duplicate entries, which is why dictionaries cannot have two identical keys. For more information, see this Stack Overflow discussion on list versus dict.
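If you want to see the difference on your own machine, here is a quick sketch with the standard library’s timeit module (assuming Python 3.5 or greater, which added the globals argument):

import timeit

haystack_list = list(range(10000))
haystack_set = set(haystack_list)

# Searching for the last element: the list is scanned item by item,
# while the set does a constant-time hash lookup.
list_time = timeit.timeit('9999 in haystack',
                          globals={'haystack': haystack_list}, number=1000)
set_time = timeit.timeit('9999 in haystack',
                         globals={'haystack': haystack_set}, number=1000)
print(list_time, set_time)  # the list timing should be far larger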

Exception-safe contexts

It is common to use try/finally clauses to manage resources like files or thread locks when exceptions may occur. PEP 343 introduced the with statement and a context manager protocol into Python (in version 2.5 and beyond)—an idiom to replace these try/finally clauses with more readable code. The protocol consists of two methods, __enter__() and __exit__(), that when implemented for an object allow it to be used via the new with statement, like this:

>>> import threading
>>> some_lock = threading.Lock()
>>>
>>> with some_lock:
...     # Make Earth Mark One, run it for 10 million years ...
...     print(
...         "Look at me: I design coastlines.\n"
...         "I got an award for Norway."
...     )
...

which would previously have been:

>>> import threading
>>> some_lock = threading.Lock()
>>>
>>> some_lock.acquire()
>>> try:
...     # Make Earth Mark One, run it for 10 million years ...
...     print(
...         "Look at me: I design coastlines.\n"
...         "I got an award for Norway."
...     )
... finally:
...     some_lock.release()

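The protocol can also be implemented directly on your own objects. Here is a minimal sketch (the Tone class is hypothetical):

>>> class Tone:
...     def __enter__(self):
...         print("entering")
...         return self
...     def __exit__(self, exc_type, exc_value, traceback):
...         print("exiting")
...         return False  # a false value means exceptions are not suppressed
...
>>> with Tone():
...     print("inside the with block")
...
entering
inside the with block
exiting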
The standard library module contextlib provides additional tools that help turn functions into context managers, enforce the call of an object’s close() method, suppress exceptions (Python 3.4 and greater), and redirect standard output and error streams (Python 3.4 or 3.5 and greater). Here is an example use of contextlib.closing():

>>> from contextlib import closing
>>> with closing(open("outfile.txt", "w")) as output:
...     output.write("Well, he's...he's, ah...probably pining for the fjords.")
...
56

but because __enter__() and __exit__() methods are defined for the object that handles file I/O,7 we can use the with statement directly, without the closing:

>>> with open("outfile.txt", "w") as output:
...     output.write(
...         "PININ' for the FJORDS?!?!?!? "
...         "What kind of talk is that?, look, why did he fall "
...         "flat on his back the moment I got 'im home?\n"
...     )
...
123

Common Gotchas

For the most part, Python aims to be a clean and consistent language that avoids surprises. However, there are a few cases that can be confusing to newcomers.

Some of these cases are intentional but can be potentially surprising. Some could arguably be considered language warts. In general, though, what follows is a collection of potentially tricky behaviors that might seem strange at first glance, but are generally sensible once you’re aware of the underlying cause for the surprise.

Mutable default arguments

Seemingly the most common surprise new Python programmers encounter is Python’s treatment of mutable default arguments in function definitions.

What you wrote:
def append_to(element, to=[]):
    to.append(element)
    return to
What you might have expected to happen:
my_list = append_to(12)
print(my_list)

my_other_list = append_to(42)
print(my_other_list)

A new list is created each time the function is called if a second argument isn’t provided, so that the output is:

[12]
[42]
What actually happens:
[12]
[12, 42]

A new list is created once when the function is defined, and the same list is used in each successive call: Python’s default arguments are evaluated once when the function is defined, not each time the function is called (as they are in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well.

What you should do instead:

Create a new object each time the function is called, by using a default arg to signal that no argument was provided (None is often a good choice):

def append_to(element, to=None):
    if to is None:
        to = []
    to.append(element)
    return to
When this gotcha isn’t a gotcha:

Sometimes you can specifically “exploit” (i.e., use as intended) this behavior to maintain state between calls of a function. This is often done when writing a caching function (which stores results in-memory), for example:

def time_consuming_function(x, y, cache={}):
    args = (x, y)
    if args in cache:
        return cache[args]
    # Otherwise this is the first time with these arguments.
    # Do the time-consuming operation (expensive_computation is
    # a hypothetical placeholder for the real work)...
    result = expensive_computation(x, y)
    cache[args] = result
    return result
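Note that the standard library already packages this idea: functools.lru_cache (Python 3.2 and greater) memoizes a function without any explicit cache argument. A minimal sketch, with the expensive work left as a placeholder:

from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument combination
def time_consuming_function(x, y):
    # Do the time-consuming operation...
    return x ** y  # stand-in for the real computation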

Late binding closures

Another common source of confusion is the way Python binds its variables in closures (or in the surrounding global scope).

What you wrote:
def create_multipliers():
    return [lambda x: i * x for i in range(5)]
What you might have expected to happen:
for multiplier in create_multipliers():
    print(multiplier(2), end=" ... ")
print()

A list containing five functions that each have their own closed-over i variable that multiplies their argument, producing:

0 ... 2 ... 4 ... 6 ... 8 ...
What actually happens:
8 ... 8 ... 8 ... 8 ... 8 ...

Five functions are created, but all of them just multiply x by 4. Why? Python’s closures are late binding. This means that the values of variables used in closures are looked up at the time the inner function is called.

Here, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at call time. By then, the loop has completed, and i is left with its final value of 4.

What’s particularly nasty about this gotcha is the seemingly prevalent misinformation that this has something to do with lambda expressions in Python. Functions created with a lambda expression are in no way special, and in fact the same exact behavior is exhibited by just using an ordinary def:

def create_multipliers():
    multipliers = []

    for i in range(5):
        def multiplier(x):
            return i * x
        multipliers.append(multiplier)

    return multipliers
What you should do instead:

The most general solution is arguably a bit of a hack. Due to Python’s aforementioned behavior concerning evaluating default arguments to functions (see “Mutable default arguments”), you can create a closure that binds immediately to its arguments by using a default argument:

def create_multipliers():
    return [lambda x, i=i: i * x for i in range(5)]

Alternatively, you can use the functools.partial() function:

from functools import partial
from operator import mul

def create_multipliers():
    return [partial(mul, i) for i in range(5)]
When this gotcha isn’t a gotcha:

Sometimes you want your closures to behave this way. Late binding is good in lots of situations (e.g., in the Diamond project, “Example use of a closure (when the gotcha isn’t a gotcha)”). Looping to create unique functions is unfortunately a case where it can cause hiccups.

Structuring Your Project

By structure we mean the decisions you make concerning how your project best meets its objective. The goal is to best leverage Python’s features to create clean, effective code. In practical terms, that means the logic and dependencies in both your code and in your file and folder structure are clear.

Which functions should go into which modules? How does data flow through the project? What features and functions can be grouped together and isolated? By answering questions like these, you can begin to plan, in a broad sense, what your finished product will look like.

The Python Cookbook has a chapter on modules and packages that describes in detail how import statements and packaging work. The purpose of this section is to outline aspects of Python’s module and import systems that are central to enforcing structure in your project. We then discuss various perspectives on how to build code that can be extended and tested reliably.

Thanks to the way imports and modules are handled in Python, it is relatively easy to structure a Python project: there are few constraints and the model for importing modules is easy to grasp. Therefore, you are left with the pure architectural task of crafting the different parts of your project and their interactions.

Modules

Modules are one of Python’s main abstraction layers, and probably the most natural one. Abstraction layers allow a programmer to separate code into parts that hold related data and functionality.

For example, if one layer of a project handles interfacing with user actions, while another handles low-level manipulation of data, the most natural way to separate these two layers is to regroup all interfacing functionality in one file, and all low-level operations in another file. This grouping places them into two separate modules. The interface file would then import the low-level file with the import module or from module import attribute statements.

As soon as you use import statements, you also use modules. These can be either built-in modules (such as os and sys), third-party packages you have installed in your environment (such as Requests or NumPy), or your project’s internal modules. The following code shows some example import statements and confirms that an imported module is a Python object with its own data type:

>>> import sys  # built-in module
>>> import matplotlib.pyplot as plt  # third-party module
>>>
>>> import mymodule as mod  # internal project module
>>>
>>> print(type(sys), type(plt), type(mod))
<class 'module'> <class 'module'> <class 'module'>

To keep in line with the style guide, keep module names short and lowercase. And be sure to avoid using special symbols like the dot (.) or question mark (?), which would interfere with the way Python looks for modules. So a filename like my.spam.py8 is one you should avoid; Python would expect to find a spam.py file in a folder named my, which is not the case. The Python documentation gives more details about using dot notation.

Importing modules

Aside from some naming restrictions, nothing special is required to use a Python file as a module, but it helps to understand the import mechanism. First, the import modu statement will look for the definition of modu in a file named modu.py in the same directory as the caller, if a file with that name exists. If it is not found, the Python interpreter will search for modu.py in each directory on Python’s search path, in order, and raise an ImportError exception if it is not found. The value of the search path is platform-dependent and includes any user- or system-defined directories in the environment’s $PYTHONPATH (or %PYTHONPATH% on Windows). It can be manipulated or inspected in a Python session:

>>> import sys
>>> sys.path
['', '/current/absolute/path', 'etc']
# The actual list contains every path that is searched
# when you import libraries into Python, in the order
# that they'll be searched.

Once modu.py is found, the Python interpreter will execute the module in an isolated scope. Any top-level statement in modu.py will be executed, including other imports, if any exist. Function and class definitions are stored in the module’s dictionary.

Finally, the module’s variables, functions, and classes will be available to the caller through the module’s namespace, a central concept in programming that is particularly helpful and powerful in Python. Namespaces provide a scope containing named attributes that are visible to each other but not directly accessible outside of the namespace.

In many languages, an include file directive causes the preprocessor to, effectively, copy the contents of the included file into the caller’s code. It’s different in Python: the included code is isolated in a module namespace. The result of the import modu statement will be a module object named modu in the global namespace, with the attributes defined in the module accessible via dot notation: modu.sqrt would be the sqrt object defined inside of modu.py, for example. This means you generally don’t have to worry that the included code could have unwanted effects—for example, overriding an existing function with the same name.

It is possible to simulate the more standard behavior by using a special syntax of the import statement: from modu import *. However, this is generally considered bad practice: using import * makes code harder to read, makes dependencies less compartmentalized, and can clobber (overwrite) existing defined objects with the new definitions inside the imported module.

Using from modu import func is a way to import only the attribute you want into the global namespace. It is much less harmful than from modu import * because it shows explicitly what is imported into the global namespace. Its only advantage over a simpler import modu is that it will save you a little typing.

Table 4-1 compares the different ways to import definitions from other modules.

Table 4-1. Different ways to import definitions from modules

Very bad (confusing for a reader):

from modu import *

x = sqrt(4)

Is sqrt part of modu? Or a built-in? Or defined above?

Better (obvious which new names are in the global namespace):

from modu import sqrt

x = sqrt(4)

Has sqrt been modified or redefined in between, or is it the one in modu?

Best (immediately obvious where the attribute comes from):

import modu

x = modu.sqrt(4)

Now sqrt is visibly part of modu’s namespace.

As mentioned in “Code Style”, readability is one of the main features of Python. Readable code avoids useless boilerplate text and clutter. But terseness and obscurity are the limits where brevity should stop. Explicitly stating where a class or function comes from, as in the modu.func() idiom, greatly improves code readability and understandability in all but the simplest single-file projects.

Packages

Python provides a very straightforward packaging system, which extends the module mechanism to a directory.

Any directory with an __init__.py file is considered a Python package. The top-level directory with an __init__.py is the root package.9 The different modules in the package are imported in a similar manner as plain modules, but with a special behavior for the __init__.py file, which is used to gather all package-wide definitions.

A file modu.py in the directory pack/ is imported with the statement import pack.modu. The interpreter will look for an __init__.py file in pack and execute all of its top-level statements. Then it will look for a file named pack/modu.py and execute all of its top-level statements. After these operations, any variable, function, or class defined in modu.py is available in the pack.modu namespace.

A commonly seen issue is too much code in __init__.py files. When the project’s complexity grows, there may be subpackages and sub-subpackages in a deep directory structure. In this case, importing a single item from a sub-sub-package will require executing all __init__.py files met while traversing the tree.

It is normal, even good practice, to leave an __init__.py empty when the package’s modules and subpackages do not need to share any code—the HowDoI and Diamond projects that are used as examples in the next section both have no code except version numbers in their __init__.py files. The Tablib, Requests, and Flask projects contain a top-level documentation string and import statements that expose the intended API for each project, and the Werkzeug project also exposes its top-level API but does it using lazy loading (extra code that only adds content to the namespace as it is used, which speeds up the initial import statement).

Lastly, a convenient syntax is available for importing deeply nested packages: import very.deep.module as mod. This allows you to use mod in place of the verbose repetition of very.deep.module.

Object-Oriented Programming

Python is sometimes described as an object-oriented programming language. This can be somewhat misleading and needs to be clarified.

In Python, everything is an object, and can be handled as such. This is what is meant when we say that functions are first-class objects. Functions, classes, strings, and even types are objects in Python: they all have a type, can be passed as function arguments, and may have methods and properties. In this understanding, Python is an object-oriented language.

However, unlike Java, Python does not impose object-oriented programming as the main programming paradigm. It is perfectly viable for a Python project to not be object oriented—that is, to use no (or very few) class definitions, class inheritance, or any other mechanisms that are specific to object-oriented programming. These features are available, but not obligatory, for us Pythonistas. Moreover, as seen in “Modules”, the way Python handles modules and namespaces gives the developer a natural way to ensure the encapsulation and separation of abstraction layers—the most common reasons to use object orientation—without classes.

Proponents of functional programming (a paradigm that, in its purest form, has no assignment operator, no side effects, and basically chains functions to accomplish tasks), say that bugs and confusion occur when a function does different things depending on the external state of the system—for example, a global variable that indicates whether or not a person is logged in. Python, although not a purely functional language, has tools that make functional programming possible, and then we can restrict our use of custom classes to situations where we want to glue together a state and a functionality.

In some architectures, typically web applications, multiple instances of Python processes are spawned to respond to external requests that can happen at the same time. In this case, holding some state in instantiated objects, which means keeping some static information about the world, is prone to race conditions. A race condition describes the situation where, at some point between the initialization of the state of an object (usually done with the Class.__init__() method in Python) and the actual use of the object state through one of its methods, the state of the world has changed.

For example, a request may load an item in memory and later mark it as added to a user’s shopping cart. If another request sells the item to another person at the same time, it may happen that the sale actually occurs after the first session loaded the item, and then we are trying to sell inventory already flagged as sold. This and other issues led to a preference for stateless functions.

Our recommendation is as follows: when working with code that relies on some persistent context or global state (like most web applications), use functions and procedures with as few implicit contexts and side effects as possible. A function’s implicit context is made up of any of the global variables or items in the persistence layer that are accessed from within the function. Side effects are the changes that a function makes to its implicit context. If a function saves or deletes data in a global variable or in the persistence layer, it is said to have a side effect.

Custom classes in Python should be used to carefully isolate functions with context and side effects from functions with logic (called pure functions). Pure functions are deterministic: given a fixed input, the output will always be the same. This is because they do not depend on context, and do not have side effects. The print() function, for example, is impure because it returns nothing but writes to standard output as a side effect. Here are some benefits of having pure, separate functions (a short sketch contrasting pure and impure functions follows this list):

  • Pure functions are much easier to change or replace if they need to be refactored or optimized.

  • Pure functions are easier to test with unit tests: there is less need for complex context setup and data cleaning afterward.

  • Pure functions are easier to manipulate, decorate (more on decorators in a moment), and pass around.
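To make the contrast concrete, here is a minimal sketch of an impure function next to its pure counterpart (both functions are hypothetical):

# Impure: reads and mutates state outside its own scope.
tally = 0

def add_to_tally(value):
    global tally
    tally += value  # side effect: the module-level tally changes

# Pure: the result depends only on the arguments.
def add(tally, value):
    return tally + value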

In summary, for some architectures, pure functions are more efficient building blocks than classes and objects because they have no context or side effects. As an example, the I/O functions related to each of the file formats in the Tablib library (tablib/formats/*.py—we’ll look at Tablib in the next chapter) are pure functions, and not part of a class, because all they do is read data into a separate Dataset object that persists the data, or write the Dataset to a file. But the Session object in the Requests library (also coming up in the next chapter) is a class, because it has to persist the cookie and authentication information that may be exchanged in an HTTP session.

Note

Object orientation is useful and even necessary in many cases—for example, when developing graphical desktop applications or games, where the things that are manipulated (windows, buttons, avatars, vehicles) have a relatively long life of their own in the computer’s memory. This is also one motive behind object-relational mapping, which maps rows in databases to objects in code, discussed further in “Database Libraries”.

Decorators

Decorators were added to Python in version 2.4 and are defined and discussed in PEP 318. A decorator is a function or a class method that wraps (or decorates) another function or method. The decorated function or method will replace the original function or method. Because functions are first-class objects in Python, this can be done manually, but using the @decorator syntax is clearer and preferred. Here is an example of how to use a decorator:

>>> def foo():
...     print("I am inside foo.")
...
>>> import logging
>>> logging.basicConfig()
>>>
>>> def logged(func, *args, **kwargs):
...     logger = logging.getLogger()
...     def new_func(*args, **kwargs):
...         logger.debug("calling {} with args {} and kwargs {}".format(
...                      func.__name__, args, kwargs))
...         return func(*args, **kwargs)
...     return new_func
...
>>> @logged
... def bar():
...     print("I am inside bar.")
...
>>> logging.getLogger().setLevel(logging.DEBUG)
>>> bar()
DEBUG:root:calling bar with args () and kwargs {}
I am inside bar.
>>> foo()
I am inside foo.

This mechanism is useful for isolating the core logic of the function or method. A good example of a task that is better handled with decoration is memoization or caching: you want to store the results of an expensive function in a table and use them directly instead of recomputing them when they have already been computed. This is clearly not part of the function logic. As of PEP 3129, starting in Python 3, decorators can also be applied to classes.
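For instance, a caching decorator along these lines keeps the bookkeeping out of the function itself (the memoize helper is a hypothetical sketch; see also functools.lru_cache in the standard library):

import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)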

Dynamic Typing

Python is dynamically typed (as opposed to statically typed), meaning variables do not have a fixed type. Variables are implemented as pointers to an object, making it possible for the variable a to be set to the value 42, then to the value “thanks for all the fish”, then to a function.
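For example, the same name can be rebound to objects of unrelated types within one session:

>>> a = 42
>>> a = "thanks for all the fish"
>>> a = print  # now 'a' names a built-in function
>>> a("so long")
so long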

The dynamic typing used in Python is often considered to be a weakness, because it can lead to complexities and hard-to-debug code: if something named a can be set to many different things, the developer or the maintainer must track this name in the code to make sure it has not been set to a completely unrelated object. Table 4-2 illustrates good and bad practice when using names.

Table 4-2. Avoid using the same variable name for different things

Advice: Use short functions or methods to reduce the risk of using the same name for two unrelated things.

Bad:

a = 1
a = 'answer is {}'.format(a)

Good:

def get_answer(a):
    return 'answer is {}'.format(a)

a = get_answer(1)

Advice: Use different names for related items when they have a different type.

Bad:

# A string ...
items = 'a b c d'
# No, a list ...
items = items.split(' ')
# No, a set ...
items = set(items)

Good:

items_string = 'a b c d'
items_list = items_string.split(' ')
items = set(items_list)

There is no efficiency gain when reusing names: the assignment will still create a new object. And when the complexity grows and each assignment is separated by other lines of code, including branches and loops, it becomes harder to determine a given variable’s type.

Some coding practices, like functional programming, recommend against reassigning variables. In Java, a variable can be forced to always contain the same value after assignment by using the final keyword. Python does not have a final keyword, and it would be against its philosophy. But assigning a variable only once may be a good discipline; it helps reinforce the concept of mutable versus immutable types.

Tip

Pylint will warn you if you reassign a variable to two different types.

Mutable and Immutable Types

Python has two kinds of built-in or user-defined10 types:

# Lists are mutable
my_list = [1, 2, 3]
my_list[0] = 4
print(my_list)  # [4, 2, 3] <- The same list, changed.

# Integers are immutable
x = 6
x = x + 1  # The new x occupies a different location in memory.
Mutable types

These allow in-place modification of the object’s content. Examples are lists and dictionaries, which have mutating methods like list.append() or dict.pop() and can be modified in place.

Immutable types

These types provide no method for changing their content. For instance, the variable x set to the integer 6 has no “increment” method. To compute x + 1, you have to create another integer and give it a name.

One consequence of this difference in behavior is that mutable types cannot be used as dictionary keys, because if the value ever changes, it will not hash to the same value, and dictionaries use hashing11 for key storage. The immutable equivalent of a list is the tuple, created with parentheses—for example, (1, 2). It cannot be changed in place and so can be used as a dictionary key.
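For example, a tuple works as a dictionary key where a list fails:

>>> d = {(1, 2): "point"}
>>> d[[1, 2]] = "point"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'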

Using properly mutable types for objects that are intended to be mutable (e.g., my_list = [1, 2, 3]) and immutable types for objects that are intended to have a fixed value (e.g., islington_phone = ("220", "7946", "0347")) clarifies the intent of the code for other developers.

One peculiarity of Python that can surprise newcomers is that strings are immutable; attempting to change one will yield a type error:

>>> s = "I'm not mutable"
>>> s[1:7] = " am"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

This means that when constructing a string from its parts, it is much more efficient to accumulate the parts in a list, because it is mutable, and then join the parts together to make the full string. Also, a Python list comprehension, which is a shorthand syntax to iterate over an input to create a list, is better and faster than constructing a list from calls to append() within a loop. Table 4-3 shows different ways to create a string from an iterable.

Table 4-3. Example ways to concatenate a string

Bad:

>>> s = ""
>>> for c in (97, 98, 99):
...     s += chr(c)
...
>>> print(s)
abc

Good:

>>> s = []
>>> for c in (97, 98, 99):
...     s.append(chr(c))
...
>>> print("".join(s))
abc

Best:

>>> r = (97, 98, 99)
>>> s = [chr(c) for c in r]
>>> print("".join(s))
abc

The main Python page has a good discussion on this kind of optimization.

Finally, if the number of elements in a concatenation is known, pure string addition is faster (and more straightforward) than creating a list of items just to do a "".join(). All of the following formatting options to define cheese do the same thing:12

>>> adj = "Red"
>>> noun = "Leicester"
>>>
>>> cheese = "%s %s" % (adj, noun)  # This style was deprecated (PEP 3101)
>>> cheese = "{} {}".format(adj, noun)  # Possible since Python 3.1
>>> cheese = "{0} {1}".format(adj, noun)  # Numbers can also be reused
>>> cheese = "{adj} {noun}".format(adj=adj, noun=noun)  # This style is best
>>> print(cheese)
Red Leicester

Vendorizing Dependencies

A package that vendorizes dependencies includes external dependencies (third-party libraries) within its source, often inside of a folder named vendor, or packages. There is a very good blog post on the subject that lists the main reasons a package owner might do this (basically, to avoid various dependency issues), and discusses alternatives.

Consensus is that in almost all cases, it is better to keep the dependency separate, as it adds unnecessary content (often megabytes of extra code) to the repository; virtual environments used in combination with setup.py (preferred, especially when your package is a library) or a requirements.txt (which, when used, will override dependencies in setup.py in the case of conflicts) can restrict dependencies to a known set of working versions.

If those options are not enough, it might be helpful to contact the owner of the dependency to maybe resolve the issue by updating their package (e.g., your library may depend on an upcoming release of their package, or may need a specific new feature added), as those changes would likely benefit the entire community. The caveat is, if you submit pull requests for big changes, you may be expected to maintain those changes when further suggestions and requests come in; for this reason, both Tablib and Requests vendorize at least some dependencies. As the community moves into complete adoption of Python 3, hopefully fewer of the most pressing issues will remain.

Testing Your Code

Testing your code is very important. People are much more likely to use a project that actually works.

Python first included doctest and unittest in Python 2.1, released in 2001, embracing test-driven development (TDD), where the developer first writes tests that define the main operation and edge cases for a function, and then writes the function to pass those tests. Since then, TDD has become accepted and widely adopted in business and in open source projects—it’s a good idea to practice writing the testing code and the running code in parallel. Used wisely, this method helps you precisely define your code’s intent and have a more modular architecture.

Tips for testing

A test is about the most massively useful code a hitchhiker can write. We’ve summarized some of our tips here.

Just one thing per test

A testing unit should focus on one tiny bit of functionality and prove it correct.

Independence is imperative

Each test unit must be fully independent: able to run alone, and also within the test suite, regardless of the order they are called. The implication of this rule is that each test must be loaded with a fresh dataset and may have to do some cleanup afterward. This is usually handled by setUp() and tearDown() methods.
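A minimal sketch with unittest (the fixture and test here are hypothetical):

import unittest

class TestDataset(unittest.TestCase):
    def setUp(self):
        self.data = [1, 2, 3]  # a fresh fixture before every test

    def tearDown(self):
        del self.data  # cleanup after every test

    def test_length(self):
        self.assertEqual(len(self.data), 3)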

Precision is better than parsimony

Use long and descriptive names for testing functions. This guideline is slightly different than for running code, where short names are often preferred. The reason is testing functions are never called explicitly. square() or even sqr() is OK in running code, but in testing code, you should have names such as test_square_of_number_2() or test_square_negative_number(). These function names are displayed when a test fails and should be as descriptive as possible.

Speed counts

Try hard to make tests that are fast. If one test needs more than a few milliseconds to run, development will be slowed down, or the tests will not be run as often as is desirable. In some cases, tests can’t be fast because they need a complex data structure to work on, and this data structure must be loaded every time the test runs. Keep these heavier tests in a separate test suite that is run by some scheduled task, and run all other tests as often as needed.

RTMF (Read the manual, friend!)

Learn your tools and learn how to run a single test or a test case. Then, when developing a function inside a module, run this function’s tests often, ideally automatically when you save the code.

Test everything when you start—and again when you finish

Always run the full test suite before a coding session, and run it again after. This will give you more confidence that you did not break anything in the rest of the code.

Version control automation hooks are fantastic

It is a good idea to implement a hook that runs all tests before pushing code to a shared repository. You can add hooks directly to your version control system, and some IDEs provide ways to do this more simply in their own environments; your system’s documentation (for Git, the githooks man page) will step you through the details.
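As an illustrative sketch, a Git pre-push hook can be as small as this (the test command is an assumption; substitute whatever runs your suite):

#!/usr/bin/env python
# Save as .git/hooks/pre-push and mark it executable (chmod +x).
# Git aborts the push when this script exits with a nonzero status.
import subprocess
import sys

sys.exit(subprocess.call(['python', '-m', 'unittest', 'discover']))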

Write a breaking test if you want to take a break

If you are in the middle of a development session and have to interrupt your work, it is a good idea to write a broken unit test about what you want to develop next. When coming back to work, you will have a pointer to where you were and get back on track faster.

In the face of ambiguity, debug using a test

The first step when you are debugging your code is to write a new test pinpointing the bug. While it is not always possible, these bug-catching tests are among the most valuable pieces of code in your project.

If the test is hard to explain, good luck finding collaborators

When something goes wrong or has to be changed, if your code has a good set of tests, you or other maintainers will rely largely on the testing suite to fix the problem or modify a given behavior. Therefore, the testing code will be read as much as—or even more than—the running code. A unit test whose purpose is unclear is not very helpful in this case.

If the test is easy to explain, it is almost always a good idea

Another use of the testing code is as an introduction to new developers. When other people will have to work on the code base, running and reading the related testing code is often the best thing they can do. They will (or should) discover the hot spots, where most difficulties arise, and the corner cases. If they have to add some functionality, the first step should be to add a test and, by this means, ensure the new functionality is not already a working path that has not been plugged into the interface.

Above all, don’t panic

It’s open source! The whole world’s got your back.

Testing Basics

This section lists the basics of testing—for an idea about what options are available—and gives a few examples taken from the Python projects we dive into next, in Chapter 5. There is an entire book on TDD in Python, and we don’t want to rewrite it. Check out Test-Driven Development with Python (O’Reilly) (obey the testing goat!).

unittest

unittest is the batteries-included test module in the Python standard library. Its API will be familiar to anyone who has used any of the JUnit (Java)/nUnit (.NET)/CppUnit (C/C++) series of tools.

Creating test cases is accomplished by subclassing unittest.TestCase. In this example code, the test function is just defined as a new method in MyTest:

# test_example.py
import unittest

def fun(x):
    return x + 1

class MyTest(unittest.TestCase):
    def test_that_fun_adds_one(self):
        self.assertEqual(fun(3), 4)

class MySecondTest(unittest.TestCase):
    def test_that_fun_fails_when_not_adding_number(self):
        self.assertRaises(TypeError, fun, "multiply six by nine")
Note

Test methods must start with the string test or they will not run. Test modules (files) are expected to match the pattern test*.py by default, but can match any pattern given to the -p/--pattern option of unittest discover on the command line.

To run all of the tests in a single test class, open a terminal shell and, in the same directory as the file, invoke Python’s unittest module on the command line with the module and class name, like this:

$ python -m unittest test_example.MyTest
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

Or to run all tests in a file, name the file:

$ python -m unittest test_example
.
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

Mock (in unittest)

As of Python 3.3, unittest.mock is available in the standard library. It allows you to replace parts of your system under test with mock objects and make assertions about how they have been used.

For example, you can monkey patch a method as in the following example (a monkey patch is code that modifies or replaces other existing code at runtime). In this code, the existing method named ProductionClass.method, on the instance we create named instance, is replaced with a new object, MagicMock, which will always return the value 3 when called; the mock also counts the number of calls it receives, records the signature it was called with, and provides assertion methods for testing purposes:

from unittest.mock import MagicMock

class ProductionClass:  # minimal stand-in so that the example runs
    def method(self):
        return 'real result'

instance = ProductionClass()
instance.method = MagicMock(return_value=3)
instance.method(3, 4, 5, key='value')

instance.method.assert_called_with(3, 4, 5, key='value')

To mock classes or objects in a module under test, use the patch decorator. In the following example, an external search system is replaced with a mock that always returns the same result (as used in this example, the patch is only for the duration of the test):

import unittest.mock as mock

def mock_search(self):
    class MockSearchQuerySet(SearchQuerySet):
        def __iter__(self):
            return iter(["foo", "bar", "baz"])
    return MockSearchQuerySet()

# SearchForm here refers to the imported class reference
# myapp.SearchForm, and modifies this instance, not the
# code where the SearchForm class itself is initially
# defined.
@mock.patch('myapp.SearchForm.search', mock_search)
def test_new_watchlist_activities(self):
    # get_search_results runs a search and iterates over the result
    self.assertEqual(len(myapp.get_search_results(q="fish")), 3)

Mock has many other ways you can configure it and control its behavior. These are detailed in the Python documentation for unittest.mock.
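For instance, side_effect can make a mock raise an exception (or run a function) when called, and spec limits the mock to the attributes of a real object. A small sketch:

from unittest.mock import MagicMock

# side_effect: raise an exception whenever the mock is called.
failing = MagicMock(side_effect=KeyError('missing'))

# spec: the mock only exposes attributes the real object has.
fake_dict = MagicMock(spec=dict)
fake_dict.keys()          # fine: dict has a keys() method
# fake_dict.frobnicate()  # would raise AttributeError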

doctest

The doctest module searches for pieces of text that look like interactive Python sessions in docstrings, and then executes those sessions to verify that they work exactly as shown.

Doctests serve a different purpose than proper unit tests. They are usually less detailed and don’t catch special cases or obscure regression bugs. Instead, they are useful as expressive documentation of the main use cases of a module and its components (an example of a happy path). However, doctests should run automatically each time the full test suite runs.

Here’s a simple doctest in a function:

def square(x):
    """Squares x.

    >>> square(2)
    4
    >>> square(-2)
    4
    """

    return x * x

if __name__ == '__main__':
    import doctest
    doctest.testmod()

When you run this module from the command line (i.e., python module.py), the doctests will run and complain if anything is not behaving as described in the docstrings.

Examples

In this section, we’ll take excerpts from our favorite packages to highlight good testing practice using real code. The test suites require additional libraries not included in the packages (e.g., Requests uses Flask to mock up an HTTP server), which are listed in each project’s requirements.txt file.

For all of these examples, the expected first steps are to open a terminal shell, change directories to a place where you work on open source projects, clone the source repository, and set up a virtual environment, like this:

$ git clone https://github.com/username/projectname.git
$ cd projectname
$ virtualenv -p python3 venv
$ source venv/bin/activate
(venv)$ pip install -r requirements.txt

Example: Testing in Tablib

Tablib uses the unittest module in Python’s standard library for its testing. The test suite does not come with the package; you must clone the GitHub repository for the files. Here is an excerpt, with important parts annotated:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Tests for Tablib."""

import json
import unittest
import sys
import os
import tablib
from tablib.compat import markup, unicode, is_py3
from tablib.core import Row


class TablibTestCase(unittest.TestCase):  1
    """Tablib test cases."""

    def setUp(self):  2
        """Create simple data set with headers."""

        global data, book

        data = tablib.Dataset()
        book = tablib.Databook()

        #
        #  ... skip additional setup not used here ...
        #


    def tearDown(self):  3
        """Teardown."""
        pass


    def test_empty_append(self):  4
        """Verify append() correctly adds tuple with no headers."""
        new_row = (1, 2, 3)
        data.append(new_row)

        # Verify width/data
        self.assertTrue(data.width == len(new_row))
        self.assertTrue(data[0] == new_row)


    def test_empty_append_with_headers(self):  5
        """Verify append() correctly detects mismatch of number of
        headers and data.
        """
        data.headers = ['first', 'second']
        new_row = (1, 2, 3, 4)

        self.assertRaises(tablib.InvalidDimensions, data.append, new_row)
1. To use unittest, subclass unittest.TestCase, and write test methods whose names begin with test. The TestCase provides assert methods that check for equality, truth, data type, set membership, and whether exceptions are raised—see the documentation for more details.

2. TestCase.setUp() is run before every single test method in the TestCase.

3. TestCase.tearDown() is run after every single test method in the TestCase.13

4. All test methods must begin with test, or they will not be run.

5. There can be multiple tests within a single TestCase, but each one should test just one thing.

If you were contributing to Tablib, the first thing you’d do after cloning it is run the test suite and confirm that nothing breaks. Like this:

(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest test_tablib
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.289s

OK

As of Python 2.7, unittest also includes its own test discovery mechanisms, using the discover option on the command line:

(venv)$ ### *above* the top-level directory, tablib/
(venv)$ python -m unittest discover tablib/
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.234s

OK

After confirming all of the tests pass, you’d (a) find the test case related to the part you’re changing and run it often while you’re modifying the code, or (b) write a new test case for the feature you’re adding or the bug you’re tracking down and run that often while modifying the code. The following snippet is an example:

(venv)$ ### inside the top-level directory, tablib/
(venv)$ python -m unittest test_tablib.TablibTestCase.test_empty_append
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

Once your code works, you’d run the entire test suite again before pushing it to the repository. Because you’re running these tests so often, it makes sense that they should be as fast as possible. There are a lot more details about using unittest in the standard library unittest documentation.

Example: Testing in Requests

Requests uses py.test. To see it in action, open a terminal shell, change into a temporary directory, clone Requests, install the dependencies, and run py.test, as shown here:

$ git clone -q https://github.com/kennethreitz/requests.git
$
$ virtualenv venv -q -p python3  # dash -q for 'quiet'
$ source venv/bin/activate
(venv)$
(venv)$ pip install -q -r requests/requirements.txt   # 'quiet' again...
(venv)$ cd requests
(venv)$ py.test
========================= test session starts =================================
platform darwin -- Python 3.4.3, pytest-2.8.1, py-1.4.30, pluggy-0.3.1
rootdir: /tmp/requests, inifile:
plugins: cov-2.1.0, httpbin-0.0.7
collected 219 items

tests/test_requests.py ........................................................
X............................................
tests/test_utils.py ..s....................................................

========= 217 passed, 1 skipped, 1 xpassed in 25.75 seconds ===================

Other Popular Tools

The testing tools listed here are less frequently used, but still popular enough to mention.

pytest

pytest is a no-boilerplate alternative to Python’s standard unittest module: it doesn’t require the scaffolding of test classes, and often not even setup and teardown methods. To install it, use pip as usual:

$ pip install pytest

Despite being a fully featured and extensible test tool, it boasts a simple syntax. Creating a test suite is as easy as writing a module with a couple of functions:

# content of test_sample.py
def func(x):
    return x + 1

def test_answer():
    assert func(3) == 5

and then running the py.test command is far less work than would be required for the equivalent functionality with the unittest module:

$ py.test
=========================== test session starts ============================
platform darwin -- Python 2.7.1 -- pytest-2.2.1
collecting ... collected 1 items

test_sample.py F

================================= FAILURES =================================
_______________________________ test_answer ________________________________

    def test_answer():
>       assert func(3) == 5
E       assert 4 == 5
E        +  where 4 = func(3)

test_sample.py:5: AssertionError
========================= 1 failed in 0.02 seconds =========================

Nose

Nose extends unittest to make testing easier:

$ pip install nose

Nose provides automatic test discovery to save you the hassle of manually creating test suites. It also provides numerous plug-ins for features such as xUnit-compatible test output, coverage reporting, and test selection.

tox

tox is a tool for automating test environment management and testing against multiple interpreter configurations:

$ pip install tox

tox allows you to configure complicated multiparameter test matrices via a simple ini-style configuration file.
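A minimal, illustrative tox.ini might look like this (the interpreter list and test command are assumptions; declare whatever your project supports):

# tox.ini -- run the whole matrix with: tox
[tox]
envlist = py27, py35

[testenv]
deps = pytest
commands = py.test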

Options for older versions of Python

If you aren’t in control of your Python version but still want to use these testing tools, here are a few options.

unittest2

unittest2 is a backport of Python 2.7’s unittest module, which has an improved API and better assertions than the ones available in previous versions of Python.

If you’re using Python 2.6 or below (meaning you probably work at a large bank or Fortune 500 company), you can install it with pip:

$ pip install unittest2

You may want to import the module under the name unittest to make it easier to port code to newer versions of the module in the future:

import unittest2 as unittest

class MyTest(unittest.TestCase):
    ...

This way if you ever switch to a newer Python version and no longer need the unittest2 module, you can simply change the import in your test module without the need to change any other code.

Mock

If you liked “Mock (in unittest)” but use a Python version below 3.3, you can still use unittest.mock by importing it as a separate library:

$ pip install mock

fixture

fixture provides tools that make it easier to set up and tear down database backends for testing. It can load mock datasets for use with SQLAlchemy, SQLObject, Google Datastore, Django ORM, and Storm. There are still new releases, but it has only been tested on Python 2.4 through Python 2.6.

Lettuce and Behave

Lettuce and Behave are packages for doing behavior-driven development (BDD) in Python. BDD is a process that sprang out of TDD (obey the testing goat!) in the early 2000s, seeking to substitute the word “test” in test-driven development with “behavior” to overcome newcomers’ initial trouble grasping TDD. The name was first coined by Dan North in 2003 and introduced to the world along with the Java tool JBehave in a 2006 article for Better Software magazine that is reproduced in Dan North’s blog post, “Introducing BDD.”

BDD grew very popular after the 2011 release of The Cucumber Book (Pragmatic Bookshelf), which documents Cucumber, a BDD tool for Ruby. Cucumber inspired Gabriel Falcão’s Lettuce and Peter Parente’s Behave in our community.

Behaviors are described in plain text using a syntax named Gherkin that is both human-readable and machine-processable; tutorials for both packages are available online.
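A tiny, invented Gherkin scenario looks like this (both packages map each Given/When/Then step to a Python function you write):

Feature: Squaring numbers
  Scenario: Square a positive number
    Given the number 3
    When the number is squared
    Then the result is 9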

Documentation

Readability is a primary focus for Python developers, in both project and code documentation. The best practices described in this section can save both you and others a lot of time.

Project Documentation

There is API documentation for project users, and then there is additional project documentation for those who want to contribute to the project. This section is about the additional project documentation.

A README file at the root directory should give general information to both users and maintainers of a project. It should be raw text or written in some very easy-to-read markup, such as reStructured Text (recommended because right now it’s the only format that can be understood by PyPI14) or Markdown. It should contain a few lines explaining the purpose of the project or library (without assuming the user knows anything about the project), the URL of the main source for the software, and some basic credit information. This file is the main entry point for readers of the code.

An INSTALL file is less necessary with Python (but may be helpful to comply with license requirements, such as those of the GPL). The installation instructions are often reduced to one command, such as pip install module or python setup.py install, and added to the README file.

A LICENSE file should always be present and specify the license under which the software is made available to the public. (See “Choosing a License” for more information.)

A TODO file or a TODO section in README should list the planned development for the code.

A CHANGELOG file or section in README should compile a short overview of the changes in the code base for the latest versions.

Project Publication

Depending on the project, your documentation might include some or all of the following components:

  • An introduction should provide a very short overview of what can be done with the product, using one or two extremely simplified use cases. This is the 30-second pitch for your project.

  • A tutorial should show some primary use cases in more detail. The reader will follow a step-by-step procedure to set up a working prototype.

  • An API reference is typically generated from the code (see “Docstring Versus Block Comments”). It will list all publicly available interfaces, parameters, and return values.

  • Developer documentation is intended for potential contributors. This can include code conventions and the general design strategy of the project.

Sphinx

Sphinx is far and away the most popular15 Python documentation tool. Use it. It converts the reStructured Text markup language into a range of output formats, including HTML, LaTeX (for printable PDF versions), manual pages, and plain text.

There is also great, free hosting for your Sphinx documentation: Read the Docs. Use that, too. You can configure it with commit hooks to your source repository so that rebuilding your documentation will happen automatically.

Note

Sphinx is famous for its API generation, but it also works well for general project documentation. The online Hitchhiker’s Guide to Python is built with Sphinx and is hosted on Read the Docs.

reStructured Text

Sphinx uses reStructured Text, and nearly all Python documentation is written with it. If the content of your long_description argument to setuptools.setup() is written in reStructured Text, it will be rendered as HTML on PyPI—other formats will just be presented as text. It’s like Markdown with all of the optional extensions built in. Good resources for the syntax are the reStructuredText Primer in the Sphinx documentation and the quick reference on the Docutils site.
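Here is a small, illustrative fragment of the syntax:

Section Title
=============

A paragraph with *emphasis*, ``inline code``, and a literal block::

    print("rendered verbatim, in a monospaced font")

- A bullet item
- Another bullet item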

Or just start contributing to your favorite package’s documentation and learn by reading.

Docstring Versus Block Comments

Docstrings and block comments aren’t interchangeable. Both can be used for a function or class. Here’s an example using both:

# This function slows down program execution for some reason. 1
def square_and_rooter(x):
    """Return the square root of self times self.""" 2
    ...
1. The leading comment block is a programmer’s note.

2. The docstring describes the operation of the function or class and will be shown in an interactive Python session when the user types help(square_and_rooter).

Docstrings placed at the beginning of a module or at the top of an __init__.py file will also appear in help(). Sphinx’s autodoc feature can also automatically generate documentation using appropriately formatted docstrings. Instructions for how to do this, and how to format your docstrings for autodoc, are in the Sphinx tutorial. For further details on docstrings, see PEP 257.
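For example, docstrings written with Sphinx’s info field lists render as structured parameter and return-value documentation (the function below is invented for illustration):

def approximate_size(volume, unit='MB'):
    """Convert a byte count into a human-readable string.

    :param volume: number of bytes
    :type volume: int
    :param unit: target unit, such as 'KB' or 'MB'
    :returns: the formatted size
    :rtype: str
    """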

Logging

The logging module has been a part of Python’s Standard Library since version 2.3. It is succinctly described in PEP 282. The documentation is notoriously hard to read, except for the basic logging tutorial.

Logging serves two purposes:

Diagnostic logging

Diagnostic logging records events related to the application’s operation. If a user calls in to report an error, for example, the logs can be searched for context.

Audit logging

Audit logging records events for business analysis. A user’s transactions (such as a clickstream) can be extracted and combined with other user details (such as eventual purchases) for reports or to optimize a business goal.

Logging in a Library

Notes for configuring logging for a library are in the logging tutorial. Other good resources for example uses of logging are the libraries we mention in the next chapter. Because the user, not the library, should dictate what happens when a logging event occurs, one admonition bears repeating:

It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers.

The NullHandler does what its name says—nothing. The user will otherwise have to expressly turn off your logging if they don’t want it.

Best practice when instantiating loggers in a library is to only create them using the __name__ global variable: the logging module creates a hierarchy of loggers using dot notation, so using __name__ ensures no name collisions.
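For example (mypackage is a hypothetical project):

import logging

# Inside mypackage/utils.py, __name__ is "mypackage.utils", so this
# logger is automatically a child of the "mypackage" logger.
logger = logging.getLogger(__name__)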

Here is an example of best practice from the Requests source—place this in your project’s top-level __init__.py:

# Set default logging handler to avoid "No handler found" warnings.
import logging
try:  # Python 2.7+
    from logging import NullHandler
except ImportError:
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

logging.getLogger(__name__).addHandler(NullHandler())

Logging in an Application

The Twelve-Factor App, an authoritative reference for good practice in application development, contains a section on logging best practice. It emphatically advocates for treating log events as an event stream, and for sending that event stream to standard output to be handled by the application environment.

There are at least three ways to configure a logger:

Using an INI-formatted file

Pro: It’s possible to update configuration while running, using the function logging.config.listen() to listen for changes on a socket.

Con: You have less control (e.g., custom subclassed filters or loggers) than is possible when configuring a logger in code.

Using a dictionary or a JSON-formatted file

Pro: In addition to updating while running, it is also possible to load the configuration from a file using the json module, in the standard library since Python 2.6.

Con: You have less control than when configuring a logger in code.

Using code

Pro: You have complete control over the configuration.

Con: Any modifications require a change to source code.
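As a sketch of the update-while-running option mentioned above, logging.config.listen() returns a thread that accepts new configurations over a socket (the port number here is arbitrary):

import logging.config

t = logging.config.listen(9999)  # listen for new configurations on port 9999
t.start()
# ... send a length-prefixed configuration to port 9999 to reconfigure ...
logging.config.stopListening()
t.join()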

Example configuration via an INI file

More details about the INI file format are in the logging configuration section of the logging tutorial. A minimal configuration file would look like this:

[loggers]
keys=root

[handlers]
keys=stream_handler

[formatters]
keys=formatter

[logger_root]
level=DEBUG
handlers=stream_handler

[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=formatter
args=(sys.stderr,)

[formatter_formatter]
format=%(asctime)s %(name)-12s %(levelname)-8s %(message)s

The asctime, name, levelname, and message are all optional attributes available from the logging library. The full list of options and their definitions is available in the Python documentation. Let us say that our logging configuration file is named logging_config.ini. Then to set up the logger using this configuration in the code, we’d use logging.config.fileConfig():

import logging
from logging.config import fileConfig

fileConfig('logging_config.ini')
logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')

Example configuration via a dictionary

As of Python 2.7, you can use a dictionary with configuration details. PEP 391 contains a list of the mandatory and optional elements in the configuration dictionary. Here’s a minimal implementation:

import logging
from logging.config import dictConfig

logging_config = dict(
    version=1,
    formatters={
        'f': {'format':
              '%(asctime)s %(name)-12s %(levelname)-8s %(message)s'}
        },
    handlers={
        'h': {'class': 'logging.StreamHandler',
              'formatter': 'f',
              'level': logging.DEBUG}
        },
    # The root logger is configured via the top-level 'root' key; an
    # entry named 'root' under 'loggers' would configure a regular
    # logger that merely happens to be named "root".
    root={'handlers': ['h'],
          'level': logging.DEBUG},
)

dictConfig(logging_config)

logger = logging.getLogger()
logger.debug('often makes a very good meal of %s', 'visiting tourists')

Example configuration directly in code

And last, here is a minimal logging configuration directly in code:

import logging

logger = logging.getLogger()
handler = logging.StreamHandler()
formatter = logging.Formatter(
        '%(asctime)s %(name)-12s %(levelname)-8s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug('often makes a very good meal of %s', 'visiting tourists')

Choosing a License

In the United States, when no license is specified with your source publication, users have no legal right to download, modify, or distribute it. Furthermore, people can’t contribute to your project unless you tell them what rules to play by. You need a license.

Upstream Licenses

If you are deriving from another project, your choice may be determined by upstream licenses. For example, the Python Software Foundation (PSF) asks all contributors to Python source code to sign a contributor agreement that formally licenses their code to the PSF (retaining their own copyright) under one of two licenses.16

Because both of those licenses allow users to sublicense under different terms, the PSF is then free to distribute Python under its own license, the Python Software Foundation License. A FAQ for the PSF License goes into detail about what users can and cannot do in plain (not legal) language. The license is not intended for further use beyond licensing the PSF’s distribution of Python.

Options

There are plenty of licenses available to choose from. The PSF recommends using one of the Open Source Initiative (OSI)–approved licenses. If you wish to eventually contribute your code to the PSF, the process will be much easier if you start with one of the licenses specified on the contributions page.

Note

Remember to change the placeholder text in the template licenses to reflect your actual information. For example, the MIT license template contains Copyright (c) <year> <copyright holders> on its second line. The Apache License, Version 2.0, requires no modification.

Open source licenses tend to fall into one of two categories:17

Permissive licenses

Permissive licenses, often also called Berkeley Software Distribution (BSD)–style licenses, focus more on the user’s freedom to do with the software as they please. Some examples:

  • The Apache licenses—version 2.0 is the current one, modified so that people can include it without modification in any project, can include the license by reference instead of listing it in every file, and can use Apache 2.0–licensed code with the GNU General Public License version 3.0 (GPLv3).

  • Both the BSD 2-clause and 3-clause licenses—the three-clause license is the two-clause license plus an additional restriction on use of the issuer’s trademarks.

  • The Massachusetts Institute of Technology (MIT) licenses—both the Expat and the X11 versions are named after popular products that use the respective licenses.

  • The Internet Systems Consortium (ISC) license—it’s almost identical to the MIT license, except for a few lines now deemed to be extraneous.

Copyleft licenses

Copyleft licenses, or less permissive licenses, focus more on making sure that the source code itself—including any changes made to it—is made available. The GPL family is the most well known of these. The current version is GPLv3.

Note

The GPLv2 license is not compatible with Apache 2.0, so code licensed under GPLv2 cannot be mixed with Apache 2.0–licensed projects. But Apache 2.0–licensed projects can be used in GPLv3 projects (which must subsequently all be GPLv3).

Licenses meeting the OSI criteria all allow commercial use, modification of the software, and distribution downstream—with different restrictions and requirements. All of the ones listed in Table 4-4 also limit the issuer’s liability and require the user to retain the original copyright and license in any downstream distribution.

Table 4-4. Topics discussed in popular licenses

BSD
Restrictions: protects issuer’s trademark (BSD 3-clause)
Allowances: allows a warranty (BSD 2-clause and 3-clause)
Requirements: —

MIT (X11 or Expat), ISC
Restrictions: protects issuer’s trademark (ISC and MIT/X11)
Allowances: allows sublicensing with a different license
Requirements: —

Apache version 2.0
Restrictions: protects issuer’s trademark
Allowances: allows sublicensing, use in patents
Requirements: must state changes made to the source

GPL
Restrictions: prohibits sublicensing with a different license
Allowances: allows a warranty, and (GPLv3 only) use in patents
Requirements: must state changes to the source and include source code

Licensing Resources

Van Lindberg’s book Intellectual Property and Open Source (O’Reilly) is a great resource on the legal aspects of open source software. It will help you understand not only licenses, but also the legal aspects of other intellectual property topics like trademarks, patents, and copyrights as they relate to open source. If you’re not that concerned about legal matters and just want to choose something quickly, these sites can help:

  • GitHub offers a handy guide that summarizes and compares licenses in a few sentences.

  • TLDRLegal18 lists what can, cannot, and must be done under the terms of each license in quick bullets.

  • The OSI list of approved licenses contains the full text of all licenses that have passed their license review process for compliance with the Open Source Definition (allowing software to be freely used, modified, and shared).

1 Originally stated by Ralph Waldo Emerson in Self-Reliance, it is quoted in PEP 8 to affirm that the coder’s best judgment should supersede the style guide. For example, conformity with surrounding code and existing convention is more important than consistency with PEP 8.

2 Tim Peters is a longtime Python user who eventually became one of its most prolific and tenacious core developers (creating Python’s sorting algorithm, Timsort), and a frequent Net presence. He at one point was rumored to be a long-running Python port of the Richard Stallman AI program stallman.el. The original conspiracy theory appeared on a listserv in the late 1990s.

3 diff is a shell utility that identifies and shows lines that differ between two files.

4 A max of 80 characters according to PEP 8, 100 according to many others, and for you, whatever your boss says. Ha! But honestly, anyone who’s ever had to use a terminal to debug code while standing up next to a rack will quickly come to appreciate the 80-character limit (at which code doesn’t wrap on a terminal) and in fact prefer 75–77 characters to allow for line numbering in Vi.

5 See Zen 14. Guido, our BDFL, happens to be Dutch.

6 By the way, this is why only hashable objects can be stored in sets or used as dictionary keys. To make your own Python objects hashable, define an object.__hash__(self) member function that returns an integer. Objects that compare equal must have the same hash value. The Python documentation has more information.

7 In this case, the __exit__() method just calls the I/O wrapper’s close() method, to close the file descriptor. On many systems, there’s a maximum allowable number of open file descriptors, and it’s good practice to release them when they’re done.

8 If you’d like, you could name your module my_spam.py, but even our friend the underscore should not be seen often in module names (underscores give the impression of a variable name).

9 Thanks to PEP 420, which was implemented in Python 3.3, there is now an alternative to the root package, called the namespace package. Namespace packages must not have an __init__.py and can be dispersed across multiple directories in sys.path. Python will gather all of the pieces together and present them together to the user as a single package.

10 Instructions to define your own types in C are provided in the Python extension documentation.

11 An example of a simple hashing algorithm is to convert the bytes of an item to an integer, and take its value modulo some number. This is how memcached distributes keys across multiple computers.

12 We should admit that even though, according to PEP 3101, the percent-style formatting (%s, %d, %f) has been deprecated now for over a decade, most old hats still use it, and PEP 460 just introduced this same method to format bytes or bytearray objects.

13 Note that unittest.TestCase.tearDown will not be run if the code errors out. This may be a surprise if you’ve used features in unittest.mock to alter the code’s actual behavior. In Python 3.1, the method unittest.TestCase.addCleanup() was added; it pushes a cleanup function and its arguments onto a stack, and those functions are called one by one after unittest.TestCase.tearDown(), or called anyway even when tearDown() itself is not run (e.g., when setUp() fails). For more information, see the documentation on unittest.TestCase.addCleanup().

14 For those interested, there’s some discussion about adding Markdown support for the README files on PyPI.

15 Other tools that you might see are Pycco, Ronn, Epydoc (now discontinued), and MkDocs. Pretty much everyone uses Sphinx and we recommend you do, too.

16 As of this writing, they were the Academic Free License v. 2.1 or the Apache License, Version 2.0. The full description of how this works is on the PSF’s contributions page.

17 All of the licenses described here are OSI-approved, and you can learn more about them from the main OSI license page.

18 tl;dr means “Too long; didn’t read,” and apparently existed as editor shorthand before popularization on the Internet.
