Credit: Greg Wilson, Baltimore Technologies
Thirty years ago, in his classic The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley), Fred Brooks drew a distinction between accidental and intrinsic complexity. Languages such as English and C++, with their inconsistent rules, exceptions, and special cases, are examples of the former: they make communication and programming harder than they need to be. Concurrency, on the other hand, is a prime example of the latter. Most people have to struggle to keep one chain of events straight in their minds. Keeping track of two, three, or a dozen, plus all of their possible interactions, is just plain hard.
Computer scientists began studying ways of running multiple processes safely and efficiently in a single physical address space in the mid-1960s. Since then, a rich theory has been developed in which assertions about the behavior of interacting processes can be formalized and proved, and entire languages devoted to concurrent and parallel programming have been created. Foundations of Multithreaded, Parallel, and Distributed Programming, by Gregory R. Andrews (Addison-Wesley), is not only an excellent introduction to this theory, but also contains a great deal of historical information tracing the development of major ideas.
Over the past 20 years, opportunity and necessity have conspired to make concurrency a part of programmers’ everyday lives. The opportunity is for greater speed, which comes from the growing availability of multiprocessor machines. In the early 1980s, these were expensive curiosities; today, many programmers have dual-processor workstations on their desks and four-way or eight-way servers in the back room. If a calculation can be broken down into independent (or nearly independent) pieces, such machines can potentially solve them two, four, or eight times faster than their uniprocessor equivalents. While there are limits to the potential gains from this approach, it works well for problems as diverse as image processing, serving HTTP requests, and recompiling multiple source files.
In today’s terminology, processes run in separate logical address spaces that are protected from each other. Using concurrent processing for performance purposes, particularly in multiprocessor machines, is more attractive with threads, which execute simultaneously within the same program, in the same address space, without being protected from each other. The lack of mutual protection allows lower overhead and easier and faster communication, particularly because of the shared address space. Since all threads run code from the same program, there are no special security risks caused by a lack of mutual protection, any more than there are in a single-threaded program. Thus, concurrency used for performance purposes is most often focused on adding threads to a single program.
However, adding threads to a Python program to speed it up is often not a successful strategy. The reason for this is the Global Interpreter Lock (GIL), which protects Python’s internal data structures. This lock must be held by a thread before it can safely access Python objects. Without the lock, even simple operations (such as incrementing an integer) could fail.
Therefore, only the thread with the GIL
can manipulate Python objects or call Python/C API functions. To make
life easier for programmers, the interpreter releases and reacquires
the lock every 10 bytecode instructions (a value that can be changed
sys.setcheckinterval). The lock is also
released and reacquired around
I/O operations, such as reading or
writing a file, so that other threads can run while the thread that
requests the I/O is waiting for the I/O operation to complete.
However, effective performance-boosting exploitation of multiple
processors from multiple pure-Python threads of the same process is
just not in the cards. Unless the CPU performance bottlenecks in your
Python application are in C-coded extensions that release the GIL,
you will not observe substantial performance increases by moving your
multithreaded application to a multiprocessor machine.
The necessity for concurrent programming is largely because of the ever-growing importance of GUIs and network applications. Graphical interfaces often need to appear to be doing several things at once, such as displaying images while scrolling ads across the bottom of the screen. While it is possible to do the necessary interleaving manually, it is much simpler to code each operation on its own and let the underlying operating system decide on a concrete order of operations. Similarly, network applications often have to listen on several sockets at once or send data on one channel while receiving data on another.
Uniting these two types of applications is the fact that a GUI
can’t know when the user will press a key or move
the mouse, and an HTTP server can’t know which
datagram will arrive next. Handling each stream of events with a
separate control thread is often the simplest way to cope with this
unpredictability, even on single-processor machines, and when high
throughput is not an overriding concern. Event-driven programming can
often be used in these kinds of applications as well, and Python
frameworks such as Medusa and
asyncore are proof
that this approach often delivers excellent performance with
complexity that, while different from that inherent in
multithreading, is not necessarily larger.
The standard Python library allows programmers to approach
multithreaded programming at two
different levels. The core module,
thread, is a thin wrapper around the basic
primitives that any threading library must provide. Three of these
primitives are used to create, identify, and end threads; others are
used to create, test, acquire, and release simple mutual-exclusion
locks (or binary semaphores). In general, programmers should avoid
using these primitives directly, and should instead use the tools
included in the higher-level
threading module, which is substantially more
programmer-friendly and has similar performance characteristics.
The most important elements of the
module are classes that represent threads and various high-level
synchronization constructs. The
Thread class represents a separate control
thread; it can be told what to do by passing a callable object to its
constructor or by overriding its
run method. One
thread can start another by calling its
method or wait for it to complete by calling
Python also supports daemon threads, which do background
processing until all of the nondaemon threads in the program exit and
shut themselves down automatically.
The synchronization constructs in the
threading module include locks, reentrant
locks (which a single thread can safely relock many times without
deadlocking), counting semaphores, conditions, and events. Events can
be used by one thread to signal others that something interesting has
happened (e.g., that a new item has been added to a queue, or that it
is now safe for the next thread to modify a shared data structure).
The documentation that comes with Python describes each of these
The relatively low number of recipes in this chapter, compared to
others in this cookbook, reflects both Python’s
focus on programmer productivity (rather than absolute performance)
and the degree to which other packages (such as
httplib and wxWindows) hide the unpleasant details
of concurrency in important application areas. This also reflects
many Python programmers’ tendencies to look for the
simplest way to solve any particular problem, which complex threading
However, this chapter’s brevity may also reflect the
Python community’s underappreciation of the
potential that simple threading has, when used appropriately, to
simplify a programmer’s life. The
Queue module in particular supplies a
delightfully self-contained synchronization and cooperation structure
that can provide all the interthread supervision services you need.
Consider a typical program, which accepts requests from a GUI (or
from the network) and, as a result of such requests, will often find
itself faced with a substantial chunk of work that might take so long
to perform all at once that the program may appear unresponsive to
the GUI (or network).
In a purely event-driven architecture, it may take considerable
effort on the programmer’s part to slice up the
chunk into slices of work thin enough so each can be performed in
idle time, without ever giving the appearance of unresponsiveness.
Then, just a dash of multithreading can help considerably. The main
thread pushes the substantial chunk of background work onto a
Queue, then goes back to its task of
making the program’s interface appear responsive at
At the other end of the
Queue, a pool of daemonic
worker threads await, each ready to peel a work request off the
Queue and run it straight through. This kind of
overall architecture combines event-driven and multithreaded
approaches in the overarching ideal of simplicity, and is thus
maximally Pythonic, even though you may need just a little bit more
work if the result of a worker thread’s efforts must
be presented again to the main thread (via another
Queue, of course), which is normally the case with
GUIs. If you’re willing to cheat just a little and
use polling for the mostly event-driven main thread to access the
Queue back from the daemonic worker
threads, see Recipe 9.7 to get an idea of
how simple that little bit of work can be.