Programming Python, 4th Edition

Cover of Programming Python, 4th Edition by Mark Lutz Published by O'Reilly Media, Inc.

Step 2: Storing Records Persistently

So far, we’ve settled on a dictionary-based representation for our database of records, and we’ve reviewed some Python data structure concepts along the way. As mentioned, though, the objects we’ve seen so far are temporary—they live in memory and they go away as soon as we exit Python or the Python program that created them. To make our people persistent, they need to be stored in a file of some sort.

Using Formatted Files

One way to keep our data around between program runs is to write all the data out to a simple text file, in a formatted way. Provided the saving and loading tools agree on the format selected, we’re free to use any custom scheme we like.

Test data script

So that we don’t have to keep working interactively, let’s first write a script that initializes the data we are going to store (if you’ve done any Python work in the past, you know that the interactive prompt tends to become tedious once you leave the realm of simple one-liners). Example 1-1 creates the sort of records and database dictionary we’ve been working with so far, but because it is a module, we can import it repeatedly without having to retype the code each time. In a sense, this module is a database itself, but its program code format doesn’t support automatic or end-user updates as is.

Example 1-1. PP4E\Preview\

# initialize data to be stored in files, pickles, shelves

# records
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}
tom = {'name': 'Tom',       'age': 50, 'pay': 0,     'job': None}

# database
db = {}
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom

if __name__ == '__main__':       # when run as a script
    for key in db:
        print(key, '=>\n  ', db[key])

As usual, the __name__ test at the bottom of Example 1-1 is true only when this file is run, not when it is imported. When run as a top-level script (e.g., from a command line, via an icon click, or within the IDLE GUI), the file’s self-test code under this test dumps the database’s contents to the standard output stream (remember, that’s what print function-call statements do by default).

Here is the script in action being run from a system command line on Windows. Type the following command in a Command Prompt window after a cd to the directory where the file is stored, and use a similar console window on other types of computers:

...\PP4E\Preview> python
bob =>
   {'job': 'dev', 'pay': 30000, 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'job': 'hdw', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'job': None, 'pay': 0, 'age': 50, 'name': 'Tom'}

File name conventions

Since this is our first source file (a.k.a. “script”), here are three usage notes for this book’s examples:

  • The text ...\PP4E\Preview> in the first line of the preceding example listing stands for your operating system’s prompt, which can vary per platform; you type just the text that follows this prompt (the python command line, in this case).

  • As in all examples in this book, the system prompt also gives the directory in the downloadable book examples package where this command should be run. When running this script from a command line in a system shell, make sure the shell’s current working directory is PP4E\Preview. This can matter for examples that use files in the working directory.

  • Similarly, the label that precedes every example file’s code listing tells you where the source file resides in the examples package. Per the Example 1-1 listing label shown earlier, this script’s full filename is PP4E\Preview\ in the examples tree.

We’ll use these conventions throughout the book; see the Preface for more on getting the examples if you wish to work along. I occasionally give more of the directory path in system prompts when it’s useful to provide the extra execution context, especially in the system part of the book (e.g., a “C:\” prefix from Windows or more directory names).

Script start-up pointers

I gave pointers for using the interactive prompt earlier. Now that we’ve started running script files, here are also a few quick startup pointers for using Python scripts in general:

  • On some platforms, you may need to type the full directory path to the Python program on your machine; if Python isn’t on your system path setting on Windows, for example, replace python in the command with C:\Python31\python (this assumes you’re using Python 3.1).

  • On most Windows systems you also don’t need to type python on the command line at all; just type the file’s name to run it, since Python is registered to open “.py” script files.

  • You can also run this file inside Python’s standard IDLE GUI (open the file and use the Run menu in the text edit window), and in similar ways from any of the available third-party Python IDEs (e.g., Komodo, Eclipse, NetBeans, and the Wing IDE).

  • If you click the program’s file icon to launch it on Windows, be sure to add an input() call to the bottom of the script to keep the output window up. On other systems, icon clicks may require a #! line at the top and executable permission via a chmod command.

I’ll assume here that you’re able to run Python code one way or another. Again, if you’re stuck, see other books such as Learning Python for the full story on launching Python programs.

Data format script

Now, all we have to do is store all of this in-memory data in a file. There are a variety of ways to accomplish this; one of the most basic is to write one piece of data at a time, with separators between each that we can use when reloading to break the data apart. Example 1-2 shows one way to code this idea.

Example 1-2. PP4E\Preview\

"""
Save in-memory database object to a file with custom formatting;
assume 'endrec.', 'enddb.', and '=>' are not used in the data;
assume db is dict of dict;  warning: eval can be dangerous - it
runs strings as code;  could also eval() record dict all at once;
could also dbfile.write(key + '\n') vs print(key, file=dbfile);
"""

dbfilename = 'people-file'
ENDDB  = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def storeDbase(db, dbfilename=dbfilename):
    "formatted dump of database to flat file"
    dbfile = open(dbfilename, 'w')
    for key in db:
        print(key, file=dbfile)
        for (name, value) in db[key].items():
            print(name + RECSEP + repr(value), file=dbfile)
        print(ENDREC, file=dbfile)
    print(ENDDB, file=dbfile)
    dbfile.close()                       # flush output to disk

def loadDbase(dbfilename=dbfilename):
    "parse data to reconstruct database"
    dbfile = open(dbfilename)
    import sys
    sys.stdin = dbfile
    db = {}
    key = input()
    while key != ENDDB:
        rec = {}
        field = input()
        while field != ENDREC:
            name, value = field.split(RECSEP)
            rec[name] = eval(value)
            field = input()
        db[key] = rec
        key = input()
    return db

if __name__ == '__main__':
    from initdata import db
    storeDbase(db)                       # when run as a script, write the file

This is a somewhat complex program, partly because it has both saving and loading logic and partly because it does its job the hard way; as we’ll see in a moment, there are better ways to get objects into files than by manually formatting and parsing them. For simple tasks, though, this does work; running Example 1-2 as a script writes the database out to a flat file. It has no printed output, but we can inspect the database file interactively after this script is run, either within IDLE or from a console window where you’re running these examples (as is, the database file shows up in the current working directory):

...\PP4E\Preview> python
...\PP4E\Preview> python
>>> for line in open('people-file'):
...     print(line, end='')
name=>'Bob Smith'
name=>'Sue Jones'

This file is simply our database’s content with added formatting. Its data originates from the test data initialization module we wrote in Example 1-1 because that is the module from which Example 1-2’s self-test code imports its data. In practice, Example 1-2 itself could be imported and used to store a variety of databases and files.

Notice how data to be written is formatted with the as-code repr call and is re-created with the eval call, which treats strings as Python code. That allows us to store and re-create things like the None object, but it is potentially unsafe; you shouldn’t use eval if you can’t be sure that the database won’t contain malicious code. For our purposes, however, there’s probably no cause for alarm.
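As a hedged alternative to eval, the standard library's ast.literal_eval accepts only Python literals (strings, numbers, None, and containers of them) and rejects anything else. The sketch below is not the book's code: load_records is a hypothetical helper that parses the same record layout from any iterable of lines, so it also avoids reassigning sys.stdin.

```python
import ast

ENDDB, ENDREC, RECSEP = 'enddb.', 'endrec.', '=>'

def load_records(lines):
    """Parse the formatted layout from an iterable of text lines (hypothetical helper)."""
    db = {}
    lines = (line.rstrip('\n') for line in lines)   # one shared iterator for both loops
    for key in lines:
        if key == ENDDB:
            break
        rec = {}
        for field in lines:                         # consume this record's fields
            if field == ENDREC:
                break
            name, value = field.split(RECSEP)
            rec[name] = ast.literal_eval(value)     # literals only, no code execution
        db[key] = rec
    return db

sample = ["tom\n", "name=>'Tom'\n", "job=>None\n", "pay=>0\n", "endrec.\n", "enddb.\n"]
print(load_records(sample))

try:
    ast.literal_eval("__import__('os').getcwd()")   # a call is not a literal
except ValueError:
    print('rejected: not a plain literal')
```

Because the function takes any iterable of lines, passing it an open text file works the same way as passing the list used here.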

Utility scripts

To test further, Example 1-3 reloads the database from a file each time it is run.

Example 1-3. PP4E\Preview\

from make_db_file import loadDbase
db = loadDbase()
for key in db:
    print(key, '=>\n  ', db[key])
print(db['sue']['name'])

And Example 1-4 makes changes by loading, updating, and storing again.

Example 1-4. PP4E\Preview\

from make_db_file import loadDbase, storeDbase
db = loadDbase()
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
storeDbase(db)

Here are the dump script and the update script in action at a system command line; both Sue’s pay and Tom’s name change between script runs. The main point to notice is that the data stays around after each script exits—our objects have become persistent simply because they are mapped to and from text files:

...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

...\PP4E\Preview> python
...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones

As is, we’ll have to write Python code in scripts or at the interactive command line for each specific database update we need to perform (later in this chapter, we’ll do better by providing generalized console, GUI, and web-based interfaces instead). But at a basic level, our text file is a database of records. As we’ll learn in the next section, though, it turns out that we’ve just done a lot of pointless work.

Using Pickle Files

The formatted text file scheme of the prior section works, but it has some major limitations. For one thing, it has to read the entire database from the file just to fetch one record, and it must write the entire database back to the file after each set of updates. Although storing one record’s text per file would work around this limitation, it would also complicate the program further.

For another thing, the text file approach assumes that the data separators it writes out to the file will not appear in the data to be stored: if the characters => happen to appear in the data, for example, the scheme will fail. We might work around this by generating XML text to represent records in the text file, using Python’s XML parsing tools, which we’ll meet later in this text, to reload; XML tags would avoid collisions with actual data’s text, but creating and parsing XML would complicate the program substantially too.
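The separator collision is easy to reproduce. The following sketch uses a made-up field value containing =>; it shows the two-way unpacking used by the loader failing, and one partial workaround (splitting at most once), which is an assumption of this sketch rather than something the example script does.

```python
RECSEP = '=>'

field = 'name' + RECSEP + repr('a=>b')   # the value itself contains the separator
parts = field.split(RECSEP)
print(parts)                             # more than two pieces

try:
    name, value = field.split(RECSEP)    # what the loader's parsing step attempts
except ValueError as exc:
    print('parse failed:', exc)

# Splitting at most once tolerates separators inside the value, though a
# separator inside a field name would still break the format.
name, value = field.split(RECSEP, 1)
print(name, value)
```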

Perhaps worst of all, the formatted text file scheme is already complex without being general: it is tied to the dictionary-of-dictionaries structure, and it can’t handle anything else without being greatly expanded. It would be nice if a general tool existed that could translate any sort of Python data to a format that could be saved in a file in a single step.

That is exactly what the Python pickle module is designed to do. The pickle module translates an in-memory Python object into a serialized byte stream, a string of bytes that can be written to any file-like object. The pickle module also knows how to reconstruct the original object in memory, given the serialized byte stream: we get back an equivalent object with the same value and structure. In a sense, the pickle module replaces proprietary data formats; its serialized format is general and efficient enough for any program. With pickle, there is no need to manually translate objects to data when storing them persistently, and no need to manually parse a complex format to get them back. Pickling is similar in spirit to XML representations, but it’s both more Python-specific and much simpler to code.
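A minimal in-memory round trip makes the idea concrete. This sketch uses pickle.dumps and pickle.loads, the bytes-based counterparts of the file-based dump and load calls; the record is sample data for illustration only.

```python
import pickle

rec = {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None}

data = pickle.dumps(rec)      # serialize to a bytes object
copy = pickle.loads(data)     # rebuild an object from those bytes

print(copy == rec)            # the rebuilt object compares equal to the original
print(copy is rec)            # but it is a new object, not the original one
```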

The net effect is that pickling allows us to store and fetch native Python objects as they are and in a single step—we use normal Python syntax to process pickled records. Despite what it does, the pickle module is remarkably easy to use. Example 1-5 shows how to store our records in a flat file, using pickle.

Example 1-5. PP4E\Preview\

from initdata import db
import pickle
dbfile = open('people-pickle', 'wb')               # use binary mode files in 3.X
pickle.dump(db, dbfile)                            # data is bytes, not str
dbfile.close()

When run, this script stores the entire database (the dictionary of dictionaries defined in Example 1-1) to a flat file named people-pickle in the current working directory. The pickle module handles the work of converting the object to a serialized byte string. Example 1-6 shows how to access the pickled database after it has been created; we simply open the file and pass its content back to pickle to remake the object from its serialized bytes.

Example 1-6. PP4E\Preview\

import pickle
dbfile = open('people-pickle', 'rb')               # use binary mode files in 3.X
db = pickle.load(dbfile)
for key in db:
    print(key, '=>\n  ', db[key])
print(db['sue']['name'])

Here are these two scripts at work, at the system command line again; naturally, they can also be run in IDLE, and you can open and inspect the pickle file by running the same sort of code interactively as well:

...\PP4E\Preview> python
...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

Updating with a pickle file is similar to a manually formatted file, except that Python is doing all of the formatting work for us. Example 1-7 shows how.

Example 1-7. PP4E\Preview\

import pickle
dbfile = open('people-pickle', 'rb')
db = pickle.load(dbfile)
dbfile.close()

db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'

dbfile = open('people-pickle', 'wb')
pickle.dump(db, dbfile)
dbfile.close()

Notice how the entire database is written back to the file after the records are changed in memory, just as for the manually formatted approach; this might become slow for very large databases, but we’ll ignore this for the moment. Here are our update and dump scripts in action—as in the prior section, Sue’s pay and Tom’s name change between scripts because they are written back to a file (this time, a pickle file):

...\PP4E\Preview> python
...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones

As we’ll learn in Chapter 17, the Python pickling system supports nearly arbitrary object types—lists, dictionaries, class instances, nested structures, and more. There, we’ll also learn about the pickler’s text and binary storage protocols; as of Python 3, all protocols use bytes objects to represent pickled data, which in turn requires pickle files to be opened in binary mode for all protocols. As we’ll see later in this chapter, the pickler and its data format also underlie shelves and ZODB databases, and pickled class instances provide both data and behavior for objects stored.
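The bytes requirement can be checked directly. In this sketch, protocol 0 stands for the older text-style protocol and pickle.HIGHEST_PROTOCOL for a binary one; io.StringIO is used only as a stand-in for a file opened in text mode.

```python
import io
import pickle

rec = {'name': 'Sue Jones', 'pay': 40000}

for proto in (0, pickle.HIGHEST_PROTOCOL):
    data = pickle.dumps(rec, protocol=proto)
    # every protocol produces bytes, and each loads back to an equal object
    print(proto, type(data).__name__, pickle.loads(data) == rec)

try:
    pickle.dump(rec, io.StringIO())      # text-mode streams accept str only
except TypeError as exc:
    print('text-mode stream rejected:', exc)
```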

In fact, pickling is more general than these examples may imply. Because they accept any object that provides an interface compatible with files, pickling and unpickling may be used to transfer native Python objects to a variety of media. Using a network socket, for instance, allows us to ship pickled Python objects across a network and provides an alternative to larger protocols such as SOAP and XML-RPC.
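As a rough illustration of the file-compatible interface, the sketch below pickles to an in-memory io.BytesIO stream instead of a disk file. The stream is a stand-in chosen for this example; a file-like wrapper around a socket (for instance, one made with the socket object's makefile method) could play the same role, with the usual caution that unpickling data from an untrusted source is unsafe.

```python
import io
import pickle

stream = io.BytesIO()                      # in-memory stand-in for a file or socket wrapper
pickle.dump({'name': 'Bob Smith', 'pay': 30000}, stream)
pickle.dump(['spam', 'eggs'], stream)      # several objects can share one stream

stream.seek(0)                             # rewind to where a reader would start
first = pickle.load(stream)                # objects come back in the order written
second = pickle.load(stream)
print(first, second)
```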

Using Per-Record Pickle Files

As mentioned earlier, one potential disadvantage of this section’s examples so far is that they may become slow for very large databases: because the entire database must be loaded and rewritten to update a single record, this approach can waste time. We could improve on this by storing each record in the database in a separate flat file. The next three examples show one way to do so; Example 1-8 stores each record in its own flat file, using each record’s original key as its filename with a .pkl appended (it creates the files bob.pkl, sue.pkl, and tom.pkl in the current working directory).

Example 1-8. PP4E\Preview\

from initdata import bob, sue, tom
import pickle
for (key, record) in [('bob', bob), ('tom', tom), ('sue', sue)]:
    recfile = open(key + '.pkl', 'wb')
    pickle.dump(record, recfile)
    recfile.close()

Next, Example 1-9 dumps the entire database by using the standard library’s glob module to do filename expansion and thus collect all the files in this directory with a .pkl extension. To load a single record, we open its file and deserialize with pickle; we must load only one record file, though, not the entire database, to fetch one record.

Example 1-9. PP4E\Preview\

import pickle, glob
for filename in glob.glob('*.pkl'):         # for 'bob','sue','tom'
    recfile = open(filename, 'rb')
    record  = pickle.load(recfile)
    print(filename, '=>\n  ', record)

suefile = open('sue.pkl', 'rb')
print(pickle.load(suefile)['name'])         # fetch sue's name

Finally, Example 1-10 updates the database by fetching a record from its file, changing it in memory, and then writing it back to its pickle file. This time, we have to fetch and rewrite only a single record file, not the full database, to update.

Example 1-10. PP4E\Preview\

import pickle
suefile = open('sue.pkl', 'rb')
sue = pickle.load(suefile)
suefile.close()

sue['pay'] *= 1.10
suefile = open('sue.pkl', 'wb')
pickle.dump(sue, suefile)
suefile.close()

Here are our file-per-record scripts in action; the results are about the same as in the prior section, but database keys become real filenames now. In a sense, the filesystem becomes our top-level dictionary—filenames provide direct access to each record.

...\PP4E\Preview> python
...\PP4E\Preview> python
bob.pkl =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue.pkl =>
   {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom.pkl =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

...\PP4E\Preview> python
...\PP4E\Preview> python
bob.pkl =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue.pkl =>
   {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom.pkl =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones
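The file-per-record idea can be condensed into a few helper functions. This is a sketch under stated assumptions, not the book's code: store_record, load_record, and list_keys are hypothetical names, and a temporary directory is used so the example leaves no files behind.

```python
import glob
import os
import pickle
import tempfile

def store_record(dirname, key, record):
    """Write one record to <dirname>/<key>.pkl (hypothetical helper)."""
    with open(os.path.join(dirname, key + '.pkl'), 'wb') as recfile:
        pickle.dump(record, recfile)

def load_record(dirname, key):
    """Load only the record stored under this key."""
    with open(os.path.join(dirname, key + '.pkl'), 'rb') as recfile:
        return pickle.load(recfile)

def list_keys(dirname):
    """Recover the database keys from the .pkl filenames."""
    paths = glob.glob(os.path.join(dirname, '*.pkl'))
    return sorted(os.path.splitext(os.path.basename(p))[0] for p in paths)

with tempfile.TemporaryDirectory() as dbdir:
    store_record(dbdir, 'bob', {'name': 'Bob Smith', 'pay': 30000})
    store_record(dbdir, 'sue', {'name': 'Sue Jones', 'pay': 40000})

    sue = load_record(dbdir, 'sue')            # fetch one record file only
    sue['pay'] = sue['pay'] * 110 // 100       # 10 percent raise, integer math
    store_record(dbdir, 'sue', sue)            # rewrite that one file only

    keys = list_keys(dbdir)
    sue_pay = load_record(dbdir, 'sue')['pay']
    print(keys, sue_pay)
```

The with statements close each file as soon as its block exits, which is a design choice of this sketch rather than a requirement of pickle.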

Using Shelves

Pickling objects to files, as shown in the preceding section, is an optimal scheme in many applications. In fact, some applications use pickling of Python objects across network sockets as a simpler alternative to network protocols such as the SOAP and XML-RPC web services architectures (also supported by Python, but much heavier than pickle).

Moreover, assuming your filesystem can handle as many files as you’ll need, pickling one record per file also obviates the need to load and store the entire database for each update. If we really want keyed access to records, though, the Python standard library offers an even higher-level tool: shelves.

Shelves automatically pickle objects to and from a keyed-access filesystem. They behave much like dictionaries that must be opened, and they persist after each program exits. Because they give us key-based access to stored records, there is no need to manually manage one flat file per record—the shelve system automatically splits up stored records and fetches and updates only those records that are accessed and changed. In this way, shelves provide utility similar to per-record pickle files, but they are usually easier to code.

The shelve interface is just as simple as pickle: it is identical to dictionaries, with extra open and close calls. In fact, to your code, a shelve really does appear to be a persistent dictionary of persistent objects; Python does all the work of mapping its content to and from a file. For instance, Example 1-11 shows how to store our in-memory dictionary objects in a shelve for permanent keeping.

Example 1-11. PP4E\Preview\

from initdata import bob, sue
import shelve
db = shelve.open('people-shelve')
db['bob'] = bob
db['sue'] = sue
db.close()

This script creates one or more files in the current directory with the name people-shelve as a prefix (in Python 3.1 on Windows, people-shelve.bak, people-shelve.dat, and people-shelve.dir). You shouldn’t delete these files (they are your database!), and you should be sure to use the same base name in other scripts that access the shelve. Example 1-12, for instance, reopens the shelve and indexes it by key to fetch its stored records.

Example 1-12. PP4E\Preview\

import shelve
db = shelve.open('people-shelve')
for key in db:
    print(key, '=>\n  ', db[key])
print(db['sue']['name'])
db.close()

We still have a dictionary of dictionaries here, but the top-level dictionary is really a shelve mapped onto a file. Much happens when you access a shelve’s keys—it uses pickle internally to serialize and deserialize objects stored, and it interfaces with a keyed-access filesystem. From your perspective, though, it’s just a persistent dictionary. Example 1-13 shows how to code shelve updates.

Example 1-13. PP4E\Preview\

from initdata import tom
import shelve
db = shelve.open('people-shelve')
sue = db['sue']                       # fetch sue
sue['pay'] *= 1.50
db['sue'] = sue                       # update sue
db['tom'] = tom                       # add a new record
db.close()

Notice how this code fetches sue by key, updates in memory, and then reassigns to the key to update the shelve; this is a requirement of shelves by default, but not always of more advanced shelve-like systems such as ZODB, covered in Chapter 17. As we’ll see later, shelve.open also has a newer writeback keyword argument, which, if passed True, causes all records loaded from the shelve to be cached in memory, and automatically written back to the shelve when it is closed; this avoids manual write backs on changes, but can consume memory and make closing slow.

Also note how shelve files are explicitly closed. Although we don’t need to pass mode flags to shelve.open (by default it creates the shelve if needed, and opens it for reads and writes otherwise), some underlying keyed-access filesystems may require a close call in order to flush output buffers after changes.
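The difference between the default mode and writeback mode can be seen in a short experiment. This sketch stores a sample record in a shelve created inside a temporary directory (an assumption made so the example cleans up after itself) and tries an in-place change both ways.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'people-shelve')

    db = shelve.open(path)
    db['sue'] = {'name': 'Sue Jones', 'pay': 40000}
    db['sue']['pay'] = 60000      # changes a temporary unpickled copy; not stored
    lost = db['sue']['pay']       # fetching again unpickles the stored record
    db.close()

    db = shelve.open(path, writeback=True)
    db['sue']['pay'] = 60000      # changes the cached record in place
    db.close()                    # cached records are written back on close

    db = shelve.open(path)
    kept = db['sue']['pay']
    db.close()

print(lost, kept)
```

In default mode the in-place change is silently dropped, which is why the update script reassigns the whole record to its key.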

Finally, here are the shelve-based scripts on the job, creating, changing, and fetching records. The records are still dictionaries, but the database is now a dictionary-like shelve which automatically retains its state in a file between program runs:

...\PP4E\Preview> python
...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
Sue Jones

...\PP4E\Preview> python
...\PP4E\Preview> python
bob =>
   {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
   {'pay': 60000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
   {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

When we ran the update and dump scripts here, we added a new record for key tom and increased Sue’s pay field by 50 percent. These changes are permanent because the record dictionaries are mapped to an external file by shelve. (In fact, this is a particularly good script for Sue—something she might consider scheduling to run often, using a cron job on Unix, or a Startup folder or msconfig entry on Windows…)
