It is truly great to be able to process your file-based data. But what happens to your data when you’re done? Of course, it’s best to save your data to a disk file, which allows you to use it again at some later date and time. Taking your memory-based data and storing it to disk is what persistence is all about. Python supports all the usual tools for writing to files and also provides some cool facilities for efficiently storing Python data. So...flip the page and let’s get started learning them.
It’s a rare program that reads data from a disk file, processes the data, and then throws away the processed data. Typically, programs save the data they process, display their output on screen, or transfer data over a network.
Before you learn what’s involved in writing data to disk, let’s process the data from the previous chapter to work out who said what to whom.
When that’s done, you’ll have something worth saving.
When you use the open() BIF to work with a disk file, you can specify an access mode to use. By default, open() uses mode r for reading, so you don’t need to specify it. To open a file for writing, use mode w.
By default, the print() BIF uses standard output (usually the screen) when displaying data. To write data to a file instead, use the file argument to specify the data file object to use.
When you’re done, be sure to close the file to ensure all of your data is written to disk. This is known as flushing and is very important:
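Here is a minimal sketch of those three steps (open for writing, print to the file, close). The file name and the line of text are made up for illustration, and the file is created in a temporary directory so nothing of yours gets clobbered.

```python
import os
import tempfile

# Hypothetical file name, placed in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sketch.txt")

out = open(path, "w")                            # mode "w" opens the file for writing
print("Norwegian Blues stun easily.", file=out)  # goes to the file, not the screen
out.close()                                      # flushes buffered data to disk

data = open(path)                                # no mode given, so "r" is used
contents = data.read()
data.close()
print(contents, end="")
```

Note that print() still adds its usual newline at the end of each line it writes to the file.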
When you use access mode w, Python opens your named file for writing. If the file already exists, it is cleared of its contents, or clobbered. To append to a file, use access mode a, and to open an existing file for writing and reading without clobbering it, use r+ (w+ also allows writing and reading, but it clobbers the file first). If you use w, a, or w+ to open a file that does not already exist, it is first created for you, and then opened for writing.
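If you want to see the access modes in action, the following sketch writes to the same file using several of them and records what the file holds after each step. The file name and the read_all() helper are hypothetical, invented just for this demonstration.

```python
import os
import tempfile

# Hypothetical file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

def read_all(name):
    """Helper for this sketch only: return the whole file as one string."""
    f = open(name)
    text = f.read()
    f.close()
    return text

f = open(path, "w")          # the file does not exist yet, so it is created
print("first", file=f)
f.close()
after_create = read_all(path)

f = open(path, "a")          # append: existing contents are kept
print("second", file=f)
f.close()
after_append = read_all(path)

f = open(path, "r+")         # read and write, no clobbering
seen = f.read()              # the existing contents are still there
f.seek(0, os.SEEK_END)       # make sure the next write lands at the end
print("third", file=f)
f.close()
after_rplus = read_all(path)

f = open(path, "w")          # "w" again: the file is clobbered
f.close()
after_clobber = read_all(path)
```

After the final open() with mode w, the file is empty, even though nothing was written to it.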
It’s a different story when writing data to files: if you need to handle an IOError before a file is closed, your written data might become corrupted and there’s no way of telling until after it has happened.
Your exception-handling code is doing its job, but you now have a situation where your data could potentially be corrupted, which can’t be good.
What’s needed here is something that lets you run some code regardless of whether an IOError has occurred. In the context of your code, you’ll want to make sure the files are closed no matter what.
If no runtime errors occur, any code in the finally suite executes. Equally, if an IOError occurs, the except suite executes and then the finally suite runs.
No matter what, the code in the finally suite always runs.
By moving your file closing code into your finally suite, you are reducing the possibility of data corruption errors.
This is a big improvement, because you’re now ensuring that files are closed properly (even when write errors occur).
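The pattern can be sketched as follows. The file name and the text written are placeholders, and checking the variable against None before closing is just one way (an assumption of this sketch) to guard against open() itself having failed.

```python
import os
import tempfile

# Hypothetical output file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "man_data.txt")

out = None                        # nothing opened yet
try:
    out = open(path, "w")
    print("Is this the right room for an argument?", file=out)
except IOError:
    print("File error.")
finally:
    # This suite runs whether or not an IOError was raised.
    if out is not None:           # only close the file if open() succeeded
        out.close()

closed = out.closed
```

In Python 3.3 and later, IOError is another name for OSError, so this except suite catches both.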
But what about those errors?
How do you find out the specifics of the error?
It turns out that the Python interpreter knows...and it will give up the details if only you’d ask.
When an error occurs at runtime, Python raises an exception of a specific type (such as IOError or ValueError). Additionally, Python creates an exception object that is passed as an argument to your except suite.
Let’s use IDLE to see how this works.
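As a small sketch of the idea, the code below tries to open a file that is known not to exist (the name is made up) and gives the exception object a name, err, using the as keyword. Converting the object to a string with str() produces the details of the error.

```python
import os
import tempfile

# A path that cannot exist: a made-up name inside a brand-new empty directory.
missing = os.path.join(tempfile.mkdtemp(), "missing.txt")

try:
    data = open(missing)
    data.close()
except IOError as err:                       # "err" names the exception object
    message = "File error: " + str(err)      # str() turns the object into text
    print(message)
```

The exact wording of the message depends on your operating system, but it includes the name of the file that could not be opened.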
Of course, all this extra logic is starting to obscure the real meaning of your code.
Because the use of the try/except/finally pattern is so common when it comes to working with files, Python includes a statement that abstracts away some of the details. The with statement, when used with files, can dramatically reduce the amount of code you have to write, because it negates the need to include a finally suite to handle the closing of a potentially opened data file. Take a look:
When you use with, you no longer have to worry about closing any opened files, as the Python interpreter automatically takes care of this for you. The with code on the right is identical in function to that on the left. At Head First Labs, we know which approach we prefer.
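A minimal sketch of the with approach looks like this. The file name and the text are placeholders. There is no finally suite and no call to close(), yet the file is closed by the time the with suite ends.

```python
import os
import tempfile

# Hypothetical output file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "other_data.txt")

try:
    with open(path, "w") as out:             # "out" is the open file object
        print("No it isn't.", file=out)
    # Leaving the with suite closes the file, even if an error occurred.
except IOError as err:
    print("File error: " + str(err))

closed_after_with = out.closed
```

The except suite is still there, because with takes care of closing the file, not of handling the error.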
Yikes! It would appear your list is converted to a large string by print() when it is saved. Your experimental code reads a single line of data from the file and gets all of the data as one large chunk of text...so much for your code saving your list data.
What are your options for dealing with this problem?
print() displays your data in a format that mimics how your list data is actually stored by the Python interpreter. The resulting output is not really meant to be processed further... its primary purpose is to show you, the Python programmer, what your list data “looks like” in memory.
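You can reproduce the problem with a few lines of code. This sketch uses a short, made-up list and a temporary file. What comes back from the file is a single string that merely looks like a list.

```python
import os
import tempfile

# Hypothetical data and file name, for illustration only.
man = ["Is this the right room?", "I've told you once."]
path = os.path.join(tempfile.mkdtemp(), "man_data.txt")

with open(path, "w") as out:
    print(man, file=out)          # writes the list's printed form as text

with open(path) as saved:
    line = saved.readline()       # one line holds everything

came_back_as = type(line)         # a str, not a list
```

Brackets, quotes, and commas are all in there as ordinary characters, which is why reading the data back in takes extra work.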
Parsing the data in the file is a possibility...although it’s complicated by all those square brackets, quotes, and commas. Writing the required code is doable, but it is a lot of code just to read back in your saved data.
Of course, if the data is in a more easily parseable format, the task would likely be easier, so maybe the second option is worth considering, too?
Consider the print_lol() function from Chapter 2, which takes any list (or list of lists) and displays it on screen, one line at a time. And nested lists can be indented, if necessary.
This functionality sounds perfect! Here’s your code from the nester.py module (last seen at the end of Chapter 2):
Amending this code to print to a disk file instead of the screen (known as standard output) should be relatively straightforward. You can then save your data in a more usable format.
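One way the amended function might look is sketched below. This is a reconstruction for illustration and not necessarily the exact nester.py code: the parameter names (indent, level, and fh) and their defaults are assumptions. The extra fh argument defaults to sys.stdout, so existing calls keep printing to the screen.

```python
import io
import sys

def print_lol(the_list, indent=False, level=0, fh=sys.stdout):
    """Print each item of a (possibly nested) list on its own line.

    Sketch only: the parameter names and defaults are assumptions.
    """
    for each_item in the_list:
        if isinstance(each_item, list):
            # Nested list: recurse one level deeper, same output target.
            print_lol(each_item, indent, level + 1, fh)
        else:
            if indent:
                print("\t" * level, end="", file=fh)
            print(each_item, file=fh)

# An in-memory buffer stands in for an open disk file in this demonstration.
buffer = io.StringIO()
print_lol(["a", ["b", ["c"]]], indent=True, fh=buffer)
output = buffer.getvalue()
```

Any object opened for writing with open() can be passed as fh in place of the buffer.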
Standard Output: The default place where your code writes its data when the “print()” BIF is used. This is typically the screen. In Python, standard output is referred to as “sys.stdout” and is importable from the Standard Library’s “sys” module.
That’s a good point.
This problem is not unlike the problem from the beginning of the chapter, in that you’ve got lines of text in a disk file that you need to process, only now you have two files instead of one.
You know how to write the code to process your new files, but writing custom code like this is specific to the format that you’ve created for this problem. This is brittle: if the data format changes, your custom code will have to change, too.
Ask yourself: is it worth it?
Once you pickle your data to a file, it is persistent and ready to be read into another program at some later date/time:
You can, for example, store your pickled data on disk, put it in a database, or transfer it over a network to another computer.
When you are ready, reversing this process unpickles your persistent pickled data and recreates your data in its original form within Python’s memory:
Using pickle is straightforward: import the required module, then use dump() to save your data and, some time later, load() to restore it. The only requirement when working with pickled files is that they have to be opened in binary access mode (wb to write, rb to read).
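Here is a short sketch of the round trip. The list and the file name are invented for the example. Note the b in both access modes.

```python
import os
import pickle
import tempfile

# Hypothetical data and file name, for illustration only.
data = ["Is this the right room?", ["nested", 42]]
path = os.path.join(tempfile.mkdtemp(), "mydata.pickle")

try:
    with open(path, "wb") as store:          # "wb": write, binary
        pickle.dump(data, store)             # save the list to disk

    with open(path, "rb") as store:          # "rb": read, binary
        restored = pickle.load(store)        # recreate the list in memory
except IOError as err:
    print("File error: " + str(err))
except pickle.PickleError as perr:
    print("Pickling error: " + str(perr))
```

The restored list is a new object that is equal to the original, nested lists and all, with no parsing code required.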
Python takes care of your file I/O details, so you can concentrate on what your code actually does or needs to do.
As you’ve seen, being able to work with, save, and restore data in lists is a breeze, thanks to Python. But what other data structures does Python support out of the box?
Let’s dive into Chapter 5 to find out.
You’ve got Chapter 4 under your belt and you’ve added some key Python techniques to your toolbox.