It is truly great to be able to process your file-based data. But what happens to your data when you’re done? Of course, it’s best to save your data to a disk file, which allows you to use it again at some later date and time. Taking your memory-based data and storing it to disk is what persistence is all about. Python supports all the usual tools for writing to files and also provides some cool facilities for efficiently storing Python data. So...flip the page and let’s get started learning them.
It’s a rare program that reads data from a disk file, processes the data, and then throws away the processed data. Typically, programs save the data they process, display their output on screen, or transfer data over a network.
Before you learn what’s involved in writing data to disk, let’s process the data from the previous chapter to work out who said what to whom.
When that’s done, you’ll have something worth saving.
When you use the open() BIF to work with a disk file, you can specify an access mode to use. By default, open() uses mode r for reading, so you don’t need to specify it. To open a file for writing, use mode w:
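Here is a minimal sketch of opening a file in write mode. The file name data.out and the temporary directory are assumptions made for illustration, not names from the chapter's own code.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.out")

out = open(path, "w")    # "w" opens the file for writing, creating it if needed
mode_used = out.mode     # the mode the file object was opened with
out.close()
```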
By default, the print() BIF uses standard output (usually the screen) when displaying data. To write data to a file instead, use the file argument to specify the data file object to use:
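As a rough sketch, sending print() output to a file might look like this. The file name and the text being written are made up for the example.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.out")

out = open(path, "w")
# The file argument redirects the output to the file instead of the screen.
print("Norwegian Blues stun easily.", file=out)
out.close()
```

Nothing appears on screen when this runs, because the text goes to the file object named by the file argument.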
When you’re done, be sure to close the file to ensure all of your data is written to disk. This is known as flushing and is very important:
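A small sketch of the closing step follows. The closed attribute is part of Python's file objects; the file name and the variables before and after are assumptions for the example.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.out")

out = open(path, "w")
print("Flush me", file=out)
before = out.closed    # the file is still open; data may still be sitting in a buffer
out.close()            # flushes any buffered data to disk, then closes the file
after = out.closed
```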
Geek Bits
When you use access mode w, Python opens your named file for writing. If the file already exists, it is cleared of its contents, or clobbered. To append to a file, use access mode a. To open a file for both reading and writing without clobbering it, use r+ (the file must already exist); w+ also opens a file for reading and writing, but it clobbers the file first. With modes w, a, and w+, if the file you name does not already exist, it is first created for you, and then opened.
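The sketch below walks through three of the access modes, assuming a hypothetical file called modes.txt in a temporary directory. It shows mode w creating a file, mode a adding to it, and mode w clobbering what was there.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

f = open(path, "w")      # the file does not exist yet, so it is created
f.write("first\n")
f.close()

f = open(path, "a")      # append: existing contents are kept
f.write("second\n")
f.close()

f = open(path)           # default mode r, for reading
after_append = f.read()
f.close()

f = open(path, "w")      # write: existing contents are clobbered
f.write("third\n")
f.close()

f = open(path)
after_clobber = f.read()
f.close()
```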
When all you ever do is read data from files, getting an IOError is annoying, but rarely dangerous, because your data is still in your file, even though you might be having trouble getting at it.
It’s a different story when writing data to files: if you need to handle an IOError before a file is closed, your written data might become corrupted and there’s no way of telling until after it has happened.
Your exception-handling code is doing its job, but you now have a situation where your data could potentially be corrupted, which can’t be good.
What’s needed here is something that lets you run some code regardless of whether an IOError has occurred. In the context of your code, you’ll want to make sure the files are closed no matter what.
When you have a situation where code must always run no matter what errors occur, add that code to your try statement’s finally suite:
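One way this could look is sketched below. The file name, the events list, and the None check are assumptions added for illustration; the events list is only there to record which suites ran.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.out")

events = []    # records which suites ran (for illustration only)
out = None     # stays None if open() itself fails

try:
    out = open(path, "w")
    print("Some data worth saving", file=out)
    events.append("written")
except IOError:
    print("File error")
    events.append("error")
finally:
    # This suite runs whether or not an IOError occurred.
    if out is not None:
        out.close()
    events.append("closed")
```

The check for None guards against calling close() on a file that was never opened in the first place.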
If no runtime errors occur, any code in the finally suite executes. Equally, if an IOError occurs, the except suite executes and then the finally suite runs.
No matter what, the code in the finally suite always runs.
By moving your file closing code into your finally
suite, you are reducing the possibility of data corruption errors.
This is a big improvement, because you’re now ensuring that files are closed properly (even when write errors occur).
But what about those errors?
How do you find out the specifics of the error?
When a file I/O error occurs, your code displays a generic “File Error” message. This is too generic. How do you know what actually happened?
Who knows?
It turns out that the Python interpreter knows...and it will give up the details if only you’d ask.
When an error occurs at runtime, Python raises an exception of the specific type (such as IOError, ValueError, and so on). Additionally, Python creates an exception object that is passed as an argument to your except suite.
Let’s use IDLE to see how this works.
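A sketch of such an experiment is shown here, written as a script rather than an IDLE session. The file missing.txt is hypothetical and assumed not to exist, and the names err and message are illustrative.

```python
import os
import tempfile

# Hypothetical file that does not exist (the temporary directory is new and empty).
path = os.path.join(tempfile.mkdtemp(), "missing.txt")

message = None
try:
    data = open(path)
    print(data.readline(), end="")
    data.close()
except IOError as err:
    # err names the exception object; str() converts it to readable text,
    # which is needed before joining it to another string.
    message = "File error: " + str(err)
    print(message)
```

Note that the name given after as is only available inside the except suite, which is why the text is copied into message.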
Of course, all this extra logic is starting to obscure the real meaning of your code.
Because the use of the try/except/finally pattern is so common when it comes to working with files, Python includes a statement that abstracts away some of the details. The with statement, when used with files, can dramatically reduce the amount of code you have to write, because it negates the need to include a finally suite to handle the closing of a potentially opened data file. Take a look:
When you use with, you no longer have to worry about closing any opened files, as the Python interpreter automatically takes care of this for you. The with version of the code is identical in function to the try/except/finally version. At Head First Labs, we know which approach we prefer.
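Here is a minimal sketch of the with approach, using a hypothetical file name. There is no finally suite and no call to close().

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.out")

try:
    with open(path, "w") as out:
        print("Some data worth saving", file=out)
    # Leaving the with suite closes the file, even if an error occurred inside it.
except IOError as err:
    print("File error: " + str(err))

closed_after = out.closed
```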
Although your data is now stored in a file, it’s not really in a useful format. Let’s experiment in the IDLE shell to see what impact this can have.
Yikes! It would appear your list is converted to a large string by print() when it is saved. Your experimental code reads a single line of data from the file and gets all of the data as one large chunk of text...so much for your code saving your list data.
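The effect can be reproduced with a sketch like the one below. The list contents and file name are invented for the example; the point is that what comes back from the file is a string, not a list.

```python
import os
import tempfile

# Hypothetical file name and sample data.
path = os.path.join(tempfile.mkdtemp(), "man_data.txt")
man = ["Is this the right room?", "I've told you once."]

with open(path, "w") as out:
    print(man, file=out)       # the whole list is written as one line of text

with open(path) as saved:
    line = saved.readline()    # reads that one line back

came_back_as = type(line)      # str, not list
```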
What are your options for dealing with this problem?
Geek Bits
By default, print() displays your data in a format that mimics how your list data is actually stored by the Python interpreter. The resulting output is not really meant to be processed further... its primary purpose is to show you, the Python programmer, what your list data “looks like” in memory.
Parsing the data in the file is a possibility...although it’s complicated by all those square brackets, quotes, and commas. Writing the required code is doable, but it is a lot of code just to read back in your saved data.
Of course, if the data is in a more easily parseable format, the task would likely be easier, so maybe the second option is worth considering, too?
Recall your print_lol() function from Chapter 2, which takes any list (or list of lists) and displays it on screen, one line at a time. And nested lists can be indented, if necessary.
This functionality sounds perfect! Here’s your code from the nester.py module (last seen at the end of Chapter 2):
Amending this code to print to a disk file instead of the screen (known as standard output) should be relatively straightforward. You can then save your data in a more usable format.
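One possible shape for the amended function is sketched below. This is not the nester.py listing itself: the parameter names indent, level, and fh are assumptions, and the function body is a reconstruction based only on the description above. The in-memory buffer stands in for a disk file.

```python
import io
import sys

def print_lol(the_list, indent=False, level=0, fh=sys.stdout):
    """Print each item of a (possibly nested) list on its own line.

    fh is the file object to write to; it defaults to standard output.
    """
    for each_item in the_list:
        if isinstance(each_item, list):
            # Nested list: recurse, one level deeper, writing to the same place.
            print_lol(each_item, indent, level + 1, fh)
        else:
            if indent:
                print("\t" * level, end="", file=fh)
            print(each_item, file=fh)

# Usage: write to an in-memory text buffer instead of the screen.
buffer = io.StringIO()
print_lol(["a", ["b", ["c"]]], indent=True, fh=buffer)
output = buffer.getvalue()
```

Because fh defaults to sys.stdout, existing calls that pass only a list would still print to the screen.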
The Scholar’s Corner
Standard Output: The default place where your code writes its data when the print() BIF is used. This is typically the screen. In Python, standard output is referred to as sys.stdout and is importable from the Standard Library’s sys module.
That’s a good point.
This problem is not unlike the problem from the beginning of the chapter, in that you’ve got lines of text in a disk file that you need to process, only now you have two files instead of one.
You know how to write the code to process your new files, but writing custom code like this is specific to the format that you’ve created for this problem. This is brittle: if the data format changes, your custom code will have to change, too.
Ask yourself: is it worth it?
Python ships with a standard library module called pickle, which can save and load almost any Python data object, including lists.
Once you pickle your data to a file, it is persistent and ready to be read into another program at some later date/time:
You can, for example, store your pickled data on disk, put it in a database, or transfer it over a network to another computer.
When you are ready, reversing this process unpickles your persistent pickled data and recreates your data in its original form within Python’s memory:
Using pickle is straightforward: import the required module, then use dump() to save your data and, some time later, load() to restore it. The only requirement when working with pickled files is that they have to be opened in binary access mode:
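A minimal round trip might look like this. The file name mydata.pickle and the sample list are assumptions for illustration; pickle.dump() and pickle.load() are the standard library calls.

```python
import os
import pickle
import tempfile

# Hypothetical file name and sample data.
path = os.path.join(tempfile.mkdtemp(), "mydata.pickle")
original = [1, 2, "three", ["a", "nested", "list"]]

with open(path, "wb") as pickle_file:     # "wb": write, binary
    pickle.dump(original, pickle_file)

with open(path, "rb") as pickle_file:     # "rb": read, binary
    restored = pickle.load(pickle_file)
```

The restored list is a new object with the same contents as the original, nested list included. Only unpickle files you trust, because loading a pickle can run code.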
Python takes care of your file I/O details, so you can concentrate on what your code actually does or needs to do.
As you’ve seen, being able to work with, save, and restore data in lists is a breeze, thanks to Python. But what other data structures does Python support out of the box?
Let’s dive into Chapter 5 to find out.
You’ve got Chapter 4 under your belt and you’ve added some key Python techniques to your toolbox.