Chapter 8. Data Has to Go Somewhere

It is a capital mistake to theorize before one has data.

Arthur Conan Doyle

A running program accesses data that is stored in random access memory, or RAM. RAM is very fast, but it is expensive and requires a constant supply of power; if the power goes out, all the data in memory is lost. Disk drives are slower than RAM but have more capacity, cost less, and retain data even after someone trips over the power cord. Thus, a huge amount of effort in computer systems has been devoted to making the best tradeoffs between storing data on disk and in RAM. As programmers, we need persistence: storing and retrieving data using nonvolatile media such as disks.

This chapter is all about the different flavors of data storage, each optimized for different purposes: flat files, structured files, and databases. File operations other than input and output are covered in “Files”.

Note

This is also the first chapter to show examples of nonstandard Python modules; that is, Python code apart from the standard library. You’ll install them by using the pip command, which is painless. There are more details on its usage in Appendix D.

File Input/Output

The simplest kind of persistence is a plain old file, sometimes called a flat file. This is just a sequence of bytes stored under a filename. You read from a file into memory and write from memory to a file. Python makes these jobs easy. Its file operations were modeled on the familiar and popular Unix equivalents. ...
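As a rough illustration of that round trip, here is a minimal sketch that writes a string from memory to a file and then reads it back. The filename relativity.txt, the temporary directory, and the sample text are assumptions chosen only for this example; open(), write(), and read() are the standard built-in file operations.

```python
import os
import tempfile

# Hypothetical path and contents, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "relativity.txt")
poem = (
    "There was a young lady named Bright,\n"
    "Whose speed was far faster than light.\n"
)

# Write from memory to a file. The with block closes the file on exit.
with open(path, "w", encoding="utf-8") as fout:
    chars_written = fout.write(poem)  # write() returns the character count

# Read from the file back into memory.
with open(path, "r", encoding="utf-8") as fin:
    poem_read = fin.read()

print(chars_written == len(poem))
print(poem_read == poem)
```

The file stays on disk after the program exits, which is the persistence that data held only in RAM lacks.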
