Comparing the NumPy .npy binary format and pickling pandas DataFrames

Saving data in CSV format is fine most of the time. CSV files are easy to exchange, since most programming languages and applications can handle the format. However, it is not very efficient: CSV and other plaintext formats take up a lot of space. Numerous file formats have been invented that offer a high level of compression, such as zip, bzip, and gzip.

The following is the complete code for this storage comparison exercise, which can also be found in the binary_formats.py file of this book's code bundle:

import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile
from os.path import getsize

np.random.seed(42)
a = np.random.randn(365, 4)

tmpf = NamedTemporaryFile()
...
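Since the listing above is cut short, the following is a minimal, self-contained sketch of the same kind of comparison. It is not the book's binary_formats.py. The file names (data.csv, data.npy, data.pkl) and the use of a temporary directory are assumptions made for illustration. It writes the same random array as CSV, as a NumPy .npy file, and as a pickled pandas DataFrame, and then collects the file sizes.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same data shape as in the excerpt: 365 rows, 4 columns of random floats.
np.random.seed(42)
a = np.random.randn(365, 4)
df = pd.DataFrame(a)

# A temporary directory (an assumption of this sketch) keeps the files
# together and removes them automatically when the block exits.
with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file names, chosen only for this illustration.
    csv_path = os.path.join(tmpdir, "data.csv")
    npy_path = os.path.join(tmpdir, "data.npy")
    pkl_path = os.path.join(tmpdir, "data.pkl")

    np.savetxt(csv_path, a, delimiter=",")  # plaintext CSV
    np.save(npy_path, a)                    # NumPy binary format
    df.to_pickle(pkl_path)                  # pickled DataFrame

    sizes = {
        "csv": os.path.getsize(csv_path),
        "npy": os.path.getsize(npy_path),
        "pickle": os.path.getsize(pkl_path),
    }

    # Read the binary files back to confirm that the data round-trips.
    loaded = np.load(npy_path)
    df_loaded = pd.read_pickle(pkl_path)

for name, size in sizes.items():
    print(name, "size in bytes:", size)
```

With the default text format of np.savetxt, each value is written with many significant digits, so the CSV file should come out noticeably larger than either binary file. The exact sizes depend on the library versions in use.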
