Manipulating large heterogeneous tables with HDF5 and PyTables

PyTables can store homogeneous blocks of data as NumPy-like arrays in HDF5 files. It can also store heterogeneous tables, as we will see in this recipe.

Getting ready

You need PyTables for this recipe (see the previous recipe for installation instructions).

How to do it...

  1. Let's import NumPy and PyTables:
    In [1]: import numpy as np
            import tables as tb
  2. Let's create a new HDF5 file:
    In [2]: f = tb.open_file('myfile.h5', 'w')
  3. We will create an HDF5 table with two columns: the name of a city (a string of at most 64 characters) and its population (a 32-bit integer). We can specify the columns by creating a structured data type with NumPy (a compound type with named fields, not to be confused with complex numbers):
    In [3]: dtype = np.dtype([('city','S64'), ('population', ...
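The column specification described in step 3 can be tried out with NumPy alone. The following is a minimal sketch, not the recipe's own code: the `'i4'` type code is an assumption based on the "32-bit integer" wording above, and the names `city_dtype` and `rows` and the sample values are hypothetical placeholders.

```python
import numpy as np

# Structured dtype with two named fields:
#   'city'       -> byte string of at most 64 characters ('S64')
#   'population' -> 32-bit signed integer ('i4', assumed from the text)
city_dtype = np.dtype([('city', 'S64'), ('population', 'i4')])

# Placeholder rows for illustration only (not real data).
rows = np.array([(b'city_a', 1000), (b'city_b', 2000)], dtype=city_dtype)

# Each field can be accessed by name, like a column of a table.
print(city_dtype.names)
print(city_dtype.itemsize)   # bytes per row: 64 for the string + 4 for the integer
print(rows['population'])
```

Each row occupies a fixed number of bytes, which is what allows a table of such records to be laid out in fixed-size rows on disk.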
