Storing data with PyTables

Hierarchical Data Format (HDF) is a specification and technology for storing large numerical datasets. HDF originated in the supercomputing community and is now an open standard. The latest version, HDF5, is the one we will be using. HDF5 organizes data into groups and datasets. Datasets are multidimensional homogeneous arrays. Groups can contain other groups or datasets, much like directories in a hierarchical filesystem.
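
To make the group and dataset layout concrete, here is a minimal sketch using PyTables, one of the Python HDF5 libraries. It assumes PyTables and NumPy are installed. The file name, the group name measurements, and the dataset name temperatures are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file, group, and dataset names, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "hdf5_demo.h5")

# A dataset is a homogeneous multidimensional array (here 3 x 4 floats).
data = np.arange(12, dtype=np.float64).reshape(3, 4)

with tables.open_file(path, mode="w", title="Demo file") as h5file:
    # A group plays the role of a directory under the root node "/".
    group = h5file.create_group("/", "measurements", title="Example group")
    # Store the array as a dataset inside that group.
    h5file.create_array(group, "temperatures", obj=data, title="Example dataset")

with tables.open_file(path, mode="r") as h5file:
    # Nodes are addressed with filesystem-like paths.
    node = h5file.get_node("/measurements/temperatures")
    restored = node.read()
    # Collect the path of every node in the file, starting at the root.
    node_paths = [n._v_pathname for n in h5file.walk_nodes("/")]

print(node_paths)
print(restored.shape, restored.dtype)
```

The path /measurements/temperatures reads like a file path, which is the sense in which groups resemble directories. Reading the node back returns a NumPy array with the same shape and type as the one that was stored.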

The two main HDF5 Python libraries are:

  • h5py
  • PyTables

In this example, we will be using PyTables. PyTables has a number of dependencies:

  • NumPy: We installed NumPy in Chapter 1, Getting Started with Python Libraries
  • numexpr: This package claims that it evaluates multiple-operator array ...
