Python and HDF5

Book Description

Gain hands-on experience with HDF5 for storing scientific data in Python. This practical guide quickly gets you up to speed on the details, best practices, and pitfalls of using HDF5 to archive and share numerical datasets ranging in size from gigabytes to terabytes. Through real-world examples and practical exercises, you’ll explore topics such as scientific datasets, hierarchically organized groups, user-defined metadata, and interoperable files.

Table of Contents

  1. Special Upgrade Offer
  2. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. Safari® Books Online
    4. How to Contact Us
    5. Acknowledgments
  3. 1. Introduction
    1. Python and HDF5
      1. Organizing Data and Metadata
      2. Coping with Large Data Volumes
    2. What Exactly Is HDF5?
      1. HDF5: The File
      2. HDF5: The Library
      3. HDF5: The Ecosystem
  4. 2. Getting Started
    1. HDF5 Basics
    2. Setting Up
      1. Python 2 or Python 3?
      2. Code Examples
      3. NumPy
      4. HDF5 and h5py
      5. IPython
      6. Timing and Optimization
    3. The HDF5 Tools
      1. HDFView
      2. ViTables
      3. Command Line Tools
    4. Your First HDF5 File
      1. Use as a Context Manager
      2. File Drivers
        1. core driver
        2. family driver
        3. mpio driver
      3. The User Block
  5. 3. Working with Datasets
    1. Dataset Basics
      1. Type and Shape
      2. Reading and Writing
      3. Creating Empty Datasets
      4. Saving Space with Explicit Storage Types
      5. Automatic Type Conversion and Direct Reads
      6. Reading with astype
      7. Reshaping an Existing Array
      8. Fill Values
    2. Reading and Writing Data
      1. Using Slicing Effectively
      2. Start-Stop-Step Indexing
      3. Multidimensional and Scalar Slicing
      4. Boolean Indexing
      5. Coordinate Lists
      6. Automatic Broadcasting
      7. Reading Directly into an Existing Array
      8. A Note on Data Types
    3. Resizing Datasets
      1. Creating Resizable Datasets
      2. Data Shuffling with resize
      3. When and How to Use resize
  6. 4. How Chunking and Compression Can Help You
    1. Contiguous Storage
    2. Chunked Storage
    3. Setting the Chunk Shape
      1. Auto-Chunking
      2. Manually Picking a Shape
    4. Performance Example: Resizable Datasets
    5. Filters and Compression
      1. The Filter Pipeline
      2. Compression Filters
      3. GZIP/DEFLATE Compression
      4. SZIP Compression
      5. LZF Compression
      6. Performance
    6. Other Filters
      1. SHUFFLE Filter
      2. FLETCHER32 Filter
    7. Third-Party Filters
  7. 5. Groups, Links, and Iteration: The “H” in HDF5
    1. The Root Group and Subgroups
    2. Group Basics
      1. Dictionary-Style Access
      2. Special Properties
    3. Working with Links
      1. Hard Links
      2. Free Space and Repacking
      3. Soft Links
      4. External Links
      5. A Note on Object Names
      6. Using get to Determine Object Types
      7. Using require to Simplify Your Application
    4. Iteration and Containership
      1. How Groups Are Actually Stored
      2. Dictionary-Style Iteration
      3. Containership Testing
    5. Multilevel Iteration with the Visitor Pattern
      1. Visit by Name
      2. Multiple Links and visit
      3. Visiting Items
      4. Canceling Iteration: A Simple Search Mechanism
    6. Copying Objects
      1. Single-File Copying
    7. Object Comparison and Hashing
  8. 6. Storing Metadata with Attributes
    1. Attribute Basics
      1. Type Guessing
      2. Strings and File Compatibility
      3. Python Objects
      4. Explicit Typing
    2. Real-World Example: Accelerator Particle Database
      1. Application Format on Top of HDF5
      2. Analyzing the Data
  9. 7. More About Types
    1. The HDF5 Type System
    2. Integers and Floats
    3. Fixed-Length Strings
    4. Variable-Length Strings
      1. The vlen String Data Type
      2. Working with vlen String Datasets
      3. Byte Versus Unicode Strings
      4. Using Unicode Strings
      5. Don’t Store Binary Data in Strings!
      6. Future-Proofing Your Python 2 Application
    5. Compound Types
    6. Complex Numbers
    7. Enumerated Types
    8. Booleans
    9. The array Type
    10. Opaque Types
    11. Dates and Times
  10. 8. Organizing Data with References, Types, and Dimension Scales
    1. Object References
      1. Creating and Resolving References
      2. References as “Unbreakable” Links
      3. References as Data
    2. Region References
      1. Creating Region References and Reading
      2. Fancy Indexing
      3. Finding Datasets with Region References
    3. Named Types
      1. The Datatype Object
      2. Linking to Named Types
      3. Managing Named Types
    4. Dimension Scales
      1. Creating Dimension Scales
      2. Attaching Scales to a Dataset
  11. 9. Concurrency: Parallel HDF5, Threading, and Multiprocessing
    1. Python Parallel Basics
    2. Threading
    3. Multiprocessing
    4. MPI and Parallel HDF5
      1. A Very Quick Introduction to MPI
      2. MPI-Based HDF5 Program
      3. Collective Versus Independent Operations
      4. Atomicity Gotchas
  12. 10. Next Steps
    1. Asking for Help
    2. Contributing
  13. Index
  14. About the Author
  15. Colophon
  16. Special Upgrade Offer
  17. Copyright