How to do it...

Take a look at the following steps:

  1. We will start by importing the necessary libraries, opening the file with h5py, and then accessing the data through its keys:
    from math import ceil

    import numpy as np
    import h5py

    h5_3L = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', 'r')
    samples = h5_3L['/3L/samples']
    calldata_genotype = h5_3L['/3L/calldata/genotype']
    positions = h5_3L['/3L/variants/POS']
    alt_alleles = h5_3L['/3L/variants/ALT']
    is_snp = h5_3L['/3L/variants/is_snp']
    num_samples = len(samples)

There are alternatives to h5py, but be careful, as they might impose constraints on keys and data (for instance, the read methods of pandas might do this). Although we now hold references to these datasets, their contents have not yet been loaded into memory.
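The point about objects not being loaded into memory can be illustrated with a minimal sketch. It assumes a tiny stand-in file written to a temporary directory (the name demo.3L.h5 and all values in it are hypothetical, made up for illustration) that mimics part of the '/3L/...' layout, so it runs without the real dataset. Indexing a key returns an h5py Dataset handle; data is only read from disk when the handle is sliced.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in file: a tiny HDF5 file mimicking part of the '/3L/...'
# layout, so the sketch runs without downloading the real data.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.3L.h5')
with h5py.File(demo_path, 'w') as h5_out:
    # Intermediate groups ('3L', 'variants', 'calldata') are created automatically.
    h5_out.create_dataset('3L/variants/POS',
                          data=np.arange(100, 110, dtype='i4'))
    # 10 variants x 3 samples x 2 alleles, all zeros (made-up values).
    h5_out.create_dataset('3L/calldata/genotype',
                          data=np.zeros((10, 3, 2), dtype='i1'))

with h5py.File(demo_path, 'r') as h5_demo:
    positions = h5_demo['/3L/variants/POS']
    genotype = h5_demo['/3L/calldata/genotype']

    # Indexing by key gives a Dataset handle, not a NumPy array.
    is_lazy = (isinstance(positions, h5py.Dataset)
               and not isinstance(positions, np.ndarray))

    # Metadata such as the shape is available without reading the values.
    genotype_shape = genotype.shape

    # Slicing is what actually reads from disk and returns a NumPy array.
    first_positions = positions[:5]

print(is_lazy, genotype_shape, type(first_positions).__name__)
```

Because only the requested slice is read, the same pattern lets you walk through a large dataset in chunks rather than loading it all at once.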
