Take a look at the following steps:
- We will start by importing the necessary libraries, using h5py to read the file. We will then access the datasets by their keys:
from math import ceil
import numpy as np
import h5py
h5_3L = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', 'r')
samples = h5_3L['/3L/samples']
calldata_genotype = h5_3L['/3L/calldata/genotype']
positions = h5_3L['/3L/variants/POS']
alt_alleles = h5_3L['/3L/variants/ALT']
is_snp = h5_3L['/3L/variants/is_snp']
num_samples = len(samples)
There are alternatives to h5py, but be careful, as they might impose constraints on keys and data (for instance, the read methods of pandas might do this). Note that although we now hold references to these objects, their contents have not yet been loaded into memory.
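To make the point about references versus loaded data concrete, here is a minimal sketch. It does not use the real ag1000g file: it writes a tiny, hypothetical toy file (`toy.h5`, with made-up arrays that only mimic the `/3L/...` key layout) and then shows that indexing the file by key returns a dataset handle, while slicing that handle is what actually reads values into a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy file: tiny made-up arrays that mimic the /3L/... layout.
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, 'toy.h5')
with h5py.File(toy_path, 'w') as toy:
    # Intermediate groups (3L, variants, calldata) are created automatically.
    toy.create_dataset('3L/variants/POS', data=np.arange(100, 110))
    toy.create_dataset('3L/calldata/genotype',
                       data=np.zeros((10, 4, 2), dtype='i1'))

with h5py.File(toy_path, 'r') as toy:
    # Indexing by key returns a handle to the on-disk dataset, not the data.
    positions = toy['/3L/variants/POS']
    is_lazy = isinstance(positions, h5py.Dataset)

    # Metadata such as the shape is available without reading the values.
    shape = positions.shape
    genotype_shape = toy['/3L/calldata/genotype'].shape

    # Slicing is what reads from disk and returns an in-memory NumPy array.
    first_three = positions[:3]

print(is_lazy, shape, genotype_shape, first_three)
```

On a large file, the same pattern lets you inspect shapes first and then read only the slices you need, rather than loading a whole dataset at once.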