We will revisit the Anopheles gambiae dataset that we used in previous chapters, this time through an HDF5 version of the VCF file. You can download the file for chromosome arm 3L from ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/AR3/variation/main/hdf5/ag1000g.phase1.ar3.pass.3L.h5. Remember that we are dealing with a VCF representation of 765 mosquitoes, which can be carriers of Plasmodium falciparum, the parasite responsible for malaria.
The file is 19 GB in size, so I recommend installing a tool such as HDF Compass (https://support.hdfgroup.org/projects/compass/; available on Debian/Ubuntu Linux with apt-get install hdf-compass) to inspect the file graphically before proceeding. HDF5 is mostly a key-value store, where ...
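If a graphical viewer is not available, the same key-value layout can be explored from Python. The following is a minimal sketch, assuming the third-party h5py package is installed; the helper names walk and list_datasets are illustrative and not part of any library. It relies only on the fact that h5py groups behave like dictionaries and that datasets expose shape and dtype attributes, and it reads metadata only, so the 19 GB of data are never loaded into memory.

```python
def walk(node, path="/"):
    """Yield (path, shape, dtype) for every dataset beneath a group.

    Works on any dictionary-like tree, such as an open h5py.File,
    whose leaves carry `shape` and `dtype` attributes.
    """
    for key, child in node.items():
        child_path = path.rstrip("/") + "/" + key
        if hasattr(child, "shape"):
            # A dataset: report its metadata without reading any values.
            yield child_path, child.shape, child.dtype
        elif hasattr(child, "items"):
            # A group: descend into it as into a nested dictionary.
            yield from walk(child, child_path)


def list_datasets(filename):
    """Open an HDF5 file read-only and list all of its datasets."""
    import h5py  # assumption: h5py is installed (imported lazily here)

    with h5py.File(filename, "r") as h5:
        return list(walk(h5))
```

Calling list_datasets("ag1000g.phase1.ar3.pass.3L.h5"), assuming the file was saved under that name in the working directory, returns one (path, shape, dtype) tuple per dataset, which can be printed line by line to get a text overview of the file.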