Getting ready

We will revisit the Anopheles gambiae dataset that we used in previous chapters. An HDF5 version of the VCF file that we used previously is available. You can download chromosome arm 3L from ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/AR3/variation/main/hdf5/ag1000g.phase1.ar3.pass.3L.h5. Remember that we are dealing with a VCF representation of 765 mosquitoes that can be carriers of Plasmodium falciparum, the parasite responsible for malaria.

The file is 19 GB in size, so I recommend installing a tool such as HDF Compass (https://support.hdfgroup.org/projects/compass/; available on Debian/Ubuntu Linux with apt-get install hdf-compass) to graphically inspect the file before proceeding. HDF5 is mostly a key-value store, where ...
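
If you prefer to stay in Python rather than use a graphical viewer, the following is a minimal sketch of how the file's layout could be listed with the h5py library. It assumes that h5py is installed; the helper name list_datasets is my own and not part of any library. Only metadata is read, so the 19 GB of array data is never loaded into memory.

```python
import h5py


def list_datasets(path):
    """Return a (name, shape, dtype) tuple for every dataset in an HDF5 file.

    list_datasets is a hypothetical helper written for this sketch.
    """
    found = []

    def visitor(name, obj):
        # Groups act like nested dictionaries; datasets hold the arrays.
        # Only metadata (shape and dtype) is touched here, not the data.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    # Open read-only so the downloaded file cannot be modified by accident.
    with h5py.File(path, "r") as h5:
        h5.visititems(visitor)
    return found
```

Calling list_datasets('ag1000g.phase1.ar3.pass.3L.h5') (assuming that the file was saved under that name in the working directory) and printing each tuple gives a quick text overview of the keys, similar to what HDF Compass shows graphically.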
