Getting ready

You will need the metadata file from the 1,000 Genomes sequence index. See https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Second-Edition/blob/master/Datasets.ipynb for instructions, and download the sequence.index file. If you are using Jupyter Notebook, open the Chapter01/Interfacing_R.ipynb file and execute the wget command at the top.

This file has information about all of the FASTQ files in the project (we will use data from the Human 1,000 Genomes Project in the chapters to come). For each file, it lists the FASTQ filename, the sample ID, and the population of origin, along with important per-lane statistics, such as the number of reads and the number of DNA bases read.
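
To make the structure of the file more concrete, here is a minimal sketch of how such a tab-separated index could be read and summarized per population using only the Python standard library. The column names (FASTQ_FILE, SAMPLE_NAME, POPULATION, READ_COUNT, and BASE_COUNT) are assumptions about the header of sequence.index, and the three rows are invented for illustration. They are not real project data, and summarize_by_population is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt standing in for sequence.index.
# Column names are assumed; rows are made up for illustration only.
SAMPLE_INDEX = (
    "FASTQ_FILE\tSAMPLE_NAME\tPOPULATION\tREAD_COUNT\tBASE_COUNT\n"
    "data/S1/run1.fastq.gz\tS1\tPOP_A\t100\t3600\n"
    "data/S2/run2.fastq.gz\tS2\tPOP_B\t200\t7200\n"
    "data/S3/run3.fastq.gz\tS3\tPOP_A\t50\t1800\n"
)


def summarize_by_population(handle):
    """Sum READ_COUNT and BASE_COUNT per POPULATION in a tab-separated index."""
    totals = defaultdict(lambda: {"reads": 0, "bases": 0})
    for row in csv.DictReader(handle, delimiter="\t"):
        pop = row["POPULATION"]
        totals[pop]["reads"] += int(row["READ_COUNT"])
        totals[pop]["bases"] += int(row["BASE_COUNT"])
    return dict(totals)


summary = summarize_by_population(io.StringIO(SAMPLE_INDEX))
for pop, totals in sorted(summary.items()):
    print(pop, totals["reads"], totals["bases"])
```

To try this on the downloaded file, you could pass open("sequence.index", newline="") instead of the in-memory sample. The real file may contain missing or non-numeric entries in the count columns, in which case the int() conversions would need extra handling.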
