Getting ready

Our simple example will use data from the region of the human genome where the LCT gene is located. The LCT gene encodes lactase, an enzyme involved in the digestion of lactose.

We will take this information from Ensembl. Go to http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000115850 and choose Export data. Set the Output format to BED Format and make sure that Gene information is selected (you can select more if you want). For convenience, a downloaded copy of the file is available in the Chapter02 directory, called LCT.bed.

The Notebook for this code is called Chapter02/Processing_BED_with_HTSeq.ipynb.

Take a look at the file before we start. A few lines from this file look as follows:

track name=gene description="Gene ...
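
If you prefer to inspect the file from Python rather than from a text editor, the following is a minimal sketch of one way to do it. The helper name head is hypothetical (it is not part of HTSeq or of the recipe), and the code assumes only that the file is plain text, which is the case for BED files:

```python
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file, without trailing newlines.

    Hypothetical helper for a quick look at a BED file; it only reads
    the lines it needs, so it is safe to use on large files too.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

Assuming that you are working from the Chapter02 directory, calling head("LCT.bed") returns the first five lines of the downloaded file as a list of strings, which you can then print one by one.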
