There's more...

The whole issue of filtering SNPs and other genome features would need a book of its own. The right approach depends on the type of sequencing data that you have, the number of samples, and any extra information available (for example, a pedigree among samples).

This recipe is already quite complex, yet parts of it are profoundly naive (there is a limit to the complexity that I can force on you in a single recipe). For example, the window code does not support overlapping windows, and the data structures are simplistic. Still, I hope that the recipe gives you an idea of the general strategy for processing high-throughput genomic sequencing data. You can read more in Chapter 11, Advanced NGS Processing.
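As an illustration of what supporting overlapping windows might involve, here is a minimal sketch, assuming SNP positions are plain non-negative integers on a single chromosome. The function `overlapping_windows` and its parameters are hypothetical and are not part of the recipe's code. Windows start at 0, `step`, 2 × `step`, and so on, each covering `[start, start + window_size)`; when `step` is smaller than `window_size`, a single SNP can be assigned to several windows.

```python
from collections import defaultdict


def overlapping_windows(positions, window_size, step):
    """Hypothetical helper: group SNP positions into possibly overlapping windows.

    Assumes non-negative integer positions on one chromosome. Window keys are
    the window start coordinates; each window covers [start, start + window_size).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    windows = defaultdict(list)
    for pos in positions:
        # Latest window start that is <= pos
        start = (pos // step) * step
        # Walk back over earlier window starts while their window still covers pos
        while start >= 0 and start + window_size > pos:
            windows[start].append(pos)
            start -= step
    # Return windows ordered by start coordinate
    return dict(sorted(windows.items()))


# Windows of 200 bp sliding by 100 bp: position 150 falls in two windows
print(overlapping_windows([10, 150, 260], window_size=200, step=100))
```

With `step` equal to `window_size`, this reduces to the non-overlapping case; windows that contain no SNPs are simply absent from the result.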
