Getting ready

You will need to have run the first recipe, which produces the hapmap10_auto_noofs_ld_12 PLINK file (with alleles recoded as 1 and 2). PCA requires LD-pruned markers, and we will not include the offspring here because they would probably bias the result. We use the PLINK file with alleles recoded as 1 and 2 because this makes processing easier with both SmartPCA and scikit-learn.
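To illustrate why the 1/2 recoding is convenient, here is a minimal sketch that turns one line of a text PLINK .ped file into per-marker counts of allele 2 (0, 1 or 2), the numeric form PCA works on. It assumes the standard .ped layout (six leading columns FID, IID, father, mother, sex, phenotype, then two allele columns per marker, with 0 meaning missing). The helper names ped_line_to_counts and center_matrix are hypothetical and not part of pygenomics or EIGENSOFT; mean imputation of missing genotypes is one simple choice among several.

```python
def ped_line_to_counts(line):
    """Hypothetical helper: encode one .ped line (alleles 1/2, missing 0)
    as per-marker counts of allele "2"; missing genotypes become None."""
    fields = line.split()
    # Columns 1-6 are FID, IID, father, mother, sex, phenotype.
    alleles = fields[6:]
    counts = []
    for a1, a2 in zip(alleles[0::2], alleles[1::2]):
        if a1 == "0" or a2 == "0":
            counts.append(None)  # missing genotype
        else:
            counts.append((a1 == "2") + (a2 == "2"))  # 0, 1 or 2
    return fields[1], counts


def center_matrix(rows):
    """Hypothetical helper: mean-impute missing values per marker,
    then subtract each marker's mean so the columns are centered."""
    n_markers = len(rows[0])
    centered = [list(r) for r in rows]
    for j in range(n_markers):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for r in centered:
            r[j] = (mean if r[j] is None else r[j]) - mean
    return centered


if __name__ == "__main__":
    ped = ["F1 I1 0 0 1 -9 1 1 1 2 0 0",
           "F2 I2 0 0 2 -9 2 2 1 1 1 2"]
    ids, rows = zip(*(ped_line_to_counts(l) for l in ped))
    print(ids, center_matrix(list(rows)))
```

A centered individuals-by-markers matrix like this is what one would typically pass to scikit-learn's sklearn.decomposition.PCA; SmartPCA instead reads the PLINK files directly.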

As with the second recipe, if you are not using Docker, you will also need some code that I have written, available at https://github.com/tiagoantao/pygenomics. You can install it with the following command:

pip install pygenomics

For this recipe, you will need to download EIGENSOFT (http://www.hsph.harvard.edu/alkes-price/software/ ...
