How to do it...

Take a look at the following steps:

  1. Let's load the metadata (we will use a simplified version from the previous recipe) as follows:
    from collections import defaultdict

    f = open('relationships_w_pops_121708.txt')
    pop_ind = defaultdict(list)
    f.readline()  # header
    for line in f:
        toks = line.rstrip().split('\t')
        fam_id = toks[0]
        ind_id = toks[1]
        pop = toks[-1]
        pop_ind[pop].append((fam_id, ind_id))
    f.close()
  2. Let's check that the PLINK data file and the metadata are consistent, because we will need to clean up the population mappings before we can generate a Genepop file:
    all_inds = []
    for inds in pop_ind.values():
        all_inds.extend(inds)
    for line in open('hapmap1.ped'):
        toks = line.rstrip().replace(' ', '\t').split('\t')
        ...
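
The two steps above can be sketched end to end on a small in-memory example. This is only an illustration under stated assumptions, not the recipe's own code: the helper names `load_pop_ind` and `find_missing` and the sample rows are hypothetical, and the truncated loop above may do something different. The sketch assumes that the metadata is tab-separated with a header row, with the family ID and individual ID in the first two columns and the population in the last one, and that the first two fields of each PED line are the family ID and individual ID.

```python
import io
from collections import defaultdict


def load_pop_ind(handle):
    """Map population -> list of (family ID, individual ID) pairs."""
    pop_ind = defaultdict(list)
    handle.readline()  # skip the header row
    for line in handle:
        toks = line.rstrip().split('\t')
        if len(toks) < 3:
            continue  # ignore blank or malformed lines
        pop_ind[toks[-1]].append((toks[0], toks[1]))
    return pop_ind


def find_missing(ped_handle, pop_ind):
    """Return (family ID, individual ID) pairs in the PED data that have no metadata."""
    known = {ind for inds in pop_ind.values() for ind in inds}
    missing = []
    for line in ped_handle:
        toks = line.split()  # PED fields may be space- or tab-separated
        if len(toks) < 2:
            continue
        ind = (toks[0], toks[1])
        if ind not in known:
            missing.append(ind)
    return missing


# Hypothetical sample data, made up for illustration only.
metadata = io.StringIO(
    "FID\tIID\tpopulation\n"
    "F1\tI1\tCEU\n"
    "F2\tI2\tYRI\n"
)
ped = io.StringIO(
    "F1 I1 0 0 1 1 A A\n"
    "F3 I3 0 0 2 1 G G\n"
)

pop_ind = load_pop_ind(metadata)
missing = find_missing(ped, pop_ind)
print(missing)  # individuals present in the PED data but absent from the metadata
```

With real files, the same functions can be given handles opened in a `with open(...) as f:` block, which closes each file even if parsing fails partway through.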
