How to do it...

Take a look at the following steps:

  1. Let's start by analyzing the gene data. For simplicity, we will only use the data from two other species of the genus Ebolavirus that are available in the extended dataset: the Reston virus (RESTV) and the Sudan virus (SUDV):
     import os
     from collections import OrderedDict

     import dendropy
     from dendropy.calculate import popgenstat

     genes_species = OrderedDict()
     my_species = ['RESTV', 'SUDV']
     my_genes = ['NP', 'L', 'VP35', 'VP40']

     for name in my_genes:
         gene_name = name.split('.')[0]
         char_mat = dendropy.DnaCharacterMatrix.get_from_path('%s_align.fasta' % name, 'fasta')
         genes_species[gene_name] = {}

         for species in my_species:
             genes_species[gene_name][species] = dendropy.DnaCharacterMatrix()
         for taxon, ...
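
The listing above is cut off, so the way each per-species matrix gets filled is not shown here. As a rough, library-free sketch of the general idea (grouping aligned sequences by species so that each group can be analyzed separately), the following may help. It assumes that every sequence label starts with the species code followed by an underscore, such as `RESTV_sample1`. That label format, the toy sequences, and the helper names `parse_fasta` and `split_by_species` are all hypothetical and are not taken from the recipe or from DendroPy:

```python
from collections import OrderedDict

# Hypothetical toy alignment; labels are ASSUMED to look like SPECIES_identifier.
TOY_FASTA = """>RESTV_sample1
ATGGATTCTC
>SUDV_sample1
ATGGAGGGTC
>EBOV_sample1
ATGGATTCTC
>SUDV_sample2
ATGGAGGGTT
"""


def parse_fasta(text):
    """Return an OrderedDict mapping each record label to its sequence."""
    records = OrderedDict()
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            label = line[1:]
            records[label] = ''
        elif label is not None:
            records[label] += line
    return records


def split_by_species(records, wanted_species):
    """Group records by the species prefix of their label.

    Records whose prefix is not in wanted_species are skipped.
    """
    groups = OrderedDict((species, OrderedDict()) for species in wanted_species)
    for label, seq in records.items():
        species = label.split('_')[0]  # assumed label convention
        if species in groups:
            groups[species][label] = seq
    return groups


records = parse_fasta(TOY_FASTA)
by_species = split_by_species(records, ['RESTV', 'SUDV'])
for species, seqs in by_species.items():
    print(species, len(seqs))
```

In the recipe itself, the same kind of grouping is held in one `dendropy.DnaCharacterMatrix` per gene and species, which is the form that the `popgenstat` functions work on.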
