How to do it…

  1. We start, as usual, with the necessary imports:
import gzip
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
%matplotlib inline
  2. Then we load the data. We will use pandas to navigate it:
fit = np.load(gzip.open('balanced_fit.npy.gz', 'rb'))
ordered_features = np.load(open('ordered_features', 'rb'))
num_features = len(ordered_features)
# list() makes this a plain list concatenation, since np.load returns a NumPy array
fit_df = pd.DataFrame(fit, columns=list(ordered_features) + ['pos', 'error'])
num_samples = 80
del fit
  3. Let's ask pandas to show a histogram of all annotations:
fig, ax = plt.subplots(figsize=(16, 9))
fit_df.hist(column=ordered_features, ax=ax)

The following histogram is generated:

Histogram of all annotations for ...
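
To try the histogram step without the recipe's data files, here is a minimal sketch on synthetic data. The annotation names (DP, MQ, QD), the random values, and the names demo_features, demo and demo_df are hypothetical stand-ins, not the recipe's data. Only the layout mirrors fit_df: one column per annotation, followed by 'pos' and 'error'.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical annotation names and synthetic values (assumptions, not the recipe's data).
demo_features = ["DP", "MQ", "QD"]
rng = np.random.default_rng(42)
demo = rng.normal(size=(200, len(demo_features) + 2))

# Same layout as fit_df: one column per annotation, then 'pos' and 'error'.
demo_df = pd.DataFrame(demo, columns=demo_features + ["pos", "error"])

# Let pandas create the figure and draw one subplot per requested column.
axes = demo_df.hist(column=demo_features, figsize=(16, 9), bins=20)
```

This sketch passes figsize to hist instead of a pre-built ax. When a single Axes is supplied for several columns, pandas clears its figure and warns before drawing the subplots, so letting pandas build the figure gives the same grid without the warning. In a notebook with %matplotlib inline, the figure appears below the cell.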
