How to do it...

Take a look at the following steps:

  1. We will start by setting up a reader for our file. Remember that this file has already been supplied to you and should be in your current working directory:
from collections import defaultdict
import re
import HTSeq
lct_bed = HTSeq.BED_Reader('LCT.bed')
  2. We are now going to extract all the feature types from the record names:
feature_types = defaultdict(int)
for rec in lct_bed:
    last_rec = rec
    feature_types[re.search('([A-Z]+)', rec.name).group(0)] += 1
print(feature_types)

Remember that this code is specific to our example. You will have to adapt it to your case.

You will find that the preceding code uses a regular expression. Be careful with regular expressions, as they tend to generate read-only ...
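
To see what the regular expression in step 2 does, the following minimal sketch applies the same pattern to a few record names without needing HTSeq or the BED file. The names in `sample_names` are hypothetical and made up for illustration, and the `feature_prefix` helper is not part of the recipe. It adds a guard for names with no uppercase letters, a case where the one-line version in step 2 would raise an `AttributeError`, because `re.search` returns `None` when nothing matches.

```python
import re
from collections import defaultdict

# Hypothetical record names, invented for illustration only.
# In the recipe, these values would come from rec.name.
sample_names = [
    "ENSE00002202258",
    "NM_002299.2_cds_0",
    "CCDS2178.11_cds_1",
    "12345",  # no uppercase letters, so the pattern does not match
]

# Same pattern as in step 2: the first run of uppercase ASCII letters.
PREFIX = re.compile(r"[A-Z]+")


def feature_prefix(name):
    """Return the first run of uppercase letters in name, or None if absent."""
    match = PREFIX.search(name)
    return match.group(0) if match else None


feature_types = defaultdict(int)
for name in sample_names:
    prefix = feature_prefix(name)
    if prefix is None:
        # Skip names the pattern cannot classify instead of failing.
        continue
    feature_types[prefix] += 1

print(dict(feature_types))
```

Whether to skip unmatched names, as done here, or to count them under a placeholder key depends on your data. Skipping keeps the counts comparable to the recipe's output.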
