Getting ready

You will need PLINK installed. Remember that we are not using a conda environment here, so you have to make sure that the PLINK executable is available to Airflow. We will define the following tasks:

  1. Downloading data
  2. Uncompressing it
  3. Sub-sampling at 10%
  4. Sub-sampling at 1%
  5. Computing PCA on the 1% sub-sample
  6. Charting the PCA
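
To make the ordering of these tasks concrete, here is a minimal sketch that models them as a dependency graph using only the Python standard library (graphlib, Python 3.9+). It is not the Airflow code of this recipe. The task names are hypothetical, and the dependencies are an assumption: both sub-samples are taken to depend only on the uncompressed data, and the chart to depend on the PCA.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tasks that must finish before it can start.
# Task names and edges are illustrative assumptions, not the recipe's code.
dependencies = {
    "download": set(),
    "uncompress": {"download"},
    "subsample_10": {"uncompress"},
    "subsample_1": {"uncompress"},
    "compute_pca": {"subsample_1"},
    "plot_pca": {"compute_pca"},
}

# static_order() yields the tasks in an order that respects every dependency.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Under these assumptions, the 10% sub-sample is a side branch: nothing downstream waits for it, so a scheduler is free to run it alongside the 1% branch.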

Our pipeline recipe will have two parts: coding the pipeline and then getting it to execute.

The code for this can be found in Chapter08/pipelines/airflow/create_tasks.py.
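
Before running that code, it may be worth confirming that PLINK really is reachable, since there is no conda environment to supply it. The following is a minimal sketch, separate from the recipe's code, that looks the executable up on the PATH. The helper name find_executable is hypothetical, and the binary name plink is an assumption (some installations use a different name). The check is only meaningful if it is run as the same user and with the same environment that Airflow uses to run tasks.

```python
import shutil
from typing import Optional


def find_executable(name: str = "plink") -> Optional[str]:
    """Return the full path of `name` if it is on the PATH, else None."""
    # "plink" is an assumed binary name; adjust it to match your installation.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("plink was not found on the PATH; tasks that call it would fail")
    else:
        print(f"plink found at {path}")
```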
