You will need PLINK installed. Remember that we are not using a conda environment, so you have to make sure that the PLINK executable is available to Airflow. We will define the following tasks:
- Downloading data
- Uncompressing it
- Sub-sampling at 10%
- Sub-sampling at 1%
- Computing PCA on the 1% sub-sample
- Charting the PCA
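To make the ordering of these tasks concrete, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+) rather than Airflow itself. The task names are hypothetical, and the sketch assumes that both sub-samples are drawn from the uncompressed data and that the chart depends only on the PCA output; the recipe's actual wiring may differ. In Airflow, the same relationships are typically declared between task objects with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (not taken from the recipe's code).
# Each key maps a task to the set of tasks it depends on.
# Assumption: both sub-samples are taken from the uncompressed data.
dependencies = {
    "download": set(),
    "uncompress": {"download"},
    "subsample_10": {"uncompress"},
    "subsample_1": {"uncompress"},
    "compute_pca": {"subsample_1"},
    "plot_pca": {"compute_pca"},
}

# static_order() yields the tasks so that every task comes after
# all of the tasks it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two sub-sampling tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel.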
Our pipeline recipe will have two parts: coding the pipeline, and then getting it to execute.
The code for this can be found in Chapter08/pipelines/airflow/create_tasks.py.
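Since PLINK has to be reachable outside of any conda environment, it may be worth checking that it is visible before running the pipeline. The following is a minimal sketch, not part of the recipe's code: the helper `find_tool` is hypothetical, and the executable name `plink` is an assumption (your installation may use a different name, such as `plink2`). It only inspects the `PATH` of the process that runs it, so run it as the same user and from the same shell that launches Airflow.

```python
import shutil
from typing import Optional


def find_tool(name: str = "plink") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None.

    The default name "plink" is an assumption; adjust it to match
    the executable name of your PLINK installation.
    """
    return shutil.which(name)


path = find_tool()
if path is None:
    print("plink was not found on PATH; tasks that call it would fail")
else:
    print(f"plink found at {path}")
```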