Take a look at the following steps:
- We will start by doing the necessary imports and checking Dask's version:
from multiprocessing.pool import Pool
from math import ceil
import numpy as np
import h5py
import dask
import dask.array as da
import dask.multiprocessing
print(dask.__version__)
Make sure that Dask is at least version 0.19.2, as we will be using fairly recent features.
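If you would rather check the version programmatically than read it off the printed output, the following is a minimal sketch. The helper names `version_tuple` and `is_at_least` are hypothetical and are not part of Dask; the sketch assumes the version string starts with dotted integers. It compares numeric components, because a plain string comparison would wrongly rank `0.19.10` below `0.19.2`.

```python
import re


def version_tuple(version):
    """Return the leading dotted-integer part of a version string as a tuple of ints."""
    # Keep only the leading numeric part, so "2.1.0+dev" becomes (2, 1, 0).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(version, minimum):
    """Return True if version is greater than or equal to minimum, compared numerically."""
    return version_tuple(version) >= version_tuple(minimum)


try:
    import dask
except ImportError:
    print("Dask is not installed")
else:
    if is_at_least(dask.__version__, "0.19.2"):
        print("Dask version is recent enough:", dask.__version__)
    else:
        print("Dask is older than 0.19.2:", dask.__version__)
```

Comparing tuples of integers keeps the check free of extra dependencies; it deliberately ignores suffixes such as `+dev` or `rc1`.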
- Now, load some HDF5 data for processing:
h5_3L = h5py.File('ag1000g.phase1.ar3.pass.3L.h5', 'r')
samples = h5_3L['/3L/samples']
positions = h5_3L['/3L/variants/POS']
num_samples = len(samples)
del samples
While this recipe is a Dask version of the previous one, there will be slight differences imposed by the Dask programming model. At this stage, notice that ...