Asynchronous non-blocking IO is extremely valuable for maximizing throughput. Without it, a stream processor blocks and does nothing until each external call has completed. This recipe demonstrates how to use a parallel step to control the number of calls that can execute concurrently. As an example of the impact this can have, I once had a script that read from S3 and took well over an hour to run. Once I added a parallel step with a setting of 16, the script finished in just five minutes. The improvement was so significant that Datadog contacted me almost immediately to see whether we had a runaway process.
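To make the idea of a bounded number of concurrent calls concrete, here is a minimal, language-agnostic sketch. It uses Python's asyncio with a semaphore rather than the pipeline's own parallel step, so it illustrates the concept and is not the recipe's implementation. The names fake_external_call, process_all, and timed_run are hypothetical, and the external call is simulated with a short sleep instead of a real S3 read.

```python
import asyncio
import time


async def fake_external_call(item: int, delay: float = 0.05) -> int:
    """Hypothetical stand-in for an external call such as an S3 read."""
    await asyncio.sleep(delay)  # non-blocking wait that simulates network latency
    return item * 2


async def process_all(items, parallel: int):
    """Process every item with at most `parallel` calls in flight at once."""
    semaphore = asyncio.Semaphore(parallel)
    in_flight = 0  # calls currently executing
    peak = 0       # highest number of simultaneous calls observed

    async def bounded(item):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `parallel` calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_external_call(item)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(bounded(i) for i in items))
    return results, peak


def timed_run(items, parallel: int):
    """Run process_all and report results, peak concurrency, and elapsed seconds."""
    start = time.perf_counter()
    results, peak = asyncio.run(process_all(items, parallel))
    return results, peak, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(32))
    for parallel in (1, 16):
        _, peak, elapsed = timed_run(items, parallel)
        print(f"parallel={parallel}: peak in flight={peak}, elapsed={elapsed:.2f}s")
```

With a setting of 1 the calls run one after another, which mirrors the blocking behavior described above. Raising the setting lets the waits overlap, so total time drops roughly in proportion until the external service or the network becomes the bottleneck.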
To allow concurrent calls, we simply add a parallel step to the pipeline after an external call step and specify ...