Distributing rows

As mentioned earlier, when you split a stream, you can either copy or distribute the rows. Copying means creating a copy of the whole dataset for each output stream, so every destination step receives all the rows. Distributing means that the rows of the dataset are divided among the destination steps, so each row goes to only one of them. Those steps run in separate threads, so distribution is a way to implement parallel processing.
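To make the difference concrete, here is a minimal Python sketch of a stream split. It is a simplified model, not PDI code. Each destination step is assumed to be a thread reading from its own `queue.Queue`, and the names `split_stream` and `target_step` are hypothetical. In copy mode the splitter puts every row on every queue. In distribute mode it puts each row on exactly one queue, cycling through the queues in order.

```python
import queue
import threading

_END = object()  # sentinel that tells a consumer thread the stream is over


def target_step(inbox, results, slot):
    # Hypothetical destination step: runs in its own thread and
    # collects every row that arrives on its own input queue.
    received = []
    while True:
        row = inbox.get()
        if row is _END:
            break
        received.append(row)
    results[slot] = received  # each thread writes only to its own slot


def split_stream(rows, n_targets, mode):
    # Validate before starting threads so no consumer is left waiting forever.
    if mode not in ("copy", "distribute"):
        raise ValueError(f"unknown mode: {mode!r}")
    inboxes = [queue.Queue() for _ in range(n_targets)]
    results = [None] * n_targets
    threads = [
        threading.Thread(target=target_step, args=(inboxes[i], results, i))
        for i in range(n_targets)
    ]
    for t in threads:
        t.start()
    for i, row in enumerate(rows):
        if mode == "copy":
            # Copy: every destination step receives every row.
            for inbox in inboxes:
                inbox.put(row)
        else:
            # Distribute: each row goes to exactly one step.
            inboxes[i % n_targets].put(row)
    for inbox in inboxes:
        inbox.put(_END)  # close every output stream
    for t in threads:
        t.join()
    return results


rows = [1, 2, 3, 4, 5, 6]
print(split_stream(rows, 3, "copy"))        # three full copies of the six rows
print(split_stream(rows, 3, "distribute"))  # [[1, 4], [2, 5], [3, 6]]
```

In copy mode the three steps together handle eighteen rows. In distribute mode they handle the same six rows once in total, spread across three threads. CPython threads are constrained by the global interpreter lock, so this models the data flow rather than demonstrating a real speedup.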

When you distribute, the destination steps receive the rows in round-robin fashion. For example, suppose you have three target steps, such as the three calculators in the following screenshot. The first row of data goes to the first target step, the second row goes to the second step, the third row goes to the third step, the fourth row goes back to the first step, and so on.
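The round-robin rule described above fits in a one-line formula. The following minimal Python sketch computes which of three steps, standing in for the three calculators, receives each of the first seven rows. The function names `round_robin_step` and `assign_rows` are hypothetical, and the code models the described behaviour rather than reproducing PDI's source.

```python
from collections import Counter


def round_robin_step(row_number, n_steps):
    # Rows and steps are both numbered from 1, as in the text:
    # rows 1, 2, 3 go to steps 1, 2, 3, then row 4 wraps back to step 1.
    return (row_number - 1) % n_steps + 1


def assign_rows(n_rows, n_steps):
    # Map each row number to the destination step that receives it.
    return {row: round_robin_step(row, n_steps) for row in range(1, n_rows + 1)}


assignment = assign_rows(7, 3)
for row, step in assignment.items():
    print(f"row {row} -> Calculator {step}")

# Each step ends up with either floor(n/k) or ceil(n/k) of the n rows.
print(Counter(assignment.values()))  # Counter({1: 3, 2: 2, 3: 2})
```

Rows 1 to 7 go to steps 1, 2, 3, 1, 2, 3, 1. Round-robin balances the load by row count, not by how expensive each row is to process.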

