Summarizing the PDI steps that operate on sets of rows

Sorting and grouping a dataset are just two of several operations that you can apply to a set of rows as a whole, rather than to single rows. The following table gives an overview of the main PDI steps in this group:

| Step | Purpose |
| --- | --- |
| Group by | Builds aggregates such as Sum, Maximum, and so on, over groups of rows. |
| Memory Group by | Same as Group by, but doesn't require sorted input. |
| Analytic Query | Computes lead, lag, first, and last fields over a sorted dataset. |
| Univariate Statistics | Computes some simple statistics. It complements the Group by step. It is much simpler to configure, but has fewer capabilities than that step. |

Split ...
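
To make the distinction between these steps more concrete, here is a minimal Python sketch of the ideas behind them. It is only an analogy, not PDI code or PDI's implementation: the sample rows, the field names (`region`, `sales`), and the function names are all hypothetical. It shows why an aggregation that works on consecutive rows needs sorted input, how an in-memory aggregation avoids that requirement, and what lag and lead values look like on a sorted dataset.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"region": "north", "sales": 10},
    {"region": "south", "sales": 5},
    {"region": "north", "sales": 7},
    {"region": "south", "sales": 3},
]


def streaming_group_sum(rows, key, field):
    """Sum a field over runs of consecutive rows that share the same key.

    Like a group-by that expects sorted input, this only merges adjacent rows.
    """
    return [
        (k, sum(r[field] for r in group))
        for k, group in groupby(rows, key=itemgetter(key))
    ]


def in_memory_group_sum(rows, key, field):
    """Sum a field per key using a dict, so the row order does not matter."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
    return totals


def add_lag_lead(rows, field):
    """Attach the previous (lag) and next (lead) value of a field to each row."""
    values = [r[field] for r in rows]
    return [
        {
            **r,
            "lag": values[i - 1] if i > 0 else None,
            "lead": values[i + 1] if i + 1 < len(values) else None,
        }
        for i, r in enumerate(rows)
    ]


# Unsorted input: the same region shows up in several separate groups.
unsorted_result = streaming_group_sum(rows, "region", "sales")

# Sorting by the grouping field first yields one group per region.
sorted_rows = sorted(rows, key=itemgetter("region"))
sorted_result = streaming_group_sum(sorted_rows, "region", "sales")

# The dict-based version gives the per-region totals without sorting.
memory_result = in_memory_group_sum(rows, "region", "sales")

# Lag and lead are only meaningful once the rows have a defined order.
lag_lead_rows = add_lag_lead(sorted(rows, key=itemgetter("sales")), "sales")
```

In this sketch, the trade-off of the dict-based version is that it holds one entry per group in memory, while the streaming version only needs the current group but depends on the rows arriving in sorted order.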

