Filtering data based on row numbers

Until now, you have been filtering rows based on conditions on the values of the fields. You can also filter rows based on their row numbers. A couple of steps allow you to do that. Here is a brief summary of them:

| Step | Description |
|---|---|
| Sample rows (Statistics category) | This step samples rows based on a list of row numbers or row number ranges. For example, 1,5,10..20 selects row 1, row 5, and all the rows from 10 to 20 (both 10 and 20 included). |
| Reservoir Sampling (Statistics category) | This step samples a fixed number of rows. It uses uniform sampling, which means that every incoming row has an equal chance of being selected. |
| Top / Bottom / First / Last filter ... | |
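The following minimal Python sketch shows the ideas behind the two sampling steps described above. It is an illustration and is not how PDI implements them. It makes two assumptions. First, rows are numbered from 1, as in the 1,5,10..20 example. Second, a range written as a..b includes both ends. The function names `parse_row_spec`, `sample_by_row_number`, and `reservoir_sample` are hypothetical. For uniform sampling of a fixed number of rows, the sketch uses the classic reservoir algorithm (Algorithm R). This may differ from what the Reservoir Sampling step uses internally.

```python
import random
from typing import Iterable, Iterator, List, Optional, Set, Tuple, TypeVar

T = TypeVar("T")


def parse_row_spec(spec: str) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Split a spec such as "1,5,10..20" into single row numbers and inclusive ranges."""
    singles: Set[int] = set()
    ranges: List[Tuple[int, int]] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ".." in part:
            lo, hi = part.split("..", 1)
            ranges.append((int(lo), int(hi)))
        else:
            singles.add(int(part))
    return singles, ranges


def sample_by_row_number(rows: Iterable[T], spec: str) -> Iterator[T]:
    """Yield only the rows whose 1-based position matches the spec (assumed numbering)."""
    singles, ranges = parse_row_spec(spec)
    for n, row in enumerate(rows, start=1):
        if n in singles or any(lo <= n <= hi for lo, hi in ranges):
            yield row


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Algorithm R: keep k rows so that every incoming row is equally likely to be kept."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # randint is inclusive on both ends, so row i is kept with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    rows = [f"row{n}" for n in range(1, 31)]
    print(list(sample_by_row_number(rows, "1,5,10..20")))
    print(reservoir_sample(rows, 5, seed=42))
```

With the spec 1,5,10..20, the first call keeps 13 rows: row1, row5, and row10 through row20. The reservoir sample makes a single pass over the stream and holds only k rows in memory, so it works even when the total number of rows is unknown in advance. Passing a fixed seed makes the sample repeatable between runs.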

Get Learning Pentaho Data Integration 8 CE - Third Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.