Studying data via stream statistics

While Kettle's forte is extracting, manipulating, and loading data, it also includes a set of tools built for generating statistics and analytics-style data from the data stream. Kettle treats the data worked on in a transformation as a stream of rows flowing from an input to an output. This recipe focuses on several of those tools and shows how to learn more about the data stream by gathering statistics about the data as it passes through.
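To make the idea of stream statistics concrete outside of Kettle, here is a minimal Python sketch of gathering summary statistics in a single pass over a stream of rows. It is not how Kettle implements its steps; the `RunningStats` class, the `summarize` helper, the `salary` field name, and the three sample rows are all hypothetical and chosen only for illustration. The mean and variance are updated incrementally (Welford's method), so the rows never need to be held in memory.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass
class RunningStats:
    """Accumulates count, mean, spread, min, and max one value at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        # Welford's incremental update for the mean and squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def std_dev(self) -> float:
        # Sample standard deviation; defined as 0.0 for fewer than two values.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def summarize(rows: Iterable[Mapping[str, object]], field: str) -> RunningStats:
    """Consume a stream of rows once and summarize one numeric field."""
    stats = RunningStats()
    for row in rows:
        stats.update(float(row[field]))
    return stats


# Made-up rows standing in for a data stream (not taken from any real dataset).
rows = [
    {"player": "A", "salary": 100.0},
    {"player": "B", "salary": 200.0},
    {"player": "C", "salary": 300.0},
]
stats = summarize(iter(rows), "salary")
print(stats.count, stats.minimum, stats.maximum, stats.mean, stats.std_dev)
```

Because `summarize` accepts any iterable, the same function works on a generator that reads rows lazily from a file or database cursor, which mirrors the input-to-output flow described above.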

Getting ready

This recipe is not a single large process; it is made up of smaller recipes on the same subject. We will be using the Baseball salary dataset that can be found on the book's website ...
