Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration by Jos van Dongen, Roland Bouman, Matt Casters

Chapter 16. Parallelization, Clustering, and Partitioning

When you have a lot of data to process, it's important to be able to use all the computing resources available to you. Whether you have a single personal computer or hundreds of large servers at your disposal, you want Kettle to use all available resources so that you get results in an acceptable timeframe.

In this chapter, we unravel the secrets behind making your transformations and jobs scale up and out. Scaling up means making the most of a single server with multiple CPU cores. Scaling out means using the resources of multiple machines and having them operate in parallel. Both approaches are part of ETL subsystem #31, the Parallelizing/Pipelining System.

The first part of this chapter deals with the parallelism inside a transformation and the various ways to make use of it to make it scale up. Then we explain how to make your transformations scale out on a cluster of slave servers.

Finally, we cover the finer points of Kettle partitioning and how it can help you parallelize your work even further.

Multi-Threading

In Chapter 2, we explained that the basic building block of a transformation is the step. We also explained that each step is executed in parallel. Now we'll go a bit deeper into this subject by explaining how Kettle's multi-threading capabilities allow you to take full advantage of all the processing resources in your machine to scale up a transformation.

By default, each step in a transformation is executed in parallel ...
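
To make the idea of steps executing in parallel concrete, here is a minimal conceptual sketch, not Kettle code. It assumes a simplified model in which every step runs in its own thread and hands rows to the next step through a bounded buffer, so all steps work at the same time on different rows. The names source_step, transform_step, sink_step, run_pipeline, END and BUFFER_SIZE are hypothetical and exist only for illustration. Note that Python threads share a global interpreter lock, so this sketch shows the pipelining structure rather than true multi-core speedup; Kettle itself runs on the JVM.

```python
import queue
import threading

# Hypothetical end-of-stream marker; not how Kettle signals end of data internally.
END = object()

# Hypothetical buffer size between two steps (an assumption for this sketch).
BUFFER_SIZE = 100


def source_step(rows, out_q):
    """Hypothetical input step: emits each row, then the end marker."""
    for row in rows:
        out_q.put(row)  # blocks while the downstream buffer is full
    out_q.put(END)


def transform_step(in_q, out_q, fn):
    """Hypothetical transformation step: applies fn to every incoming row."""
    while True:
        row = in_q.get()  # blocks until the upstream step delivers a row
        if row is END:
            out_q.put(END)  # pass the end marker downstream
            return
        out_q.put(fn(row))


def sink_step(in_q, results):
    """Hypothetical output step: collects rows into a list."""
    while True:
        row = in_q.get()
        if row is END:
            return
        results.append(row)


def double_amount(row):
    """Example row transformation: returns a copy with 'amount' doubled."""
    return {**row, "amount": row["amount"] * 2}


def run_pipeline(rows):
    """Start all three steps at once, each in its own thread, and wait for them."""
    q1 = queue.Queue(maxsize=BUFFER_SIZE)
    q2 = queue.Queue(maxsize=BUFFER_SIZE)
    results = []
    threads = [
        threading.Thread(target=source_step, args=(rows, q1)),
        threading.Thread(target=transform_step, args=(q1, q2, double_amount)),
        threading.Thread(target=sink_step, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, run_pipeline([{"id": 1, "amount": 10}]) returns [{"id": 1, "amount": 20}]. Because the buffers are bounded, a fast step simply waits when the next step falls behind, which keeps memory use predictable while every step stays busy.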