Chapter 15. Performance Tuning

This chapter provides an in-depth look at the art of performance tuning in Kettle. We focus primarily on tuning transformations and briefly look at what can go wrong with the performance of a job.

For readers who are interested in the internals of the transformation engine, the first part of this chapter offers many details, illustrated with a number of examples. After explaining how the transformation engine works, we focus on how to identify performance bottlenecks. We then offer advice on how to improve the performance of your transformations and jobs.

Note

If you are new to Kettle, you may prefer to skip this chapter until you encounter a performance problem. At that point, you can turn to this chapter to learn how to identify and solve the problem you're encountering.

Transformation Performance: Finding the Weakest Link

Performance tuning of a transformation is conceptually quite simple. As in any other network, you search for the weakest link. In the case of a transformation, that is the step that keeps the transformation as a whole from performing optimally. To better understand why this is important, take a look at a simple example. The transformation shown in Figure 15.1 reads customer data from one database and writes it into another. The bottom of the figure shows the step performance metrics during execution.

Reading and writing customer data

Figure 15.1. Reading ...
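The weakest-link idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StepCapacity` class, the step names, and the rows-per-second figures are hypothetical and are not taken from the figure or from any Kettle API. The sketch treats the transformation as a simple linear chain in which rows cannot flow faster than the slowest step allows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCapacity:
    """Hypothetical stand-alone capacity of one step, in rows per second."""
    name: str
    rows_per_second: float


def find_weakest_link(steps):
    """Return the step with the lowest capacity in a linear chain of steps."""
    if not steps:
        raise ValueError("at least one step is required")
    return min(steps, key=lambda step: step.rows_per_second)


def chain_throughput(steps):
    """In a linear chain, rows cannot flow faster than the slowest step."""
    return find_weakest_link(steps).rows_per_second


# Illustrative numbers only; they are assumptions, not measurements.
steps = [
    StepCapacity("read customers", 50_000.0),
    StepCapacity("write customers", 4_000.0),
]

bottleneck = find_weakest_link(steps)
print(f"weakest link: {bottleneck.name} ({bottleneck.rows_per_second:.0f} rows/s)")
print(f"chain throughput: {chain_throughput(steps):.0f} rows/s")
```

With these assumed numbers, speeding up the reading step would change nothing, because the writing step still caps the whole chain. That is why tuning effort belongs on the weakest link first.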
