What you need for this book

Pentaho Data Integration (PDI) is written in Java. Any operating system that can run a Java Virtual Machine (JVM) version 1.5 or higher should be able to run PDI. Some of the recipes require other software, as listed:

  • Hortonworks Sandbox: This is Hadoop in a box, a great environment for learning how to work with NoSQL solutions without having to install everything yourself.
  • Web Server with ASP support: This is needed for the two recipes that show how to work with web services.
  • DataCleaner: This is one of the top open source data profiling tools, and it integrates with Kettle.
  • MySQL: Scripts for MySQL are provided for all the relational database recipes. Feel free to use another relational database for those recipes.
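To confirm that a machine meets the JVM requirement stated above, a small check like the following can be compiled and run on it. This is only a sketch: the class name `JvmCheck` and its helper methods are made up for illustration, and the 1.5 threshold is taken from the text above (a given PDI release may need a newer JVM). It reads the standard `java.specification.version` system property, which reports values such as "1.8" on older JVMs and "11" or "17" on newer ones.

```java
public class JvmCheck {

    // Turns a specification version such as "1.5", "1.8", "11" or "17"
    // into a {major, minor} pair. A missing minor part is treated as 0.
    static int[] parse(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] { major, minor };
    }

    // Returns true when specVersion is at least reqMajor.reqMinor.
    static boolean isAtLeast(String specVersion, int reqMajor, int reqMinor) {
        int[] v = parse(specVersion);
        return v[0] > reqMajor || (v[0] == reqMajor && v[1] >= reqMinor);
    }

    public static void main(String[] args) {
        // Standard system property set by every JVM.
        String spec = System.getProperty("java.specification.version");
        System.out.println("JVM specification version: " + spec);

        // Threshold of 1.5 is the minimum named in the text (assumption:
        // the PDI release in use does not require anything newer).
        if (isAtLeast(spec, 1, 5)) {
            System.out.println("This JVM meets the 1.5-or-higher requirement.");
        } else {
            System.out.println("This JVM is older than 1.5.");
        }
    }
}
```

Comparing the parsed numbers rather than the raw strings matters here, because a plain string comparison would rank "11" below "1.5".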

In addition, it's recommended to have access to Excel or Calc and a decent text editor ...
