Pentaho Data Integration Quick Start Guide

Get productive quickly with Pentaho Data Integration

Key Features

  • Take away the pain of starting with a complex and powerful system
  • Simplify your data transformation and integration work
  • Explore, transform, and validate your data with Pentaho Data Integration

Book Description

Pentaho Data Integration (PDI) is an intuitive graphical environment that combines drag-and-drop design with powerful Extract-Transform-Load (ETL) capabilities. Given its power and flexibility, initial attempts to use the Pentaho Data Integration tool can be difficult or confusing. This book is the ideal solution.

This book shortens the PDI learning curve. It provides the guidance you need to become productive, covering the main features of Pentaho Data Integration. It demonstrates the interactive features of the graphical designer and takes you through the main ETL capabilities that the tool offers.

By the end of the book, you will be able to use PDI for extracting, transforming, and loading the types of data you encounter on a daily basis.

What you will learn

  • Design, preview, and run transformations in Spoon
  • Run transformations using the Pan utility
  • Understand how to obtain data from different types of files
  • Connect to a database and explore it using the database explorer
  • Understand how to transform data in a variety of ways
  • Understand how to insert data into database tables
  • Design and run jobs for sequencing tasks and sending emails
  • Combine the execution of jobs and transformations

Who this book is for

This book is for software developers, business intelligence analysts, and others involved or interested in developing ETL solutions, or more generally, doing any kind of data manipulation.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Pentaho Data Integration Quick Start Guide
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Foreword
  6. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  8. Getting Started with PDI
    1. Introducing PDI
    2. Installing PDI
    3. Configuring the graphical designer tool
    4. Creating a simple transformation
    5. Understanding the Kettle home directory
    6. Summary
  9. Getting Familiar with Spoon
    1. Exploring the Spoon interface
    2. Designing, previewing, and running transformations
      1. Designing and previewing a transformation
        1. Understanding the logging options
        2. Understanding the Step Metrics tab
      2. Dealing with errors while designing
      3. Saving and running a transformation
    3. Defining and using Kettle variables
      1. Using named parameters
    4. Running transformations with the Pan utility
    5. Summary
  10. Extracting Data
    1. Getting data from plain files
      1. Reading plain files
      2. Reading files with great versatility
      3. Reading files from remote locations
        1. Reading files from Google Drive
    2. Getting data from relational databases
      1. Connecting to a database and using the database explorer
      2. Getting data from a database
    3. Getting data from other sources
      1. XML and JSON
      2. System information and Kettle variables
    4. Combining different sources into a single dataset
      1. Manipulating the metadata
      2. Combining two different datasets into a single dataset
    5. Summary
  11. Transforming Data
    1. Transforming data in different ways
      1. Extracting data from existing fields
      2. More ways to create new fields
    2. Sorting and aggregating data
      1. Sorting data
      2. Aggregating data
    3. Filtering rows
      1. Filtering rows upon conditions
      2. Splitting the stream upon conditions
    4. Looking up for data
      1. Looking for data in a secondary stream
      2. Looking up data in a database
    5. Summary
  12. Loading Data
    1. Generating different kinds of files
    2. Inserting and updating data in database tables
      1. Inserting data
      2. Updating data
        1. Handling errors
    3. Loading a datamart
      1. Loading a time dimension
      2. Loading other kinds of dimensions
        1. Loading a dimension with a combination lookup/update step
        2. Loading a dimension with a dimension lookup/update step
      3. Loading a fact table
    4. Summary
  13. Orchestrating Your Work
    1. Understanding the purpose of PDI jobs
    2. Designing and running jobs
      1. Creating and running a simple job 
        1. Understanding the results of execution
      2. Sequencing tasks
      3. Taking a tour of the job entries
        1. Sending emails
    3. Combining the execution of jobs and transformations
      1. Executing transformations from a job
        1. Creating user-defined Kettle variables
      2. Nesting transformations and jobs
    4. Running jobs with the Kitchen utility
    5. Summary
  14. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think