Building a Data Mart with Pentaho Data Integration

Video Description

A step-by-step tutorial that takes you through the creation of an ETL process to populate a Kimball-style star schema

About This Video

  • Learn how to create ETL transformations to populate a star schema in a short span of time

  • Create a fully-functional ETL process using a practical approach

  • Follow the step-by-step instructions for creating an ETL process based on a fictional company, get your hands dirty, and learn fast

In Detail

    Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.

    Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process that builds a data mart for a fictional company. This course will show you, step by step, how to source the raw data and prepare it for the star schema. The practical approach of this course will get you up and running quickly, and will explain the key concepts in an easy-to-understand manner.

    Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output can be a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality-check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and analyze the output of your ETL process easily.
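
    To make the difference between the two slowly changing dimension types mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from the course and does not use Pentaho Kettle; the `DimRow` fields and the `apply_scd1` and `apply_scd2` helpers are hypothetical names chosen for illustration. Type 1 overwrites an attribute in place, while Type 2 closes the current row and appends a new version under a new surrogate key.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    # Hypothetical dimension row layout, for illustration only.
    surrogate_key: int
    natural_key: str
    attribute: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd1(rows: List[DimRow], natural_key: str, new_value: str) -> None:
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row.natural_key == natural_key:
            row.attribute = new_value


def apply_scd2(rows: List[DimRow], natural_key: str, new_value: str,
               change_date: date) -> None:
    """Type 2: close the current version and append a new one."""
    current = next(
        (r for r in rows
         if r.natural_key == natural_key and r.valid_to is None),
        None,
    )
    if current is not None and current.attribute == new_value:
        return  # attribute unchanged, nothing to do
    if current is not None:
        current.valid_to = change_date  # end-date the old version
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(DimRow(next_key, natural_key, new_value, change_date, None))
```

    A fact row would then reference the surrogate key of the version that was valid on the transaction date, which is why a Type 2 key lookup needs a date range check in addition to the natural key.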

    By covering all the essential topics with a hands-on approach, this course will put you in a position to create your own ETL processes within a short span of time.

    Table of Contents

    1. Chapter 1 : Getting Started
      1. The Second-hand Lens Store 00:06:49
      2. The Derived Star Schema 00:04:30
      3. Setting up Our Development Environment 00:07:07
    2. Chapter 2 : Agile BI – Creating ETLs to Prepare Joined Data Set
      1. Importing Raw Data 00:03:23
      2. Exporting Data Using the Standard Table Output 00:04:33
      3. Exporting Data Using the Dedicated Bulk Loading 00:04:32
    3. Chapter 3 : Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL Improvements
      1. Creating a Pentaho Analysis Model 00:03:26
      2. Analyzing Data Using Pentaho Analyzer 00:03:50
      3. Improving Your ETL for Better Data Quality 00:04:15
    4. Chapter 4 : Slowly Changing Dimensions
      1. Creating a Slowly Changing Dimension of Type 1 Using Insert/Update 00:06:48
      2. Creating a Slowly Changing Dimension of Type 1 Using Dimension Lookup Update 00:04:59
      3. Creating a Slowly Changing Dimension Type 2 00:05:18
    5. Chapter 5 : Populating Date Dimension
      1. Defining Start and End Date Parameters 00:05:17
      2. Auto-generating Daily Rows for a Given Period 00:04:26
      3. Auto-generating Year, Month, and Day 00:06:27
    6. Chapter 6 : Creating the Fact Transformation
      1. Sourcing Raw Data for Fact Table 00:03:53
      2. Lookup Slowly Changing Dimension of the Type 1 Key 00:04:29
      3. Lookup Slowly Changing Dimension of the Type 2 Key 00:06:08
    7. Chapter 7 : Orchestration
      1. Loading Dimensions in Parallel 00:06:20
      2. Creating Master Jobs 00:04:10
    8. Chapter 8 : ID-based Change Data Capture
      1. Implementing Change Data Capture (CDC) 00:04:59
      2. Creating a CDC Job Flow 00:04:49
    9. Chapter 9 : Final Touches: Logging and Scheduling
      1. Setting up a Dedicated DB Schema 00:01:23
      2. Setting up Built-in Logging 00:04:22
      3. Scheduling on the Command Line 00:05:30