Pentaho® Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL®

Book Description

Your all-in-one resource for using Pentaho with MySQL for Business Intelligence and Data Warehousing

Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a fraction of the cost of proprietary solutions. Now you can take advantage of Pentaho for your business needs with this practical guide written by two major participants in the Pentaho community.

The book covers all components of the Pentaho BI Suite. You'll learn to install, use, and maintain Pentaho, and you'll find plenty of background discussion that will bring you thoroughly up to speed on BI and Pentaho concepts.

  • Of all available open-source BI products, Pentaho offers the most comprehensive toolset and is the fastest-growing open-source product suite

  • Explains how to build and load a data warehouse with Pentaho Kettle for data integration/ETL, manually create JFree (Pentaho Reporting Services) reports using direct SQL queries, and create Mondrian (Pentaho Analysis Services) cubes and attach them to a JPivot cube browser

  • Reviews how to deploy reports, cubes, and metadata to the Pentaho platform in order to distribute BI solutions to end users

  • Shows how to set up scheduling, subscription, and automatic distribution

The companion Web site provides complete source code examples, sample data, and links to related resources.

Table of Contents

  1. Copyright
  2. About the Author
  3. Credits
  4. Acknowledgments
  5. Introduction
    1. About This Book
      1. Who Should Read This Book
      2. What You Will Need to Use This Book
      3. What You Will Learn from This Book
    2. How This Book Is Organized
      1. Part I: Getting Started with Pentaho
        1. Chapter 1: Quick Start: Pentaho Examples
        2. Chapter 2: Prerequisites
        3. Chapter 3: Server Installation and Configuration
        4. Chapter 4: The Pentaho BI Stack
      2. Part II: Dimensional Modeling and Data Warehouse Design
        1. Chapter 5: Example Business Case: World Class Movies
        2. Chapter 6: Data Warehouse Primer
        3. Chapter 7: Modeling the Business Using Star Schemas
        4. Chapter 8: The Data Mart Design Process
      3. Part III: ETL and Data Integration
        1. Chapter 9: Pentaho Data Integration Primer
        2. Chapter 10: Designing Pentaho Data Integration Solutions
        3. Chapter 11: Deploying Pentaho Data Integration Solutions
      4. Part IV: Business Intelligence Applications
        1. Chapter 12: The Metadata Layer
        2. Chapter 13: Using the Pentaho Reporting Tools
        3. Chapter 14: Scheduling, Subscription, and Bursting
        4. Chapter 15: OLAP Solutions Using Pentaho Analysis Services
        5. Chapter 16: Data Mining with Weka
        6. Chapter 17: Building Dashboards
    3. On the Website
    4. Further Resources
  6. I. Getting Started with Pentaho
    1. 1. Quick Start: Pentaho Examples
      1. 1.1. Getting Started with Pentaho
        1. 1.1.1. Downloading and Installing the Software
        2. 1.1.2. Running the Software
          1. 1.1.2.1. Starting the Pentaho BI Server
          2. 1.1.2.2. Logging in
          3. 1.1.2.3. Mantle, the Pentaho User Console
      2. 1.2. Working with the Examples
        1. 1.2.1. Using the Repository Browser
        2. 1.2.2. Understanding the Examples
      3. 1.3. Running the Examples
        1. 1.3.1. Reporting Examples
          1. 1.3.1.1. BI Developer Examples: Regional Sales - HTML
          2. 1.3.1.2. Steel Wheels: Income Statement
          3. 1.3.1.3. Steel Wheels: Top 10 Customers
          4. 1.3.1.4. BI Developer Examples: button-single-parameter.prpt
        2. 1.3.2. Charting Examples
          1. 1.3.2.1. Steel Wheels: Chart Pick List
          2. 1.3.2.2. Steel Wheels: Flash Chart List
          3. 1.3.2.3. BI Developer Examples: Regional Sales - Line/Bar Chart
        3. 1.3.3. Analysis Examples
          1. 1.3.3.1. BI Developer Examples: Slice and Dice
          2. 1.3.3.2. Steel Wheels Analysis Examples
        4. 1.3.4. Dashboarding Examples
        5. 1.3.5. Other Examples
      4. 1.4. Summary
    2. 2. Prerequisites
      1. 2.1. Basic System Setup
        1. 2.1.1. Installing Ubuntu
          1. 2.1.1.1. Using Ubuntu in Native Mode
          2. 2.1.1.2. Using a Virtual Machine
        2. 2.1.2. Working with the Terminal
          1. 2.1.2.1. Directory Navigation
          2. 2.1.2.2. Command History
      2. 2.2. Using Symbolic Links
        1. 2.2.1. Creating Symbolic Links in Ubuntu
        2. 2.2.2. Creating Symlinks in Windows Vista
      3. 2.3. Java Installation and Configuration
        1. 2.3.1. Installing Java on Ubuntu Linux
        2. 2.3.2. Installing Java on Windows
      4. 2.4. MySQL Installation
        1. 2.4.1. Installing MySQL Server and Client on Ubuntu
        2. 2.4.2. Installing MySQL Server and Client on Windows
        3. 2.4.3. MySQL GUI Tools
          1. 2.4.3.1. Ubuntu Install
          2. 2.4.3.2. Windows Install
      5. 2.5. Database Tools
        1. 2.5.1. Power*Architect and Other Design Tools
        2. 2.5.2. Squirrel SQL Client
          1. 2.5.2.1. Ubuntu Install
          2. 2.5.2.2. Windows Install
        3. 2.5.3. SQLeonardo
      6. 2.6. Summary
    3. 3. Server Installation and Configuration
      1. 3.1. Server Configuration
        1. 3.1.1. Installation
          1. 3.1.1.1. Installation Directory
          2. 3.1.1.2. User Account
          3. 3.1.1.3. Configuring Tomcat
          4. 3.1.1.4. Automatic Startup
            1. 3.1.1.4.1. Automatic Startup in UNIX/Linux Systems
            2. 3.1.1.4.2. Automatic Startup in Windows Systems
        2. 3.1.2. Managing Database Drivers
          1. 3.1.2.1. Driver Location for the Server
          2. 3.1.2.2. Driver Location for the Administration Console
          3. 3.1.2.3. Managing JDBC Drivers on UNIX-Based Systems
        3. 3.1.3. System Databases
          1. 3.1.3.1. Setting Up the MySQL Schemas
          2. 3.1.3.2. Configuring Quartz and Hibernate
            1. 3.1.3.2.1. Quartz
            2. 3.1.3.2.2. Hibernate
          3. 3.1.3.3. Configuring JDBC Security
          4. 3.1.3.4. Sample Data
          5. 3.1.3.5. Modify the Pentaho Startup Scripts
        4. 3.1.4. E-mail
          1. 3.1.4.1. Basic SMTP Configuration
          2. 3.1.4.2. Secure SMTP Configuration
          3. 3.1.4.3. Testing E-mail Configuration
        5. 3.1.5. Publisher Password
      2. 3.2. Administrative Tasks
        1. 3.2.1. The Pentaho Administration Console
          1. 3.2.1.1. Basic PAC Configuration
          2. 3.2.1.2. Starting and Stopping PAC
          3. 3.2.1.3. The PAC Front End
          4. 3.2.1.4. Configuring PAC Security and Credentials
        2. 3.2.2. User Management
        3. 3.2.3. Data Sources
        4. 3.2.4. Other Administrative Tasks
      3. 3.3. Summary
    4. 4. The Pentaho BI Stack
      1. 4.1. Pentaho BI Stack Perspectives
        1. 4.1.1. Functionality
        2. 4.1.2. Server, Web Client, and Desktop Programs
        3. 4.1.3. Front-Ends and Back-Ends
        4. 4.1.4. Underlying Technology
      2. 4.2. The Pentaho Business Intelligence Server
        1. 4.2.1. The Platform
          1. 4.2.1.1. The Solution Repository and the Solution Engine
          2. 4.2.1.2. Database Connection Pool Management
          3. 4.2.1.3. User Authentication and Authorization
          4. 4.2.1.4. Task Scheduling
          5. 4.2.1.5. E-mail Services
        2. 4.2.2. BI Components
          1. 4.2.2.1. The Metadata Layer
          2. 4.2.2.2. Ad hoc Reporting Service
          3. 4.2.2.3. The ETL Engine
          4. 4.2.2.4. Reporting Engines
          5. 4.2.2.5. The OLAP Engine
          6. 4.2.2.6. The Data Mining Engine
        3. 4.2.3. The Presentation Layer
        4. 4.2.4. Underlying Java Servlet Technology
      3. 4.3. Desktop Programs
      4. 4.4. Pentaho Enterprise Edition and Community Edition
      5. 4.5. Creating Action Sequences with Pentaho Design Studio
        1. 4.5.1. Pentaho Design Studio (Eclipse) Primer
        2. 4.5.2. The Action Sequence Editor
        3. 4.5.3. Anatomy of an Action Sequence
          1. 4.5.3.1. Inputs
          2. 4.5.3.2. Outputs
          3. 4.5.3.3. Actions
      6. 4.6. Summary
  7. II. Dimensional Modeling and Data Warehouse Design
    1. 5. Example Business Case: World Class Movies
      1. 5.1. World Class Movies: The Basics
      2. 5.2. The WCM Data
        1. 5.2.1. Obtaining and Generating Data
        2. 5.2.2. WCM Database: The Big Picture
        3. 5.2.3. DVD Catalog
        4. 5.2.4. Customers
        5. 5.2.5. Employees
        6. 5.2.6. Purchase Orders
        7. 5.2.7. Customer Orders and Promotions
        8. 5.2.8. Inventory Management
      3. 5.3. Managing the Business: The Purpose of Business Intelligence
        1. 5.3.1. Typical Business Intelligence Questions for WCM
        2. 5.3.2. Data Is Key
      4. 5.4. Summary
    2. 6. Data Warehouse Primer
      1. 6.1. Why Do You Need a Data Warehouse?
      2. 6.2. The Big Debate: Inmon Versus Kimball
      3. 6.3. Data Warehouse Architecture
        1. 6.3.1. The Staging Area
        2. 6.3.2. The Central Data Warehouse
        3. 6.3.3. Data Marts
          1. 6.3.3.1. OLAP Cubes
          2. 6.3.3.2. Storage Formats and MDX
      4. 6.4. Data Warehouse Challenges
        1. 6.4.1. Data Quality
          1. 6.4.1.1. Data Vault and Data Quality
          2. 6.4.1.2. Using Reference and Master Data
        2. 6.4.2. Data Volume and Performance
        3. 6.4.3. Changed Data Capture
          1. 6.4.3.1. Source Data-Based CDC
          2. 6.4.3.2. Trigger-Based CDC
          3. 6.4.3.3. Snapshot-Based CDC
          4. 6.4.3.4. Log-Based CDC
          5. 6.4.3.5. Which CDC Alternative Should You Choose?
        4. 6.4.4. Changing User Requirements
      5. 6.5. Data Warehouse Trends
        1. 6.5.1. Virtual Data Warehousing
        2. 6.5.2. Real-Time Data Warehousing
        3. 6.5.3. Analytical Databases
        4. 6.5.4. Data Warehouse Appliances
        5. 6.5.5. On Demand Data Warehousing
      6. 6.6. Summary
    3. 7. Modeling the Business Using Star Schemas
      1. 7.1. What Is a Star Schema?
        1. 7.1.1. Dimension Tables and Fact Tables
          1. 7.1.1.1. Fact Table Types
      2. 7.2. Querying Star Schemas
        1. 7.2.1. Join Types
        2. 7.2.2. Applying Restrictions in a Query
          1. 7.2.2.1. Combining Multiple Restrictions
          2. 7.2.2.2. Restricting Aggregate Results
          3. 7.2.2.3. Ordering Data
      3. 7.3. The Bus Architecture
      4. 7.4. Design Principles
        1. 7.4.1. Using Surrogate Keys
        2. 7.4.2. Naming and Type Conventions
        3. 7.4.3. Granularity and Aggregation
        4. 7.4.4. Audit Columns
        5. 7.4.5. Modeling Date and Time
          1. 7.4.5.1. Time Dimension Granularity
          2. 7.4.5.2. Local Versus UTC Time
          3. 7.4.5.3. Smart Date Keys
          4. 7.4.5.4. Handling Relative Time
        6. 7.4.6. Unknown Dimension Keys
      5. 7.5. Handling Dimension Changes
        1. 7.5.1. SCD Type 1: Overwrite
        2. 7.5.2. SCD Type 2: Add Row
        3. 7.5.3. SCD Type 3: Add Column
        4. 7.5.4. SCD Type 4: Mini-Dimensions
        5. 7.5.5. SCD Type 5: Separate History Table
        6. 7.5.6. SCD Type 6: Hybrid Strategies
      6. 7.6. Advanced Dimensional Model Concepts
        1. 7.6.1. Monster Dimensions
        2. 7.6.2. Junk, Heterogeneous, and Degenerate Dimensions
        3. 7.6.3. Role-Playing Dimensions
        4. 7.6.4. Multi-Valued Dimensions and Bridge Tables
        5. 7.6.5. Building Hierarchies
        6. 7.6.6. Snowflakes and Clustering Dimensions
        7. 7.6.7. Outriggers
        8. 7.6.8. Consolidating Multi-Grain Tables
      7. 7.7. Summary
    4. 8. The Data Mart Design Process
      1. 8.1. Requirements Analysis
        1. 8.1.1. Getting the Right Users Involved
        2. 8.1.2. Collecting Requirements
      2. 8.2. Data Analysis
        1. 8.2.1. Data Profiling
        2. 8.2.2. Using eobjects.org DataCleaner
          1. 8.2.2.1. Adding Profile Tasks
          2. 8.2.2.2. Adding Database Connections
          3. 8.2.2.3. Doing an Initial Profile
          4. 8.2.2.4. Working with Regular Expressions
          5. 8.2.2.5. Profiling and Exploring Results
          6. 8.2.2.6. Validating and Comparing Data
          7. 8.2.2.7. Using a Dictionary for Column Dependency Checks
          8. 8.2.2.8. Alternative Solutions
      3. 8.3. Developing the Model
      4. 8.4. Data Modeling with Power*Architect
      5. 8.5. Building the WCM Data Marts
        1. 8.5.1. Generating the Database
          1. 8.5.1.1. Generating Static Dimensions
          2. 8.5.1.2. Special Date Fields and Calculations
        2. 8.5.2. Source to Target Mapping
      6. 8.6. Summary
  8. III. ETL and Data Integration
    1. 9. Pentaho Data Integration Primer
      1. 9.1. Data Integration Overview
        1. 9.1.1. Data Integration Activities
          1. 9.1.1.1. Extraction
          2. 9.1.1.2. Change Data Capture
          3. 9.1.1.3. Data Staging
          4. 9.1.1.4. Data Validation
          5. 9.1.1.5. Data Cleansing
          6. 9.1.1.6. Decoding and Renaming
          7. 9.1.1.7. Key Management
          8. 9.1.1.8. Aggregation
          9. 9.1.1.9. Dimension and Bridge Table Maintenance
          10. 9.1.1.10. Loading Fact Tables
        2. 9.1.2. Pentaho Data Integration Concepts and Components
          1. 9.1.2.1. Tools and Utilities
          2. 9.1.2.2. The Data Integration Engine
          3. 9.1.2.3. Repository
          4. 9.1.2.4. Jobs and Transformations
            1. 9.1.2.4.1. Transformations
            2. 9.1.2.4.2. Jobs
          5. 9.1.2.5. Plug-in Architecture
      2. 9.2. Getting Started with Spoon
        1. 9.2.1. Launching the Spoon Application
        2. 9.2.2. A Simple "Hello, World!" Example
          1. 9.2.2.1. Building the Transformation
          2. 9.2.2.2. Running the Transformation
          3. 9.2.2.3. The Execution Results Pane
          4. 9.2.2.4. The Output
        3. 9.2.3. Checking Consistency and Dependencies
          1. 9.2.3.1. Logical Consistency
          2. 9.2.3.2. Resource Dependencies
          3. 9.2.3.3. Verifying the Transformation
        4. 9.2.4. Working with Database Connections
          1. 9.2.4.1. JDBC and ODBC Connectivity
          2. 9.2.4.2. Creating a Database Connection
          3. 9.2.4.3. Testing Database Connections
          4. 9.2.4.4. How Database Connections Are Used
          5. 9.2.4.5. A Database-Enabled "Hello, World!" Example
          6. 9.2.4.6. Database Connection Configuration Management
          7. 9.2.4.7. Generic Database Connections
      3. 9.3. Summary
    2. 10. Designing Pentaho Data Integration Solutions
      1. 10.1. Generating Dimension Table Data
        1. 10.1.1. Using Stored Procedures
        2. 10.1.2. Loading a Simple Date Dimension
          1. 10.1.2.1. CREATE TABLE dim_date: Using the Execute SQL Script Step
          2. 10.1.2.2. Missing Date and Generate Rows with Initial Date: The Generate Rows Step
          3. 10.1.2.3. Days Sequence: The Add Sequence Step
          4. 10.1.2.4. Calculate and Format Dates: The Calculator Step
          5. 10.1.2.5. The Value Mapper Step
          6. 10.1.2.6. Load dim_date: The Table Output Step
        3. 10.1.3. More Advanced Date Dimension Features
          1. 10.1.3.1. ISO Week and Year
          2. 10.1.3.2. Current and Last Year Indicators
          3. 10.1.3.3. Internationalization and Locale Support
        4. 10.1.4. Loading a Simple Time Dimension
          1. 10.1.4.1. Combine: The Join Rows (Cartesian product) Step
          2. 10.1.4.2. Calculate Time: Again, the Calculator Step
        5. 10.1.5. Loading the Demography Dimension
          1. 10.1.5.1. Understanding the stage_demography and dim_demography Tables
          2. 10.1.5.2. Generating Age and Income Groups
          3. 10.1.5.3. Multiple Incoming and Outgoing Streams
      2. 10.2. Loading Data from Source Systems
        1. 10.2.1. Staging Lookup Values
          1. 10.2.1.1. The stage_lookup_data Job
          2. 10.2.1.2. The START Job Entry
          3. 10.2.1.3. Transformation Job Entries
          4. 10.2.1.4. Mail Success and Mail Failure
          5. 10.2.1.5. The extract_lookup_type and extract_lookup_value Transformations
          6. 10.2.1.6. The stage_lookup_data Transformation
          7. 10.2.1.7. Check If Staging Table Exists: The Table Exists Step
          8. 10.2.1.8. The Filter rows Step
          9. 10.2.1.9. Create Staging Table: Executing Dynamic SQL
          10. 10.2.1.10. The Dummy Step
          11. 10.2.1.11. The Stream Lookup Step
          12. 10.2.1.12. Sort on Lookup Type: The Sort Rows Step
          13. 10.2.1.13. Store to Staging Table: Using a Table Output Step to Load Multiple Tables
        2. 10.2.2. The Promotion Dimension
          1. 10.2.2.1. Promotion Mappings
          2. 10.2.2.2. Promotion Data Changes
          3. 10.2.2.3. Synchronization Frequency
          4. 10.2.2.4. The load_dim_promotion Job
          5. 10.2.2.5. The extract_promotion Transformation
          6. 10.2.2.6. Determining Promotion Data Changes
          7. 10.2.2.7. Saving the Extract and Passing on the File Name
          8. 10.2.2.8. Picking Up the File and Loading the Extract
      3. 10.3. Summary
    3. 11. Deploying Pentaho Data Integration Solutions
      1. 11.1. Configuration Management
        1. 11.1.1. Using Variables
          1. 11.1.1.1. Variables in Configuration Properties
          2. 11.1.1.2. User-Defined Variables
          3. 11.1.1.3. Built-in Variables
          4. 11.1.1.4. Variables Example: Dynamic Database Connections
          5. 11.1.1.5. More About the Set Variables Step
          6. 11.1.1.6. Set Variables Step Gotchas
        2. 11.1.2. Using JNDI Connections
          1. 11.1.2.1. What Is JNDI?
          2. 11.1.2.2. Creating a JNDI Connection
          3. 11.1.2.3. JNDI Connections and Deployment
        3. 11.1.3. Working with the PDI Repository
          1. 11.1.3.1. Creating a PDI Repository
          2. 11.1.3.2. Connecting to the Repository
          3. 11.1.3.3. Automatically Connecting to a Default Repository
          4. 11.1.3.4. The Repository Explorer
          5. 11.1.3.5. Managing Repository User Accounts
          6. 11.1.3.6. How PDI Keeps Track of Repositories
          7. 11.1.3.7. Upgrading an Existing Repository
      2. 11.2. Running in the Deployment Environment
        1. 11.2.1. Running from the Command Line
          1. 11.2.1.1. Command-Line Parameters
          2. 11.2.1.2. Running Jobs with Kitchen
          3. 11.2.1.3. Running Transformations with Pan
          4. 11.2.1.4. Using Custom Command-line Parameters
          5. 11.2.1.5. Using Obfuscated Database Passwords
        2. 11.2.2. Running Inside the Pentaho BI Server
          1. 11.2.2.1. Transformations in Action Sequences
          2. 11.2.2.2. Jobs in Action Sequences
          3. 11.2.2.3. The Pentaho BI Server and the PDI Repository
        3. 11.2.3. Remote Execution with Carte
          1. 11.2.3.1. Why Remote Execution?
            1. 11.2.3.1.1. Scalability
            2. 11.2.3.1.2. Availability
            3. 11.2.3.1.3. Reduction of Network Traffic
            4. 11.2.3.1.4. Reduction of Latency
          2. 11.2.3.2. Running Carte
          3. 11.2.3.3. Creating Slave Servers
          4. 11.2.3.4. Remotely Executing a Transformation or Job
          5. 11.2.3.5. Clustering
      3. 11.3. Summary
  9. IV. Business Intelligence Applications
    1. 12. The Metadata Layer
      1. 12.1. Metadata Overview
        1. 12.1.1. What Is Metadata?
        2. 12.1.2. The Advantages of the Metadata Layer
          1. 12.1.2.1. Using Metadata to Make a More User-Friendly Interface
          2. 12.1.2.2. Adding Flexibility and Schema Independence
          3. 12.1.2.3. Refining Access Privileges
          4. 12.1.2.4. Handling Localization
          5. 12.1.2.5. Enforcing Consistent Formatting and Behavior
        3. 12.1.3. Scope and Usage of the Metadata Layer
      2. 12.2. Pentaho Metadata Features
        1. 12.2.1. Database and Query Abstraction
          1. 12.2.1.1. Report Definition: A Business User's Point of View
          2. 12.2.1.2. Report Implementation: A SQL Developer's Point of View
          3. 12.2.1.3. Mechanics of Abstraction: The Metadata Layer
        2. 12.2.2. Properties, Concepts, and Inheritance in the Metadata Layer
          1. 12.2.2.1. Properties
          2. 12.2.2.2. Concepts
          3. 12.2.2.3. Inheritance
          4. 12.2.2.4. Localization of Properties
      3. 12.3. Creation and Maintenance of Metadata
        1. 12.3.1. The Pentaho Metadata Editor
        2. 12.3.2. The Metadata Repository
        3. 12.3.3. Metadata Domains
        4. 12.3.4. The Sublayers of the Metadata Layer
          1. 12.3.4.1. The Physical Layer
            1. 12.3.4.1.1. Connections
            2. 12.3.4.1.2. Physical Tables and Physical Columns
          2. 12.3.4.2. The Logical Layer
            1. 12.3.4.2.1. Business Models
            2. 12.3.4.2.2. Business Tables and Business Columns
            3. 12.3.4.2.3. Relationships
          3. 12.3.4.3. The Delivery Layer
            1. 12.3.4.3.1. Business Views
            2. 12.3.4.3.2. Business Categories
        5. 12.3.5. Deploying and Using Metadata
          1. 12.3.5.1. Exporting and Importing XMI files
          2. 12.3.5.2. Publishing the Metadata to the Server
          3. 12.3.5.3. Refreshing the Metadata
      4. 12.4. Summary
    2. 13. Using the Pentaho Reporting Tools
      1. 13.1. Reporting Architecture
      2. 13.2. Web-Based Reporting
      3. 13.3. Practical Uses of WAQR
      4. 13.4. Pentaho Report Designer
        1. 13.4.1. The PRD Screen
        2. 13.4.2. Report Structure
        3. 13.4.3. Report Elements
        4. 13.4.4. Creating Data Sets
          1. 13.4.4.1. Creating SQL Queries Using JDBC
          2. 13.4.4.2. Creating Metadata Queries
          3. 13.4.4.3. Example Data Set
        5. 13.4.5. Adding and Using Parameters
        6. 13.4.6. Layout and Formatting
        7. 13.4.7. Alternate Row Colors: Row Banding
        8. 13.4.8. Grouping and Summarizing Data
          1. 13.4.8.1. Adding and Modifying Groups
          2. 13.4.8.2. Using Functions
          3. 13.4.8.3. Using Formulas
        9. 13.4.9. Adding Charts and Graphs
          1. 13.4.9.1. Adding a Bar Chart
          2. 13.4.9.2. Pie Charts
          3. 13.4.9.3. Working with Images
        10. 13.4.10. Working with Subreports
          1. 13.4.10.1. Passing Parameter Values to Subreports
        11. 13.4.11. Publishing and Exporting Reports
          1. 13.4.11.1. Refreshing the Metadata
          2. 13.4.11.2. Exporting Reports
      5. 13.5. Summary
    3. 14. Scheduling, Subscription, and Bursting
      1. 14.1. Scheduling
        1. 14.1.1. Scheduler Concepts
          1. 14.1.1.1. Public and Private Schedules
          2. 14.1.1.2. Content Repository
        2. 14.1.2. Creating and Maintaining Schedules with the Pentaho Administration Console
          1. 14.1.2.1. Creating a New Schedule
          2. 14.1.2.2. Running Schedules
          3. 14.1.2.3. Suspending and Resuming Schedules
          4. 14.1.2.4. Deleting Schedules
        3. 14.1.3. Programming the Scheduler with Action Sequences
          1. 14.1.3.1. Add Job
          2. 14.1.3.2. Suspend Job, Resume Job, and Delete Job
          3. 14.1.3.3. Other Scheduler Process Actions
        4. 14.1.4. Scheduler Alternatives
          1. 14.1.4.1. UNIX-Based Systems: Cron
          2. 14.1.4.2. Windows: The at Utility and the Task Scheduler
      2. 14.2. Background Execution and Subscription
        1. 14.2.1. How Background Execution Works
        2. 14.2.2. How Subscription Works
          1. 14.2.2.1. Allowing Users to Subscribe
          2. 14.2.2.2. Granting Execute and Schedule Privileges
          3. 14.2.2.3. The Actual Subscription
        3. 14.2.3. The User's Workspace
          1. 14.2.3.1. Viewing the Contents of the Workspace
          2. 14.2.3.2. The Waiting, Complete, and My Schedules Panes
          3. 14.2.3.3. The Public Schedules Pane
          4. 14.2.3.4. The Server Administrator's Workspace
          5. 14.2.3.5. Cleaning Out the Workspace
      3. 14.3. Bursting
        1. 14.3.1. Implementation of Bursting in Pentaho
        2. 14.3.2. Bursting Example: Rental Reminder E-mails
          1. 14.3.2.1. Step 1: Finding Customers with DVDs That Are Due This Week
          2. 14.3.2.2. Step 2: Looping Through the Customers
          3. 14.3.2.3. Step 3: Getting DVDs That Are Due to Be Returned
          4. 14.3.2.4. Step 4: Running the Reminder Report
          5. 14.3.2.5. Step 5: Sending the Report via E-mail
        3. 14.3.3. Other Bursting Implementations
      4. 14.4. Summary
    4. 15. OLAP Solutions Using Pentaho Analysis Services
      1. 15.1. Overview of Pentaho Analysis Services
        1. 15.1.1. Architecture
        2. 15.1.2. Schema
        3. 15.1.3. Schema Design Tools
        4. 15.1.4. Aggregate Tables
      2. 15.2. MDX Primer
        1. 15.2.1. Cubes, Dimensions, and Measures
          1. 15.2.1.1. The Cube Concept
          2. 15.2.1.2. Star Schema Analogy
          3. 15.2.1.3. Cube Visualization
        2. 15.2.2. Hierarchies, Levels, and Members
          1. 15.2.2.1. Hierarchies
          2. 15.2.2.2. Levels and Members
          3. 15.2.2.3. The All Level, All Member, and Default Member
          4. 15.2.2.4. Member Sets
          5. 15.2.2.5. Multiple Hierarchies
        3. 15.2.3. Cube Family Relationships
          1. 15.2.3.1. Relative Time Relationships
        4. 15.2.4. MDX Query Syntax
          1. 15.2.4.1. Basic MDX Query
          2. 15.2.4.2. Axes: ON ROWS and ON COLUMNS
          3. 15.2.4.3. Looking at a Part of the Data
          4. 15.2.4.4. Dimension on Only One Axis
          5. 15.2.4.5. More MDX Examples: a Simple Cube
          6. 15.2.4.6. The FILTER Function
          7. 15.2.4.7. The ORDER Function
          8. 15.2.4.8. Using TOPCOUNT and BOTTOMCOUNT
          9. 15.2.4.9. Combining Dimensions: The CROSSJOIN Function
          10. 15.2.4.10. Using NON EMPTY
          11. 15.2.4.11. Working with Sets and the WITH Clause
          12. 15.2.4.12. Using Calculated Members
      3. 15.3. Creating Mondrian Schemas
        1. 15.3.1. Getting Started with Pentaho Schema Workbench
          1. 15.3.1.1. Downloading Mondrian
          2. 15.3.1.2. Installing Pentaho Schema Workbench
          3. 15.3.1.3. Starting Pentaho Schema Workbench
          4. 15.3.1.4. Establishing a Connection
          5. 15.3.1.5. JDBC Explorer
        2. 15.3.2. Using the Schema Editor
          1. 15.3.2.1. Creating a New Schema
          2. 15.3.2.2. Saving the Schema on Disk
          3. 15.3.2.3. Editing Object Attributes
          4. 15.3.2.4. Changing Edit Mode
        3. 15.3.3. Creating and Editing a Basic Schema
          1. 15.3.3.1. Basic Schema Editing Tasks
          2. 15.3.3.2. Creating a Cube
          3. 15.3.3.3. Choosing a Fact Table
          4. 15.3.3.4. Adding Measures
          5. 15.3.3.5. Adding Dimensions
          6. 15.3.3.6. Adding and Editing Hierarchies and Choosing Dimension Tables
          7. 15.3.3.7. Adding Hierarchy Levels
          8. 15.3.3.8. Associating Cubes with Shared Dimensions
          9. 15.3.3.9. Adding the DVD and Customer Dimensions
          10. 15.3.3.10. XML Listing
        4. 15.3.4. Testing and Deployment
          1. 15.3.4.1. Using the MDX Query Tool
          2. 15.3.4.2. Publishing the Cube
        5. 15.3.5. Schema Design Topics We Didn't Cover
      4. 15.4. Visualizing Mondrian Cubes with JPivot
        1. 15.4.1. Getting Started with the Analysis View
          1. 15.4.1.1. Using the JPivot Toolbar
        2. 15.4.2. Drilling
          1. 15.4.2.1. Drilling Flavors
          2. 15.4.2.2. Drill Member and Drill Position
          3. 15.4.2.3. Drill Replace
          4. 15.4.2.4. Drill Through
        3. 15.4.3. The OLAP Navigator
          1. 15.4.3.1. Controlling Placement of Dimensions on Axes
          2. 15.4.3.2. Slicing with the OLAP Navigator
          3. 15.4.3.3. Specifying Member Sets with the OLAP Navigator
          4. 15.4.3.4. Displaying Multiple Measures
        4. 15.4.4. Miscellaneous Features
          1. 15.4.4.1. MDX Query Pane
          2. 15.4.4.2. PDF and Excel Export
          3. 15.4.4.3. Chart
      5. 15.5. Enhancing Performance Using the Pentaho Aggregate Designer
        1. 15.5.1. Aggregation Benefits
        2. 15.5.2. Extending Mondrian with Aggregate Tables
        3. 15.5.3. Pentaho Aggregate Designer
        4. 15.5.4. Alternative Solutions
      6. 15.6. Summary
    5. 16. Data Mining with Weka
      1. 16.1. Data Mining Primer
        1. 16.1.1. Data Mining Process
        2. 16.1.2. Data Mining Toolset
          1. 16.1.2.1. Classification
          2. 16.1.2.2. Clustering
          3. 16.1.2.3. Association
          4. 16.1.2.4. Numeric Prediction (Regression)
          5. 16.1.2.5. Data Mining Algorithms
        3. 16.1.3. Training and Testing
        4. 16.1.4. Stratified Cross-Validation
      2. 16.2. The Weka Workbench
        1. 16.2.1. Weka Input Formats
        2. 16.2.2. Setting up Weka Database Connections
        3. 16.2.3. Starting Weka
        4. 16.2.4. The Weka Explorer
        5. 16.2.5. The Weka Experimenter
        6. 16.2.6. Weka KnowledgeFlow
      3. 16.3. Using Weka with Pentaho
        1. 16.3.1. Adding PDI Weka Plugins
        2. 16.3.2. Getting Started with Weka and PDI
          1. 16.3.2.1. Data Acquisition and Preparation
          2. 16.3.2.2. Creating and Saving the Model
          3. 16.3.2.3. Using the Weka Scoring Plugin
      4. 16.4. Further Reading
      5. 16.5. Summary
    6. 17. Building Dashboards
      1. 17.1. The Community Dashboard Framework
        1. 17.1.1. CDF, the Community, and the Pentaho Corporation
          1. 17.1.1.1. CDF Project History and Who's Who
          2. 17.1.1.2. Issue Management, Documentation, and Support
        2. 17.1.2. Skills and Technologies for CDF Dashboards
      2. 17.2. CDF Concepts and Architecture
        1. 17.2.1. The CDF Plugin
          1. 17.2.1.1. The CDF Home Directory
          2. 17.2.1.2. The plugin.xml File
          3. 17.2.1.3. CDF JavaScript and CSS Resources
        2. 17.2.2. The .xcdf File
        3. 17.2.3. Templates
          1. 17.2.3.1. Document Template (a.k.a. Outer Template)
          2. 17.2.3.2. Content Template
      3. 17.3. Example: Customers and Websites Dashboard
        1. 17.3.1. Setup
          1. 17.3.1.1. Creating the .xcdf File
          2. 17.3.1.2. Creating the Dashboard HTML File
          3. 17.3.1.3. Boilerplate Code: Getting the Solution and Path
          4. 17.3.1.4. Boilerplate Code: Dashboard Parameters
          5. 17.3.1.5. Boilerplate Code: Dashboard Components
          6. 17.3.1.6. Testing
        2. 17.3.2. Customers per Website Pie Chart
          1. 17.3.2.1. Customers per Website: Pie Chart Action Sequence
          2. 17.3.2.2. Customers per Website: XactionComponent
        3. 17.3.3. Dynamically Changing the Dashboard Title
          1. 17.3.3.1. Adding the website_name Dashboard Parameter
          2. 17.3.3.2. Reacting to Mouse Clicks on the Pie Chart
          3. 17.3.3.3. Adding a TextComponent
        4. 17.3.4. Showing Customer Locations
          1. 17.3.4.1. CDF MapComponent Data Format
          2. 17.3.4.2. Adding a Geography Dimension
          3. 17.3.4.3. Location Data Action Sequence
          4. 17.3.4.4. Putting It on the Map
          5. 17.3.4.5. Using Different Markers Depending on Data
        5. 17.3.5. Styling and Customization
          1. 17.3.5.1. Styling the Dashboard
          2. 17.3.5.2. Creating a Custom Document Template
      4. 17.4. Summary