Discovering Partial Least Squares with JMP

Book Description

Partial Least Squares (PLS) is a flexible statistical modeling technique that applies to data of any shape. It models relationships between inputs and outputs even when there are more predictors than observations. Using JMP statistical discovery software from SAS, Discovering Partial Least Squares with JMP explores PLS and positions it within the more general context of multivariate analysis. Ian Cox and Marie Gaudard use a “learning through doing” style. This approach, coupled with the interactivity that JMP itself provides, allows you to engage actively with the content. Four complete case studies are presented, accompanied by data tables that are available for download. The detailed “how to” steps, together with the interpretation of the results, help make this book unique. Discovering Partial Least Squares with JMP is of interest to professionals engaged in ongoing professional development, as well as to students and instructors in a formal academic setting. The content aligns well with topics covered in introductory courses on psychometrics, customer relationship management, market research, consumer research, environmental studies, and chemometrics. The book can also serve as a supplement to courses in multivariate statistics and to courses on statistical methods in biology, ecology, chemistry, and genomics. While the book is helpful and instructive to those who are using JMP, knowledge of JMP is not required, and little or no prior statistical knowledge is necessary. By working through the introductory chapters and the case studies, you gain a deeper understanding of PLS and learn how to use JMP to perform PLS analyses in real-world situations. This book motivates current and potential users of JMP to extend their analytical repertoire by embracing PLS. As you interact dynamically with JMP, you will develop confidence while exploring the underlying concepts and working through the examples.
The authors provide background and guidance to support and empower you on this journey. This book is part of the SAS Press program.
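The claim that PLS can model a response when there are more predictors than observations is worth seeing concretely, because ordinary least squares has no unique solution in that case. Below is a minimal pure-Python sketch of single-response PLS with NIPALS-style deflation (NIPALS is one of the two algorithms named in the contents). It is an illustration under stated assumptions, not JMP's implementation: the function names `pls1_nipals` and `pls1_predict` and the toy data are hypothetical, X is only centered (not scaled), and the number of factors is fixed by hand rather than chosen by cross validation.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pls1_nipals(X: Matrix, y: Vector, n_factors: int) -> Dict:
    """Hypothetical sketch: single-response PLS via NIPALS-style deflation.

    Returns the centering means plus, per factor, the weights (W),
    X loadings (P), y loadings (Q), scores (T), and the final y residual.
    """
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    # Center X and y (this sketch does not scale X).
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    f = [v - y_mean for v in y]
    W, P, Q, T = [], [], [], []
    for _ in range(n_factors):
        # Weight vector: direction in X-space with maximal covariance with residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # residual y is no longer related to X; stop early
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores for this factor
        tt = _dot(t, t)  # > 0 whenever w is nonzero
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X loadings
        q = _dot(f, t) / tt  # y loading
        # Deflate: remove the part of X and y explained by this factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
        T.append(t)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "W": W, "P": P, "Q": Q, "T": T, "residual": f}


def pls1_predict(model: Dict, x: Vector) -> float:
    """Predict y for one row by replaying the centering and deflation steps."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


if __name__ == "__main__":
    # Toy data (made up): 4 observations, 6 predictors, so p > n.
    X = [[1, 2, 0, 3, 1, 4], [2, 1, 1, 0, 3, 2],
         [0, 3, 2, 1, 1, 0], [3, 0, 1, 2, 2, 1]]
    y = [5.0, 3.0, 4.0, 6.0]
    model = pls1_nipals(X, y, n_factors=2)
    print([round(pls1_predict(model, row), 3) for row in X])
```

Each factor is extracted from whatever variation remains after the previous factors are removed, so the scores are mutually orthogonal and the training error cannot increase as factors are added. That is also why the number of factors has to be chosen with care rather than simply maximized.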

Table of Contents

    1. A Word to the Practitioner
    2. The Organization of the Book
    3. Required Software
    4. Accessing the Supplementary Content
    1. Modeling in General
    2. Partial Least Squares in Today’s World
    3. Transforming, and Centering and Scaling Data
    4. An Example of a PLS Analysis
      1. The Data and the Goal
      2. The Analysis
      3. Testing the Model
    1. The Cars Example
    2. Estimating the Coefficients
    3. Underfitting and Overfitting: A Simulation
    4. The Effect of Correlation among Predictors: A Simulation
    1. Principal Components Analysis
    2. Centering and Scaling: An Example
    3. The Importance of Exploratory Data Analysis in Multivariate Studies
    4. Dimensionality Reduction via PCA
    1. Centering and Scaling in PLS
    2. PLS as a Multivariate Technique
    3. Why Use PLS?
    4. How Does PLS Work?
    5. PLS versus PCA
    6. PLS Scores and Loadings
      1. Some Technical Background
    7. An Example Exploring Prediction
      1. One-Factor NIPALS Model
      2. Two-Factor NIPALS Model
      3. Variable Selection
      4. SIMPLS Fits
    8. Choosing the Number of Factors
      1. Cross Validation
      2. Types of Cross Validation
      3. A Simulation of K-Fold Cross Validation
      4. Validation in the PLS Platform
    9. The NIPALS and SIMPLS Algorithms
    10. Useful Things to Remember About PLS
    1. Background
    2. The Data
      1. Data Table Description
      2. Initial Data Visualization
    3. A First PLS Model
      1. Our Plan
      2. Performing the Analysis
      3. The Partial Least Squares Report
      4. The SIMPLS Fit Report
      5. Other Options
    4. A Pruned PLS Model
      1. Model Fit
      2. Diagnostics
    5. Performance on Data from Second Study
      1. Comparing Predicted Values for the Second Study to Actual Values
      2. Comparing Residuals for Both Studies
      3. Obtaining Additional Insight
    6. Conclusion
    1. Background
    2. The Data
      1. Data Table Description
      2. Creating a Test Set Indicator Column
    3. Viewing the Data
      1. Octane and the Test Set
      2. Creating a Stacked Data Table
      3. Constructing Plots of the Individual Spectra
      4. Individual Spectra
      5. Combined Spectra
    4. A First PLS Model
      1. Excluding the Test Set
      2. Fitting the Model
      3. The Initial Report
    5. A Second PLS Model
      1. Fitting the Model
      2. High-Level Overview
      3. Diagnostics
      4. Score Scatterplot Matrices
      5. Loading Plots
      6. VIPs
      7. Model Assessment Using Test Set
    6. A Pruned Model
    1. Background
    2. The Data
      1. Data Table Description
      2. Initial Data Visualization
      3. Missing Response Values
      4. Impute Missing Data
      5. Distributions
      6. Transforming AGPT
      7. Differences by Ecoregion
      8. Conclusions from Visual Analysis and Implications
    3. A First PLS Model for the Savannah River Basin
      1. Our Plan
      2. Performing the Analysis
      3. The Partial Least Squares Report
      4. The NIPALS Fit Report
      5. Defining a Pruned Model
    4. A Pruned PLS Model for the Savannah River Basin
      1. Model Fit
      2. Diagnostics
      3. Saving the Prediction Formulas
      4. Comparing Actual Values to Predicted Values for the Test Set
    5. A First PLS Model for the Blue Ridge Ecoregion
      1. Making the Subset
      2. Reviewing the Data
      3. Performing the Analysis
      4. The NIPALS Fit Report
    6. A Pruned PLS Model for the Blue Ridge Ecoregion
      1. Model Fit
      2. Comparing Actual Values to Predicted Values for the Test Set
    7. Conclusion
    1. Background
    2. The Data
      1. Data Table Description
      2. Missing Data Check
    3. The First Stage Model
      1. Visual Exploration of Overall Liking and Consumer Xs
      2. The Plan for the First Stage Model
      3. Stage One PLS Model
      4. Stage One Pruned PLS Model
      5. Stage One MLR Model
      6. Comparing the Stage One Models
      7. Visual Exploration of Ys and Xs
      8. Stage Two PLS Model
      9. Stage Two MLR Model
    4. The Combined Model for Overall Liking
      1. Constructing the Prediction Formula
      2. Viewing the Profiler
    5. Conclusion
    1. Ground Rules
    2. The Singular Value Decomposition of a Matrix
      1. Definition
      2. Relationship to Spectral Decomposition
      3. Other Useful Facts
    3. Principal Components Regression
    4. The Idea behind PLS Algorithms
    5. NIPALS
      1. The NIPALS Algorithm
      2. Computational Results
      3. Properties of the NIPALS Algorithm
    6. SIMPLS
      1. Optimization Criterion
      2. Implications for the Algorithm
      3. The SIMPLS Algorithm
    7. More on VIPs
    8. The Standardize X Option
    9. Determining the Number of Factors
      1. Cross Validation: How JMP Does It
    1. Introduction
    2. The Bias-Variance Tradeoff in PLS
      1. Introduction
      2. Two Simple Examples
      3. Motivation
      4. The Simulation Study
      5. Results and Discussion
      6. Conclusion
    3. Using PLS for Variable Selection
      1. Introduction
      2. Structure of the Study
      3. The Simulation
      4. Computation of Result Measures
      5. Results
      6. Conclusion