Statistical Programming with SAS/IML Software

Book Description

This is the first book to provide a comprehensive description of SAS/IML software and how to use it. Wicklin presents tips and techniques that enable you to use the IML procedure and the SAS/IML Studio application efficiently. The book also shows how to create and modify statistical graphs, call SAS procedures and R functions from a SAS/IML program, and implement modern statistical techniques such as simulations and bootstrap methods in the SAS/IML language.

Table of Contents

  1. Copyright
  2. Acknowledgments
  3. I. Programming in the SAS/IML Language
    1. 1. An Introduction to SAS/IML Software
      1. 1.1 Overview of the SAS/IML Language
      2. 1.2 Comparing the SAS/IML Language and the DATA Step
      3. 1.3 Overview of SAS/IML Software
        1. 1.3.1 Overview of the IML Procedure
        2. 1.3.2 Running a PROC IML Program
        3. 1.3.3 Overview of SAS/IML Studio
        4. 1.3.4 Installing and Invoking SAS/IML Studio
        5. 1.3.5 Running a Program in SAS/IML Studio
        6. 1.3.6 Using SAS/IML Studio for Exploratory Data Analysis
      4. 1.4 Who Should Read This Book?
      5. 1.5 Overview of This Book
      6. 1.6 Possible Roadmaps through This Book
      7. 1.7 How to Read the Programs in This Book
      8. 1.8 Data and Programs Used in This Book
        1. 1.8.1 Installing the Example Data on a Local SAS Server
        2. 1.8.2 Installing the Example Data on a Remote SAS Server
    2. 2. Getting Started with the SAS/IML Matrix Programming Language
      1. 2.1 Overview of the SAS/IML Language
      2. 2.2 Creating Matrices
        1. 2.2.1 Printing a Matrix
        2. 2.2.2 The Dimensions of a Matrix
        3. 2.2.3 The Type of a Matrix
        4. 2.2.4 The Length of a Character Matrix
      3. 2.3 Using Functions to Create Matrices
        1. 2.3.1 Constant Matrices
        2. 2.3.2 Vectors of Sequential Values
        3. 2.3.3 Pseudorandom Matrices
      4. 2.4 Transposing a Matrix
      5. 2.5 Changing the Shape of Matrices
      6. 2.6 Extracting Data from Matrices
        1. 2.6.1 Extracting Rows and Columns
        2. 2.6.2 Matrix Diagonals
        3. 2.6.3 Printing a Submatrix or Expression
      7. 2.7 Comparison Operators
      8. 2.8 Control Statements
        1. 2.8.1 The IF-THEN/ELSE Statement
        2. 2.8.2 The Iterative DO Statement
      9. 2.9 Concatenation Operators
      10. 2.10 Logical Operators
      11. 2.11 Operations on Sets
      12. 2.12 Matrix Operators
        1. 2.12.1 Elementwise Operators
        2. 2.12.2 Matrix Computations
      13. 2.13 Managing the SAS/IML Workspace
    3. 3. Programming Techniques for Data Analysis
      1. 3.1 Overview of Programming Techniques
      2. 3.2 Reading and Writing Data
        1. 3.2.1 Creating Matrices from SAS Data Sets
        2. 3.2.2 Creating SAS Data Sets from Matrices
      3. 3.3 Frequently Used Techniques in Data Analysis
        1. 3.3.1 Applying a Variable Transformation
        2. 3.3.2 Locating Observations That Satisfy a Criterion
        3. 3.3.3 Assigning Values to Observations That Satisfy a Criterion
        4. 3.3.4 Handling Missing Values
        5. 3.3.5 Analyzing Observations by Categories
      4. 3.4 Defining SAS/IML Modules
        1. 3.4.1 Function and Subroutine Modules
        2. 3.4.2 Local Variables
        3. 3.4.3 Global Symbols
        4. 3.4.4 Passing Arguments by Reference
        5. 3.4.5 Evaluation of Arguments
        6. 3.4.6 Storing Modules
        7. 3.4.7 The IMLMLIB Library of Modules
      5. 3.5 Writing Efficient SAS/IML Programs
        1. 3.5.1 Avoid Loops to Improve Performance
        2. 3.5.2 Use Subscript Reduction Operators
        3. 3.5.3 Case Study: Standardizing the Columns of a Matrix
      6. 3.6 Case Study: Finding the Minimum of a Function
      7. 3.7 References
    4. 4. Calling SAS Procedures
      1. 4.1 Overview of Calling SAS Procedures
      2. 4.2 Calling a SAS Procedure from IMLPlus
      3. 4.3 Transferring Data between Matrices and Procedures
      4. 4.4 Passing Parameters to SAS Procedures
      5. 4.5 Case Study: Computing a Kernel Density Estimate
      6. 4.6 Creating Names for Output Variables
      7. 4.7 Creating Macro Variables from Matrices
      8. 4.8 Handling Errors When Calling a Procedure
      9. 4.9 Calling SAS Functions That Require Lists of Values
  4. II. Programming in SAS/IML Studio
    1. 5. IMLPlus: Programming in SAS/IML Studio
      1. 5.1 Overview of the IMLPlus Language
      2. 5.2 Calling SAS Procedures
        1. 5.2.1 Passing Parameters to a SAS Procedure
        2. 5.2.2 Checking the Return Code from a SAS Procedure
      3. 5.3 Calling R Functions
      4. 5.4 IMLPlus Graphs
      5. 5.5 Managing Data in Memory
      6. 5.6 Using Expressions When Reading or Writing Data
      7. 5.7 IMLPlus Modules
        1. 5.7.1 Storing and Loading IMLPlus Modules
          1. 5.7.1.1 Conventions for Saving Modules
          2. 5.7.1.2 Loading IMLPlus Modules
        2. 5.7.2 Local Variables in Modules
        3. 5.7.3 Creating an Alias for a Module
      8. 5.8 The IMLPlus Module Library
      9. 5.9 Features for Debugging Programs
        1. 5.9.1 Jumping to the Location of an Error
        2. 5.9.2 Jumping to Errors in Modules
        3. 5.9.3 Using the Auxiliary Input Window as a Debugging Aid
        4. 5.9.4 Using the PAUSE Statement as a Debugging Aid
      10. 5.10 Querying for User Input
      11. 5.11 Differences between IMLPlus and the IML Procedure
    2. 6. Understanding IMLPlus Classes
      1. 6.1 Overview of Understanding IMLPlus Classes
      2. 6.2 Object-Oriented Terminology
      3. 6.3 The DataObject Class
      4. 6.4 Base and Derived Classes
      5. 6.5 Creating a Graph
      6. 6.6 Creating Dynamically Linked Graphs
      7. 6.7 The Plot Class: A Base Class for Graphs
      8. 6.8 The Data Table Class
      9. 6.9 The DataView Class: A Base Class for Graphs and Data Tables
      10. 6.10 Passing Objects to IMLPlus Modules
      11. 6.11 Using a Base Class in a Module
    3. 7. Creating Statistical Graphs
      1. 7.1 Overview of Creating Statistical Graphs
      2. 7.2 The Source of Data for a Graph
      3. 7.3 Bar Charts
        1. 7.3.1 Creating a Bar Chart from a Vector
        2. 7.3.2 Creating a Bar Chart from a Data Object
        3. 7.3.3 Modifying the Appearance of a Graph
        4. 7.3.4 Frequently Used Bar Chart Methods
      4. 7.4 Histograms
        1. 7.4.1 Creating a Histogram from a Vector
        2. 7.4.2 Creating a Histogram from a Data Object
        3. 7.4.3 Frequently Used Histogram Methods
      5. 7.5 Scatter Plots
        1. 7.5.1 Creating a Scatter Plot from Vectors
        2. 7.5.2 Creating a Scatter Plot from a Data Object
      6. 7.6 Line Plots
        1. 7.6.1 Creating a Line Plot for a Single Variable
          1. 7.6.1.1 Creating a Line Plot from Vectors
          2. 7.6.1.2 Creating a Line Plot from a Data Object
        2. 7.6.2 Creating a Line Plot for Several Variables
          1. 7.6.2.1 Creating a Line Plot from Vectors
          2. 7.6.2.2 Creating a Line Plot from a Data Object
        3. 7.6.3 Creating a Line Plot with a Classification Variable
          1. 7.6.3.1 Creating a Line Plot from Vectors
          2. 7.6.3.2 Creating a Line Plot from a Data Object
        4. 7.6.4 Frequently Used Line Plot Methods
      7. 7.7 Box Plots
        1. 7.7.1 Creating a Box Plot
          1. 7.7.1.1 Creating a Box Plot from a Vector
          2. 7.7.1.2 Creating a Box Plot from a Data Object
        2. 7.7.2 Creating a Grouped Box Plot
          1. 7.7.2.1 Creating a Grouped Box Plot from Vectors
          2. 7.7.2.2 Creating a Grouped Box Plot from a Data Object
        3. 7.7.3 Frequently Used Box Plot Methods
      8. 7.8 Summary of Graph Types
      9. 7.9 Displaying the Data Used to Create a Graph
      10. 7.10 Changing the Format of a Graph Axis
      11. 7.11 Summary of Creating Graphs
      12. 7.12 References
    4. 8. Managing Data in IMLPlus
      1. 8.1 Overview of Managing Data in IMLPlus
      2. 8.2 Creating a Data Object
      3. 8.3 Creating a Data Object from a SAS Data Set
      4. 8.4 Creating Linked Graphs from a Data Object
      5. 8.5 Creating a Data Object from a Matrix
      6. 8.6 Creating a SAS Data Set from a Data Object
      7. 8.7 Creating a Matrix from a Data Object
      8. 8.8 Adding New Variables to a Data Object
        1. 8.8.1 Variable Transformations
        2. 8.8.2 Adding Variables for Predicted and Residual Values
        3. 8.8.3 A Module to Add Variables from a SAS Data Set
      9. 8.9 Review: The Purpose of the DataObject Class
    5. 9. Drawing on Graphs
      1. 9.1 Drawing on a Graph
        1. 9.1.1 Example: Overlaying a Regression Curve on a Scatter Plot
        2. 9.1.2 Graph Coordinate Systems and Drawing Regions
          1. 9.1.2.1 Drawing in the Coordinate System of the Data
          2. 9.1.2.2 Drawing on a Graph That Displays a Categorical Variable
        3. 9.1.3 Drawing in the Foreground and Background
        4. 9.1.4 Case Study: Adding a Prediction Band to a Scatter Plot
        5. 9.1.5 Practical Differences between the Coordinate Systems
      2. 9.2 Drawing Legends and Insets
        1. 9.2.1 Drawing a Legend
        2. 9.2.2 Drawing an Inset
      3. 9.3 Adjusting Graph Margins
      4. 9.4 A Module to Add Lines to a Graph
      5. 9.5 Case Study: A Module to Draw a Rug Plot on a Graph
      6. 9.6 Case Study: Plotting a Density Estimate
      7. 9.7 Case Study: Plotting a Loess Curve
      8. 9.8 Changing Tick Positions for a Date Axis
      9. 9.9 Case Study: Drawing Arbitrary Figures and Diagrams
      10. 9.10 A Comparison between Drawing in IMLPlus and PROC IML
    6. 10. Marker Shapes, Colors, and Other Attributes of Data
      1. 10.1 Overview of Data Attributes
      2. 10.2 Changing Marker Properties
        1. 10.2.1 Using Marker Shapes to Indicate Values of a Categorical Variable
        2. 10.2.2 Using Marker Colors to Indicate Values of a Continuous Variable
          1. 10.2.2.1 Color Representation in IMLPlus
          2. 10.2.2.2 Using Color to Mark Outliers
        3. 10.2.3 Coloring by Values of a Continuous Variable
      3. 10.3 Changing the Display Order of Categories
        1. 10.3.1 Setting the Display Order of a Categorical Variable
        2. 10.3.2 Using a Statistic to Set the Display Order of a Categorical Variable
      4. 10.4 Selecting Observations
      5. 10.5 Getting and Setting Attributes of Data
        1. 10.5.1 Properties of Variables
        2. 10.5.2 Attributes of Observations
  5. III. Applications
    1. 11. Calling Functions in the R Language
      1. 11.1 Overview of Calling Functions in the R Language
      2. 11.2 Introduction to the R Language
      3. 11.3 Calling R Functions from IMLPlus
      4. 11.4 Data Frames and Matrices: Passing Data to R
        1. 11.4.1 Transferring SAS Data to R
        2. 11.4.2 What Happens to the Data Attributes?
        3. 11.4.3 Transferring Data from R to SAS Software
      5. 11.5 Importing Complicated R Objects
      6. 11.6 Handling Missing Values
        1. 11.6.1 R Functions and Missing Values
        2. 11.6.2 Merging R Results with Data That Contain Missing Values
      7. 11.7 Calling R Packages
        1. 11.7.1 Installing a Package
        2. 11.7.2 Calling Functions in a Package
      8. 11.8 Case Study: Optimizing a Smoothing Parameter
        1. 11.8.1 Computing a Loess Smoother in R
        2. 11.8.2 Computing an AICC Statistic in R
        3. 11.8.3 Encapsulating R Statements into a SAS/IML Module
        4. 11.8.4 Finding the Best Smoother by Minimizing the AICC Statistic
        5. 11.8.5 Conclusions
      9. 11.9 Creating Graphics in R
      10. 11.10 References
    2. 12. Regression Diagnostics
      1. 12.1 Overview of Regression Diagnostics
      2. 12.2 Fitting a Regression Model
      3. 12.3 Identifying Influential Observations
      4. 12.4 Identifying Outliers and High-Leverage Observations
      5. 12.5 Examining the Distribution of Residuals
      6. 12.6 Regression Diagnostics for Models with Classification Variables
      7. 12.7 Comparing Two Regression Models
        1. 12.7.1 Comparing Analyses in Different Workspaces
        2. 12.7.2 Comparing Analyses in the Same Workspace
      8. 12.8 Case Study: Comparing Least Squares and Robust Regression Models
      9. 12.9 Logistic Regression Diagnostics
      10. 12.10 Viewing ODS Statistical Graphics
      11. 12.11 References
    3. 13. Sampling and Simulation
      1. 13.1 Overview of Sampling and Simulation
      2. 13.2 Simulate Tossing a Coin
      3. 13.3 Simulate a Coin-Tossing Game
        1. 13.3.1 Distribution of Outcomes
        2. 13.3.2 Compute Statistics for the Simulation
          1. 13.3.2.1 What is the expected number of coin tosses in a game?
          2. 13.3.2.2 What is the expected gain (loss) for playing the game?
        3. 13.3.3 Efficiency of the Simulation
      4. 13.4 Simulate Rolling Dice
      5. 13.5 Simulate a Game of Craps
        1. 13.5.1 A First Approach
        2. 13.5.2 A More Efficient Approach
      6. 13.6 Random Sampling with Unequal Probability
      7. 13.7 A Module for Sampling with Replacement
      8. 13.8 The Birthday Matching Problem
        1. 13.8.1 A Probability-Based Solution for a Simplified Problem
        2. 13.8.2 Simulate the Birthday Matching Problem
      9. 13.9 Case Study: The Birthday Matching Problem for Real Data
        1. 13.9.1 An Analysis of US Births in 2002
        2. 13.9.2 The Birthday Problem for People Born in 2002
        3. 13.9.3 The Matching Birth Day-of-the-Week Problem
          1. 13.9.3.1 A Probability-Based Estimate
          2. 13.9.3.2 A Simulation-Based Estimate
          3. 13.9.3.3 Compare the Probability-Based and Simulation-Based Estimates
        4. 13.9.4 The 2002 Matching Birthday Problem
      10. 13.10 Calling C Functions from SAS/IML Studio
      11. 13.11 References
    4. 14. Bootstrap Methods
      1. 14.1 An Introduction to Bootstrap Methods
      2. 14.2 The Bootstrap Distribution for a Mean
        1. 14.2.1 Obtaining a Random Sample
        2. 14.2.2 Creating a Bootstrap Distribution
        3. 14.2.3 Computing Bootstrap Estimates
      3. 14.3 Comparing Two Groups
      4. 14.4 Using SAS Procedures in Bootstrap Computations
        1. 14.4.1 Resampling by Using the SURVEYSELECT Procedure
        2. 14.4.2 Computing Bootstrap Statistics with a SAS Procedure
      5. 14.5 Case Study: Bootstrap Principal Component Statistics
        1. 14.5.1 Plotting Confidence Intervals on a Scree Plot
        2. 14.5.2 Plotting the Bootstrap Distributions
      6. 14.6 References
    5. 15. Timing Computations and the Performance of Algorithms
      1. 15.1 Overview of Timing Computations
      2. 15.2 Timing a Computation
      3. 15.3 Comparing the Performance of Algorithms
        1. 15.3.1 Two Algorithms That Delete Missing Values
        2. 15.3.2 Performance as the Size of the Data Varies
        3. 15.3.3 Performance as Characteristics of the Data Vary
      4. 15.4 Replicating Timings: Measuring Mean Performance
      5. 15.5 Timing Algorithms in PROC IML
      6. 15.6 Tips for Timing Algorithms
      7. 15.7 References
    6. 16. Interactive Techniques
      1. 16.1 Overview of Interactive Techniques
      2. 16.2 Pausing a Program to Enable Interaction
      3. 16.3 Attaching Menus to Graphs
      4. 16.4 Linking Related Data
      5. 16.5 Dialog Boxes in SAS/IML Studio
        1. 16.5.1 Displaying Simple Dialog Boxes
        2. 16.5.2 Displaying a List in a Dialog Box
      6. 16.6 Creating a Dialog Box with Java
      7. 16.7 Creating a Dialog Box with R
        1. 16.7.1 The Main Idea
        2. 16.7.2 A First Modal Dialog Box
        3. 16.7.3 A Modal Dialog Box with a Checkbox
        4. 16.7.4 Case Study: A Modal Dialog Box for a Correlation Analysis
      8. 16.8 References
  6. IV. Appendixes
    1. A. Description of Data Sets
      1. A.1 Installing the Data
      2. A.2 Vehicles Data
      3. A.3 Movies Data
      4. A.4 Birthdays2002 Data
    2. B. SAS/IML Operators, Functions, and Statements
      1. B.1 Overview of the SAS/IML Language
      2. B.2 A Summary of Frequently Used SAS/IML Operators
      3. B.3 A Summary of Functions and Subroutines
        1. B.3.1 Mathematical Functions
        2. B.3.2 Probability Functions
        3. B.3.3 Descriptive Statistical Functions
        4. B.3.4 Matrix Query Functions
        5. B.3.5 Matrix Reshaping Functions
        6. B.3.6 Linear Algebra Functions
        7. B.3.7 Set Functions
        8. B.3.8 Formatting Functions
        9. B.3.9 Module Statements
        10. B.3.10 Control Statements
        11. B.3.11 Statements for Reading and Writing SAS Data Sets
        12. B.3.12 Options for Printing Matrices
    3. C. IMLPlus Classes, Methods, and Statements
      1. C.1 Overview of IMLPlus Classes, Methods, and Statements
      2. C.2 The DataObject Class
      3. C.3 The DataView Class
      4. C.4 The Plot Class
      5. C.5 Methods for Creating and Modifying Plots
        1. C.5.1 Bar Chart Methods
        2. C.5.2 Box Plot Methods
        3. C.5.3 Histogram Methods
        4. C.5.4 Line Plot Methods
        5. C.5.5 Scatter Plot Methods
      6. C.6 Calling SAS Procedures
      7. C.7 Calling R Functions
    4. D. Modules for Compatibility with SAS/IML 9.22
      1. D.1 Overview of SAS/IML 9.22 Modules
      2. D.2 The Mean Module
      3. D.3 The Var Module
      4. D.4 The Qntl Module
    5. E. ODS Statements
      1. E.1 Overview of ODS Statements
      2. E.2 Finding the Names of ODS Tables
      3. E.3 Selecting and Excluding ODS Tables
      4. E.4 Creating Data Sets from ODS Tables
      5. E.5 Creating ODS Statistical Graphics