Statistical Programming in SAS®

Book Description

In Statistical Programming in SAS, author A. John Bailer integrates SAS tools with interesting statistical applications and uses SAS 9.2 as a platform to introduce programming ideas for statistical analysis, data management, data display, and simulation. Written in a reader-friendly, narrative style, the book includes extensive examples and case studies that present a well-structured introduction to programming issues.

This book has two parts. The first part addresses the nuts and bolts of programming, including fostering good programming habits, getting external data sets into SAS to construct an analysis data set, generating basic descriptive statistical summaries, producing customized tables, generating more attractive output, and producing high-quality graphical displays. The second part emphasizes programming in the DATA step, in macros, and in SAS/IML software.

Examples of statistical methods and concepts not always encountered in basic statistics courses (for example, bootstrapping, randomization tests, and jittering) are used to illustrate programming ideas. This book provides extensive illustrations of the new ODS Statistical Graphics procedures in SAS, a description of the new ODS Graphics Editor, and a brief introduction to some of the capabilities of SAS/IML Studio, such as producing dynamically linked data displays and invoking R from SAS.

Table of Contents

  1. Copyright
  2. About This Book
    1. Purpose
    2. Intended Audience
    3. Scope of This Book
    4. Use in a Statistical Programming Class
    5. Belief and Style
    6. Software Used for This Book
    7. Code Used in This Book
    8. SAS Author Page
    9. Additional Resources
    10. Comments or Questions?
    11. SAS Publishing News
  3. Acknowledgments
  4. 1. The Basics—Including Some Nuts and Bolts
    1. 1. Let's Get Started—Preliminaries and a SAS Quick Start
      1. 1.1 Statistical Computing versus Programming versus Managing Data
      2. 1.2 Good Programming Practice
        1. 1.2.1 Document your programs!
        2. 1.2.2 Use meaningful variable names
        3. 1.2.3 DON'T USE ONLY CAPITALS IN PROGRAM STATEMENTS (although some judicious use is reasonable)
        4. 1.2.4 Indent program statements that naturally go together
      3. 1.3 SAS Program Structure
      4. 1.4 What Is a SAS Data Set?
      5. 1.5 Internally Documenting SAS Programs
      6. 1.6 Summary
      7. 1.7 References
      8. 1.8 Exercises
    2. 2. Reading, Combining, and Managing Data for Later Analysis
      1. 2.1 Temporary versus Permanent Status of Data Sets
      2. 2.2 Reading Data into a SAS Data Set
        1. 2.2.1 Reading data directly as part of a program–anyone for datalines?
        2. 2.2.2 Reading data sets saved as text–INFILE can be your friend
        3. 2.2.3 Sometimes variables are in particular columns or in particular formats
        4. 2.2.4 Reading comma-separated values–text files with comma delimiters
        5. 2.2.5 Reading Excel spreadsheets directly
        6. 2.2.6 Reading SPSS data files–the little SPSS engine that could
      3. 2.3 Writing Out a File or Making a Simple Report
      4. 2.4 Concatenating Data Sets and Adding Observations
      5. 2.5 Merging Data Sets and Adding Variables
      6. 2.6 Database Processing with PROC SQL
      7. 2.7 Summary
      8. 2.8 References
      9. 2.9 Exercises
      10. 2.10 Self-Study Laboratory Explorations
    3. 3. Using SAS Procedures
      1. 3.1 SAS System Options
      2. 3.2 Statements That Can Modify the Output of Most Procedures
      3. 3.3 Defining Your Own Formats for Variable Values
      4. 3.4 Selecting or Stratifying an Analysis by Values of a Variable
      5. 3.5 Displaying Data Set Properties and Observations
      6. 3.6 Using PROC PRINT to List the Observations in a Data Set
      7. 3.7 Basic Graphical Displays
      8. 3.8 Using Scatter Plots to Display Relationships between Numeric Variables
        1. 3.8.1 Comparing distributions of responses between multiple groups
      9. 3.9 Summarizing Categorical Variables
      10. 3.10 Summarizing Numeric Variables
        1. 3.10.1 PROC MEANS for descriptive statistics
      11. 3.11 Selecting a Simple Random Sample
      12. 3.12 Randomly Assigning Treatments to Observations
      13. 3.13 Summary
      14. 3.14 References
      15. 3.15 Exercises
    4. 4. Complex Table Construction and Output Control
      1. 4.1 Introducing PROC TABULATE
      2. 4.2 Building from Simple Specifications
      3. 4.3 Enhancing PROC TABULATE Output
      4. 4.4 Using the Output Delivery System
        1. 4.4.1 Basic ideas
        2. 4.4.2 Destinations—RTF, HTML, PDF, and more!
        3. 4.4.3 What's produced and how to select it
        4. 4.4.4 Another destination that stat programmers should visit—OUTPUT
      5. 4.5 Summary
      6. 4.6 References
      7. 4.7 Exercises
    5. 5. Basic Models in SAS
      1. 5.1 Overview of Modeling
      2. 5.2 Linear Regression Models
        1. 5.2.1 Motorboats and manatees—a look at simple linear regression
        2. 5.2.2 Big brains and big bodies—specifying and fitting a multiple regression model
      3. 5.3 ANOVA Models—PROC GLM for a One-Way ANOVA
        1. 5.3.1 Comparing bacterial growth under different packaging conditions using a one-way ANOVA model
      4. 5.4 ANOVA Models—PROC GLM for an ANOVA Model with Two or More Factors
      5. 5.5 Summary
      6. 5.6 References
      7. 5.7 Exercises
    6. 6. Producing Statistical Graphics in SAS
      1. 6.1 Graphics in SAS
      2. 6.2 ODS Statistical Graphics
      3. 6.3 Modifying Graphics Using the ODS Graphics Editor
      4. 6.4 Graphing with Styles and Templates
      5. 6.5 Statistical Graphics—Entering the Land of SG Procedures
      6. 6.6 Case Study: Using the SG Procedures
      7. 6.7 Summary
      8. 6.8 References
      9. 6.9 Exercises
    7. 7. Traditional SAS Graphics
      1. 7.1 Traditional SAS Graphics
      2. 7.2 Customizing Graphics
      3. 7.3 Why You Need to Learn about Annotate Data Sets
      4. 7.4 Case Study: Comparing Distributions of Responses
      5. 7.5 Descriptive Displays of Spatial Data
      6. 7.6 Summary
      7. 7.7 References
      8. 7.8 Exercises
      9. 7.9 Appendix: Complete Ohio County Population Data Set
  5. 2. Doing More with Programming
    1. 8. Formatting Variables, Recoding Variables, and Writing Programs
      1. 8.1 Internal Representations and Output Displays
        1. 8.1.1 Defining your own formats and informats
      2. 8.2 Character, Numeric, Time, and Date Formats
      3. 8.3 Recoding and Transforming Variables in a DATA Step
      4. 8.4 Ordering How Tasks Are Done
      5. 8.5 What Goes and What Stays in a Data Set
      6. 8.6 Structured Thinking about Writing Programs
      7. 8.7 Case Study 1: Is the Two-Sample t-Test Robust Enough for Heterogeneous Variances?
        1. 8.7.1 Case Study 1: Is the Two-Sample t-Test Robust Enough for Heterogeneous Variances? (Revisited Using More DATA Step Programming)
      8. 8.8 Case Study 2: Monte Carlo Integration to Estimate an Integral
      9. 8.9 Case Study 3: Simple Percentile-Based Bootstrap
      10. 8.10 Throw Out Your Tables of Statistical Distributions
      11. 8.11 Generating Variables Using Random Number Generators
      12. 8.12 Summary
      13. 8.13 References
      14. 8.14 Exercises
    2. 9. Programming in a DATA Step
      1. 9.1 Storage Bins for Collections of Values
        1. 9.1.1 Defining values in the variable list
        2. 9.1.2 Inputting values in the variable list
        3. 9.1.3 Reassign missing value codes for numeric variables "."
        4. 9.1.4 Recoding missing values for all numeric and character variables
        5. 9.1.5 Creating multiple observations from a single record
      2. 9.2 Case Study 1: Monte Carlo p-Value for Test of Spatial Randomness
      3. 9.3 Remembering Variable Values across Observations
        1. 9.3.1 Processing multiple observations for a single observation
      4. 9.4 Case Study 2: Randomization Test for the Equality of Two Populations
      5. 9.5 Summary
      6. 9.6 References
      7. 9.7 Exercises
    3. 10. Macro Programming
      1. 10.1 What Is a Macro and Why Would You Use It?
      2. 10.2 Motivation for Macros: Numerical Integration to Determine P(0<Z<1.645)
      3. 10.3 Processing Macros
      4. 10.4 Macro Variables, Parameters, and Functions
      5. 10.5 Conditional Execution, Looping, and Macros
      6. 10.6 Debugging Macro Code and Programs
      7. 10.7 Saving Macros
      8. 10.8 Functions and Routines for Macros
      9. 10.9 Bonus Material: Processing Multiple Data Sets
      10. 10.10 Summary
      11. 10.11 References
      12. 10.12 Exercises
    4. 11. Programming with Matrices and Vectors
      1. 11.1 Defining a Matrix and Subscripting
      2. 11.2 Using Diagonal Matrices and Stacking Matrices
      3. 11.3 Using Elementwise Operations, Repeating, and Multiplying Matrices
      4. 11.4 Importing a Data Set into SAS/IML and Exporting Matrices from SAS/IML to a Data Set
        1. 11.4.1 Creating matrices from SAS data sets and vice versa
      5. 11.5 Case Study 1: Monte Carlo Integration to Estimate π
      6. 11.6 Case Study 2: Bisection Root Finder
      7. 11.7 Case Study 3: Randomization Test Using Matrices Imported from PROC PLAN
      8. 11.8 Case Study 4: SAS/IML Module to Implement Monte Carlo Integration to Estimate π
        1. 11.8.1 Storing and loading SAS/IML modules
      9. 11.9 Introducing SAS/IML Studio
        1. 11.9.1 Case Study 1: Dynamic and interactive analysis of the SMSA data set
        2. 11.9.2 Case Study 2: Multiple-linked graphics windows
        3. 11.9.3 Case Study 3: SAS/IML matrix manipulations and invocations of SAS/STAT procedures
        4. 11.9.4 Case Study 4: Generating bootstrap CIs for mean AGE using R
      10. 11.10 Summary
      11. 11.11 References
      12. 11.12 Exercises