13

MULTIPLE REGRESSION

In the previous chapter on ANOVA, we had an outcome variable (joint strength) and two input variables (antimony and cooling method). We were concerned with the question, “Is there a statistically significant effect of either input variable on the outcome variable?”

We might also be interested in this question, “How much does the outcome change as you change the input variables?”
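That second question is what a regression coefficient answers. As a minimal sketch (not taken from the book), the following fits a multiple regression by ordinary least squares using numpy; the solder-joint data are fabricated for illustration, with the true coefficients built in so we can see that the fit recovers them:

```python
import numpy as np

# Hypothetical data in the spirit of the chapter's antimony/cooling example;
# the numbers are made up, not from the book.
rng = np.random.default_rng(0)
n = 50
antimony = rng.uniform(0, 10, n)            # % antimony in the solder
cool = rng.integers(0, 2, n).astype(float)  # 0 = one cooling method, 1 = the other

# Assumed relationship: each extra unit of antimony lowers strength by 0.7;
# the second cooling method adds 2.0; noise has sd 0.5.
strength = 30 - 0.7 * antimony + 2.0 * cool + rng.normal(0, 0.5, n)

# Design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones(n), antimony, cool])
coef, *_ = np.linalg.lstsq(X, strength, rcond=None)
print(coef)
```

Each fitted coefficient estimates how much the outcome changes per unit change in that input, holding the other inputs constant; here the fit lands close to the built-in values 30, −0.7, and 2.0.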

After completing this chapter, you should be able to

  • distinguish between regression for explanation and regression for prediction,
  • see how regression models the relationship between a supposed outcome variable and supposed predictor variables, while recognizing that the mathematics of regression does not prove any causation,
  • use visual exploration of the data to determine how appropriate a linear model would be (and for what range of the data),
  • perform a multiple linear regression,
  • interpret coefficients and their p-values,
  • use R² as a measure of how well the regression model fits the data,
  • use root mean squared error (RMSE) as a measure of prediction error,
  • explain the procedure of partitioning the data into a training sample (to fit the model) and a holdout sample (to see how well the model predicts new data),
  • use resampling to establish confidence intervals for regression coefficients,
  • use regression for the purpose of predicting unknown values, using the Tayko data,
  • describe how binary and categorical variables are used in regression, how the issue of multicollinearity arises, why it may be ...
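Three of the objectives above — the training/holdout partition, RMSE as a measure of prediction error, and resampling confidence intervals for coefficients — can be sketched together. This is an illustrative sketch on fabricated data (not the Tayko data set), again using numpy least squares rather than any specific tool from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fabricated data: one outcome, two predictors, true coefficients [5, 2, -1].
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 1.0, n)

# Partition into a training sample (to fit) and a holdout sample (to predict).
idx = rng.permutation(n)
train, hold = idx[:150], idx[150:]
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# RMSE on the holdout sample estimates prediction error on new data.
resid = y[hold] - X[hold] @ coef
rmse = np.sqrt(np.mean(resid ** 2))

# Bootstrap: refit on resampled training rows to get a 95% confidence
# interval for the coefficient on the first predictor.
boots = []
for _ in range(1000):
    b = rng.choice(train, size=len(train), replace=True)
    c, *_ = np.linalg.lstsq(X[b], y[b], rcond=None)
    boots.append(c[1])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(rmse, (lo, hi))
```

The holdout RMSE should sit near the noise standard deviation (1.0 here), and the bootstrap interval should bracket the built-in coefficient of 2.0 — the same resampling logic the chapter applies to real data.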
