In the previous chapter on ANOVA, we had an outcome variable, joint strength, and two input variables, antimony and cooling method. We were concerned with the question, “*Is there* a statistically significant effect of either input variable on the outcome variable?”

We might also be interested in this question, “*How much* does the outcome change as you change the input variables?”

After completing this chapter, you should be able to

- distinguish between regression for explanation and regression for prediction,
- see how regression models the relationship between a supposed outcome variable and supposed predictor variables, while recognizing that the mathematics of regression does not *prove* any causation,
- use visual exploration of the data to determine how appropriate a linear model would be (and for what range of the data),
- perform a multiple linear regression,
- interpret coefficients and their *p*-values,
- use *R*^{2} as a measure of how well the regression model fits the data,
- use root mean squared error (RMSE) as a measure of prediction error,
- explain the procedure of partitioning the data into a training sample (to fit the model) and a holdout sample (to see how well the model predicts new data),
- use resampling to establish confidence intervals for regression coefficients,
- use regression for the purpose of predicting unknown values, using the Tayko data,
- describe how binary and categorical variables are used in regression, how the issue of multicollinearity arises, why it may be ...
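Several of the objectives above (fitting a multiple linear regression, computing *R*^{2}, partitioning into training and holdout samples, and measuring holdout RMSE) can be sketched in a few lines of code. The data below is synthetic, a hypothetical stand-in for the chapter's examples, and the split sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: outcome = 3 + 2*x1 - 1.5*x2 + noise
n = 200
X = rng.normal(size=(n, 2))
y = 3 + 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Partition into a training sample (to fit) and a holdout sample (to predict)
train, hold = slice(0, 150), slice(150, n)
Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column

# Fit the multiple linear regression by least squares on the training sample
coef, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)

# R^2 on the training data: share of outcome variance the model explains
pred_train = Xd[train] @ coef
ss_res = np.sum((y[train] - pred_train) ** 2)
ss_tot = np.sum((y[train] - y[train].mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# RMSE on the holdout sample: prediction error on data the model never saw
pred_hold = Xd[hold] @ coef
rmse = np.sqrt(np.mean((y[hold] - pred_hold) ** 2))

print(f"coefficients (intercept, x1, x2): {coef.round(2)}")
print(f"train R^2: {r2:.3f}, holdout RMSE: {rmse:.3f}")
```

Because the holdout rows play no role in fitting, the holdout RMSE is an honest estimate of how well the model predicts new data, which is the distinction the training/holdout objective above is drawing.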
