Activity: Studying Correlated Variables

Steps for Completion:

  1. Make a subset of the loan dataset (df3) using the following variables:
df3_1 <- df3[, c("funded_amnt", "annual_inc", "dti", "inq_last_6mths", "total_acc", "total_pymnt_inv")]
  2. Run cor on the preceding loan data subset, and then choose two highly correlated pairs of variables in the loan dataset. Use the following pairs:
total_rec_prncp and total_pymnt_int
funded_amnt and total_pymnt_inv
  3. Make a scatterplot of each of the preceding pairs for grade A loans, then fit a linear regression model.
  4. Determine the correlations of the preceding pairs.
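
The steps above can be sketched roughly as follows for one of the pairs (funded_amnt and total_pymnt_inv). This is only an illustration under stated assumptions: the real loan data is not reproduced here, so df3 is replaced by a small synthetic stand-in whose values are made up, and the column named grade is assumed to hold the loan grade. Computing the step 4 correlation on the grade A subset is also an assumption; the activity does not say which rows to use.

```r
library(ggplot2)

# ASSUMPTION: synthetic stand-in for the loan data frame df3.
# The values are invented for illustration; use the real dataset instead.
set.seed(42)
n <- 500
funded_amnt <- round(runif(n, 1000, 35000))
df3 <- data.frame(
  grade           = sample(c("A", "B", "C"), n, replace = TRUE),
  funded_amnt     = funded_amnt,
  annual_inc      = round(rlnorm(n, meanlog = 11, sdlog = 0.5)),
  dti             = runif(n, 0, 30),
  inq_last_6mths  = rpois(n, 1),
  total_acc       = rpois(n, 20),
  total_pymnt_inv = funded_amnt * runif(n, 0.6, 1.3)
)

# Step 1: subset the numeric variables of interest
vars <- c("funded_amnt", "annual_inc", "dti",
          "inq_last_6mths", "total_acc", "total_pymnt_inv")
df3_1 <- df3[, vars]

# Step 2: correlation matrix; complete.obs drops rows with missing values
cor_mat <- cor(df3_1, use = "complete.obs")
print(round(cor_mat, 2))

# Step 3: scatterplot for grade A loans with a fitted regression line
df3_a <- subset(df3, grade == "A")
p <- ggplot(df3_a, aes(x = funded_amnt, y = total_pymnt_inv)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)

# The same linear model fitted explicitly, so its coefficients can be inspected
fit <- lm(total_pymnt_inv ~ funded_amnt, data = df3_a)
print(coef(fit))

# Step 4: Pearson correlation of the pair (here on the grade A subset)
r_a <- cor(df3_a$funded_amnt, df3_a$total_pymnt_inv)
print(r_a)
```

With the real data, the same calls would be repeated for the second pair. For a simple linear regression with an intercept, the model's R-squared equals the squared Pearson correlation, which gives a quick consistency check between steps 3 and 4.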

Outcome:

Answer to step 4: The correlations of the two pairs are as follows:

  1. 93%
  2. 85%
