Cross-validating our model

Now, before we cheat and look at our answer key, let's see how well this solution predicts data it hasn't seen. To do this, I wrote the following fairly large test:

import pandas

def final_model_cross_validation_test():
    # Score the fitted model against the data it was trained on.
    df = pandas.read_csv('./generated_data.csv')
    df['predicted_dependent_var'] = 25.6266 \
        + 2.7083*df['ind_var_a'] \
        - 1.5527*df['ind_var_b'] \
        - 0.3917*df['ind_var_c'] \
        - 0.2006*df['ind_var_e'] \
        + 5.6450*df['ind_var_b'] * df['ind_var_c']
    # Absolute error of each prediction.
    df['diff'] = (df['dependent_var'] - df['predicted_dependent_var']).abs()
    print(df['diff'])
    print('===========')
    # Repeat the prediction on the held-out cross-validation data.
    cv_df = pandas.read_csv('./generated_data_cv.csv')
    cv_df['predicted_dependent_var'] = 25.6266 \
        + 2.7083*cv_df['ind_var_a'] \
        - 1.5527*cv_df['ind_var_b'] \
        - 0.3917*cv_df['ind_var_c'] ...
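The listing above is cut off, so here is a minimal sketch of the comparison this kind of test is after: apply the fitted coefficients to both data sets and compare the average absolute error. It is written in plain Python rather than pandas so that it stands alone. It assumes the held-out data is scored with the same formula as the training data, and the rows, the `make_row` helper, and the error offsets are made up for illustration; they are not the contents of `generated_data.csv` or `generated_data_cv.csv`.

```python
# Coefficients copied from the fitted model in the test above.
INTERCEPT = 25.6266
COEF_A, COEF_B, COEF_C, COEF_E, COEF_BC = 2.7083, -1.5527, -0.3917, -0.2006, 5.6450


def predict(row):
    """Apply the fitted linear model (with the b*c interaction) to one row."""
    return (INTERCEPT
            + COEF_A * row['ind_var_a']
            + COEF_B * row['ind_var_b']
            + COEF_C * row['ind_var_c']
            + COEF_E * row['ind_var_e']
            + COEF_BC * row['ind_var_b'] * row['ind_var_c'])


def mean_absolute_error(rows):
    """Average |actual - predicted| over a list of row dicts."""
    diffs = [abs(row['dependent_var'] - predict(row)) for row in rows]
    return sum(diffs) / len(diffs)


def make_row(a, b, c, e, offset):
    """Hypothetical helper: a row whose actual value misses the prediction by `offset`."""
    row = {'ind_var_a': a, 'ind_var_b': b, 'ind_var_c': c, 'ind_var_e': e}
    row['dependent_var'] = predict(row) + offset
    return row


# Made-up rows standing in for the training and cross-validation files.
training_rows = [make_row(1.0, 2.0, 3.0, 4.0, 0.5), make_row(0.5, 1.5, 2.5, 3.5, -0.5)]
held_out_rows = [make_row(2.0, 1.0, 0.0, 1.0, 1.0), make_row(3.0, 0.0, 1.0, 2.0, -1.0)]

training_mae = mean_absolute_error(training_rows)
held_out_mae = mean_absolute_error(held_out_rows)
print('training MAE: %.4f' % training_mae)
print('held-out MAE: %.4f' % held_out_mae)
```

A held-out error close to the training error suggests the model generalizes; a held-out error that is much larger is the usual sign of overfitting.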
