BUILDING A SUCCESSFUL MODEL

“Rome was not built in one day,” nor was any reliable model. The only successful approach to modeling lies in a continuous cycle of hypothesis formulation (data gathering), hypothesis testing, and estimation. How you navigate through this cycle will depend on whether you are new to the field, have a small dataset in hand and are willing and prepared to gather more until the job is done, or have access to databases containing hundreds of thousands of observations. The following prescription, while directly applicable to the last case, can be readily modified to fit any situation.

1. A thorough literature search and an understanding of causal mechanisms are essential prerequisites to any study. Do not let the software do your thinking for you.
2. Using a subset of the data selected at random, see which variables appear to be correlated with the dependent variable(s) of interest. (As noted in this and the preceding chapter, two unrelated variables may appear to be correlated by chance alone or as a result of confounding factors. For the same reasons, two closely related factors may fail to exhibit a statistically significant correlation.)
3. Use CART as a preliminary to regression when several categorical variables are involved. Early splits based on the values of categorical variables may suggest that multiple models need be developed, one for each block. For example, in deciding whether to purchase an item or how many items to purchase, women ...
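
The screening described in step 2 might be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the column names (`related`, `noise`, `y`) and the helper `screen_on_random_subset` are hypothetical, and Pearson's coefficient is one possible choice of correlation measure that the text itself does not prescribe.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def screen_on_random_subset(rows, target, fraction=0.2, seed=0):
    """Rank candidate variables by |r| with the target on a random subset.

    `rows` is a list of dicts sharing the same numeric keys (hypothetical
    layout). The remaining rows are left untouched for later testing.
    """
    rng = random.Random(seed)
    k = min(len(rows), max(3, int(len(rows) * fraction)))
    subset = rng.sample(rows, k)
    y = [row[target] for row in subset]
    scores = {
        name: pearson([row[name] for row in subset], y)
        for name in subset[0]
        if name != target
    }
    # Strongest apparent association first
    return dict(sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Synthetic data (an assumption for illustration): y depends on `related` only.
data_rng = random.Random(42)
rows = []
for _ in range(1000):
    related = data_rng.gauss(0, 1)
    rows.append({
        "related": related,
        "noise": data_rng.gauss(0, 1),
        "y": 2 * related + data_rng.gauss(0, 1),
    })

ranked = screen_on_random_subset(rows, target="y")
for name, r in ranked.items():
    print(f"{name}: r = {r:.2f}")
```

A ranking of this kind is a starting point, not a conclusion: as the parenthetical in step 2 warns, an apparent correlation may be due to chance or confounding, and a real relationship may fail to show up in the subset.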

