Selecting data

When we are confronted with a large number of independent variables, we often run into the problem of which values to select. In addition, the variable might be binned, combined with other variables, or altered—all of which might make or break a particular model.

Collinearity

Collinearity is when we have multiple x variables that are highly related to each other; they have a high degree of correlation. When using regressions, you always have to be on the watch for collinearity as you can't be sure which individual variable really affects the outcome variable. Here is a classic example. Suppose you wanted to measure the happiness of a college student. You have the following input variables: age, sex, money available for beer, money ...

Get Mastering .NET Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.