Chapter 7. Linear Models

Only one letter different from GBM, GLMs (generalized linear models) take a very different approach. Whereas decision trees are based on logic, and deep learning is a black box inspired by the human brain, GLMs are based on mathematics. The underlying idea is something you almost certainly did at school: make a scatterplot of data points on graph paper, then draw the best straight line through them. And perhaps you have used lm() in R, or linear_model.LinearRegression in Python’s scikit-learn, or something similar, to have the computer do this for you. Once you progress beyond graph paper, the same idea applies to any number of dimensions: each input column in the training data counts as one dimension.
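To make the graph-paper idea concrete, here is a minimal sketch of fitting the best straight line by ordinary least squares, for the single-input case only. The function name fit_line and the five data points are invented for illustration; this is not how lm(), scikit-learn, or H2O implement it, just the textbook closed-form formula.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c for one input column.

    fit_line is a hypothetical helper written for this illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The best-fit line always passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Made-up points that lie roughly on a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")
```

With more than one input column the slope m becomes a vector of coefficients, one per column, and the arithmetic is normally left to a library.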

Sticking with school memories, when I first heard about Einstein’s general and special theories of relativity, I assumed the special theory was the complicated one, to handle some especially difficult things that the general-purpose one couldn’t deal with. It turns out the general theory was called that because it generalized both the special theory and some other stuff into one über-complicated theory. And so it is with generalized linear models: they can do your grandfather’s linear model (in fact, that is the default behavior), but they can also do other stuff.

That other stuff comes down to a couple of things: using link(y) = mx + c instead of y = mx + c (where link() is a function that allows introducing nonlinearity); and specifying the distribution of the ...
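The link idea can be sketched in a few lines. The model still computes the straight line mx + c, but the prediction for y is recovered by undoing the link function. The two nonlinear links shown here (log and logit) are common choices that I am assuming for illustration, since the text above does not name specific ones, and the values of m and c are made up rather than fitted.

```python
import math


def predict(x, m, c, inverse_link):
    """Solve link(y) = m*x + c for y by applying the inverse of the link."""
    return inverse_link(m * x + c)


def identity_inverse(eta):
    # Identity link: link(y) = y, so this is the plain linear model.
    return eta


def log_inverse(eta):
    # Log link: link(y) = log(y), so y = exp(m*x + c) is always positive.
    return math.exp(eta)


def logit_inverse(eta):
    # Logit link: link(y) = log(y / (1 - y)), so y lies strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative coefficients only, not the result of any fit.
m, c = 0.5, 0.0

for name, inv in [("identity", identity_inverse),
                  ("log", log_inverse),
                  ("logit", logit_inverse)]:
    print(name, [round(predict(x, m, c, inv), 3) for x in (-2.0, 0.0, 2.0)])
```

The same straight line produces three differently shaped responses: unbounded, strictly positive, and squeezed between 0 and 1. That is the nonlinearity the link function introduces.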
