Chapter 15. Distributions and Modeling

Using summary statistics and plots to understand data is great, but both have their limitations. Summary statistics don’t tell you the shape of the data, and plots don’t scale well to many variables (with more than five or six, things start to get confusing),[58] nor to large numbers of plots (since you have to physically look at each one). Neither statistics nor plots are very good at giving you predictions from the data.

This is where models come in: if you understand enough about the structure of the data to be able to run a suitable model, then you can pass quantitative judgments about the data and make predictions.

There are lots of different statistical models, with more being invented as fast as university statistics departments can think of them. In order to avoid turning into a statistics course, this chapter is just going to deal with some really simple regression models. If you want to learn some stats, I recommend The R Book or Discovering Statistics Using R, both of which explain statistical concepts in glorious slow motion.

Before we get to running any models, we need a bit of background on generating random numbers, different kinds of distributions, and formulae.

Chapter Goals

After reading this chapter, you should:

  • Be able to generate random numbers from many distributions
  • Be able to find quantiles and inverse quantiles from those distributions
  • Know how to write a model formula
  • Understand how to run, update, and plot a linear regression ...
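
As a rough preview of where these goals lead, here is a minimal sketch using only base R. The data is simulated, and the seed, the sample size, the "true" coefficients (2 and 3), and the names x, noise, toy_data, and model are all made up for illustration; none of them come from the chapter itself.

```r
# A hedged sketch: simulated data, base R only (stats is attached by default).
set.seed(1)                            # arbitrary seed, for reproducibility only

# Random numbers from two distributions
x <- runif(50, min = 0, max = 10)      # 50 uniform draws between 0 and 10
noise <- rnorm(50, mean = 0, sd = 1)   # 50 standard normal draws

# A quantile, and the cumulative probability that inverts it
q <- qnorm(0.975)                      # 97.5% quantile of the standard normal
p <- pnorm(q)                          # maps that quantile back to 0.975

# A model formula: y ~ x reads as "y modeled as a linear function of x"
y <- 2 + 3 * x + noise                 # made-up linear relationship
toy_data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = toy_data)    # fit a linear regression

coef(model)                            # estimated intercept and slope
predict(model, newdata = data.frame(x = c(2.5, 7.5)))  # predictions at new x
```

Because the response was simulated from a known straight line plus noise, the fitted intercept and slope should land close to the values used to generate it, which makes this a convenient sanity check before working with real data.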
