Count Data and Poisson Errors

Count data have a number of properties that need to be considered during modelling:

  • count data are bounded below (you cannot have counts less than zero);
  • variance is not constant (variance increases with the mean);
  • errors are not normally distributed;
  • the fact that the data are whole numbers (integers) affects the error distribution.
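These properties can be seen in a small simulation. The sketch below is illustrative only: it assumes the counts really are Poisson distributed, and the seed, sample size and three mean values are arbitrary choices that do not come from the text.

```r
# Simulated counts only; the means below are hypothetical values chosen for illustration.
set.seed(1)                      # arbitrary seed, for reproducibility
means <- c(1, 5, 20)             # assumed mean counts
sims <- lapply(means, function(m) rpois(10000, lambda = m))

# For each assumed mean, record the sample mean, sample variance and smallest count.
summary_table <- data.frame(
  mean_set    = means,
  sample_mean = sapply(sims, mean),
  sample_var  = sapply(sims, var),
  minimum     = sapply(sims, min)
)
print(summary_table)

# Every simulated value is a whole number.
all_integer <- all(unlist(sims) == round(unlist(sims)))
print(all_integer)
```

The sample variance should track the sample mean in each row, no count falls below zero, and every value is an integer, which is why a constant-variance, normal-errors model is a poor fit for data of this kind.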

It is very simple to deal with all these issues by using a GLM. All we need to write is

glm(y ~ x, family = poisson)

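and the model is fitted with a log link (so that the fitted values, being antilogs of the linear predictor, can never be negative) and Poisson errors (to account for the non-normality, the integer values, and the variance increasing with the mean).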
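As a minimal sketch of such a fit, the example below simulates a response and fits the model to it. The data are invented for illustration: the names x and y follow the call above, but the sample size, the seed, and the intercept and slope used to generate the counts (0.5 and 1.2) are assumptions, not values from the text.

```r
# Simulated data only; the "true" coefficients 0.5 and 1.2 are hypothetical.
set.seed(42)                                   # arbitrary seed
x <- runif(200, min = 0, max = 2)              # assumed explanatory variable
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))   # counts whose log-mean is linear in x

# Poisson errors; the log link is the default for this family.
model <- glm(y ~ x, family = poisson)

# Coefficients are estimated on the log scale.
print(coef(model))

# Fitted values on the scale of the counts are antilogs, so they are all positive.
fitted_counts <- predict(model, type = "response")
print(range(fitted_counts))

# Confirm which link function was used.
print(family(model)$link)
```

Because the coefficients are on the log scale, exp(coef(model)) gives the multiplicative change in the expected count per unit increase in x.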
