14 Proportion Data

An important class of problems involves data on proportions:

  • studies on percentage mortality
  • infection rates of diseases
  • proportion responding to clinical treatment
  • proportion admitting to particular voting intentions
  • sex ratios
  • data on proportional response to an experimental treatment

These are count data, but what they have in common is that we know how many of the experimental objects are in one category (dead, insolvent, male or infected) and we also know how many are in another (alive, solvent, female or uninfected). This differs from Poisson count data, where we knew how many times an event occurred, but not how many times it did not occur (Chapter 13).

We model processes involving proportional response variables in R by specifying a generalized linear model (GLM) with family=binomial. The only complication is that whereas with Poisson errors we could simply say family=poisson, with binomial errors we must specify the number of failures as well as the number of successes by creating a two-vector response variable. To do this we use cbind to bind the two vectors together into a single object, y, whose columns hold the number of successes and the number of failures. The binomial denominator, n, is the total sample size, so

number.of.failures <- binomial.denominator - number.of.successes
y <- cbind(number.of.successes, number.of.failures)
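As a minimal sketch of how this two-vector response might then be used, the example below fits a binomial GLM to a small made-up data set. The explanatory variable dose, the counts and the object name model are all hypothetical, invented purely for illustration; only glm, cbind and family = binomial come from the text above.

```r
# Hypothetical data (made up for illustration): 5 dose levels,
# with 20 individuals exposed at each dose
dose <- c(1, 2, 3, 4, 5)
binomial.denominator <- rep(20, 5)
number.of.successes <- c(2, 5, 9, 14, 18)   # e.g. number dead at each dose

# Failures are whatever is left of the denominator after the successes
number.of.failures <- binomial.denominator - number.of.successes

# Two-column response: successes first, failures second
y <- cbind(number.of.successes, number.of.failures)

# Binomial GLM (the default link for this family is the logit)
model <- glm(y ~ dose, family = binomial)
summary(model)

# Fitted proportions, back-transformed to the probability scale
predict(model, type = "response")
```

Because both columns are supplied, the sample size behind each proportion enters the fit directly, rather than being discarded as it would be if a bare percentage were used as the response.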

The old-fashioned way of modelling this sort of data was to use the percentage mortality as the response variable. There are four ...

