16

Proportion Data

An important class of problems involves data on proportions such as:

  • studies on percentage mortality,
  • infection rates of diseases,
  • proportion responding to clinical treatment,
  • proportion admitting to particular voting intentions,
  • sex ratios, or
  • data on proportional response to an experimental treatment.

What all these have in common is that we know how many of the experimental objects are in one category (dead, insolvent, male or infected) and we also know how many are in another (alive, solvent, female or uninfected). This contrasts with Poisson count data, where we knew how many times an event occurred, but not how many times it did not occur (p. 527).

We model processes involving proportional response variables in R by specifying a generalized linear model with family=binomial. The only complication is that, whereas with Poisson errors we could simply specify family=poisson, with binomial errors we must give the number of failures as well as the number of successes in a two-column response variable. To do this we use cbind to bind two vectors together into a single object, y, comprising the number of successes and the number of failures. The binomial denominator, n, is the total sample, and

number.of.failures <- binomial.denominator - number.of.successes
y <- cbind(number.of.successes, number.of.failures)
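
As a minimal sketch of how the two-column response is then used, the example below fits a binomial GLM to a small made-up data set. The dose values, the counts and the name model are assumptions introduced purely for illustration; they do not come from the text.

```r
# Hypothetical data: five dose groups of 20 individuals each (invented numbers)
dose <- c(1, 2, 3, 4, 5)
binomial.denominator <- rep(20, 5)
number.of.successes <- c(2, 6, 11, 15, 19)   # e.g. number dead at each dose

# Failures are whatever remains of the total sample in each group
number.of.failures <- binomial.denominator - number.of.successes

# Two-column response: successes first, failures second
y <- cbind(number.of.successes, number.of.failures)

# Binomial GLM (the default link for family = binomial is the logit)
model <- glm(y ~ dose, family = binomial)
summary(model)

# Fitted proportions on the original 0-1 scale
predict(model, type = "response")
```

Because the response carries both counts, the fit takes account of the sample size behind each proportion, which a bare percentage would discard.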

The old-fashioned way of modelling data of this sort was to use the percentage mortality as the response variable. There are four problems with this: ...
