Chapter 17

Binary Response Variables

Many statistical problems involve binary response variables. For example, we often classify individuals as

  • dead or alive,
  • occupied or empty,
  • healthy or diseased,
  • wilted or turgid,
  • male or female,
  • literate or illiterate,
  • mature or immature,
  • solvent or insolvent, or
  • employed or unemployed.

The aim is to understand which factors are associated with an individual being in one class or the other. Binary response analysis is a useful option when at least one of your explanatory variables is continuous (rather than categorical). In a study of company insolvency, for instance, the data would consist of a list of measurements made on the insolvent companies (their age, size, turnover, location, management experience, workforce training, and so on) and a similar list for the solvent companies. The question then becomes which, if any, of the explanatory variables increase the probability of an individual company being insolvent.
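As a minimal sketch of this kind of analysis, the following R code simulates a hypothetical insolvency data set and fits a binomial GLM with `glm`. The variable names (`turnover`, `insolvent`), the sample size, and the coefficients used to generate the data are all assumptions made for illustration, not values from a real study.

```r
# Hypothetical data: 200 companies, one continuous explanatory variable.
set.seed(42)
n <- 200
turnover <- runif(n, min = 0, max = 10)   # assumed scale, illustration only

# Assumed true relationship: insolvency becomes less likely as turnover rises.
p_insolvent <- plogis(2 - 0.6 * turnover) # plogis() is the inverse logit
insolvent <- rbinom(n, size = 1, prob = p_insolvent)  # response of 0s and 1s

# Fit a logistic regression: binary response, continuous explanatory variable.
model <- glm(insolvent ~ turnover, family = binomial)
print(summary(model))

# Predicted probability of insolvency for a company with turnover = 5.
pred <- predict(model, newdata = data.frame(turnover = 5), type = "response")
print(pred)
```

A negative estimated slope for `turnover` would indicate that higher turnover is associated with a lower probability of insolvency in these simulated data.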

The response variable contains only 0s and 1s; for example, 0 to represent dead individuals and 1 to represent live ones. Thus, there is only a single column of numbers for the response, in contrast to proportion data, where two vectors (successes and failures) were bound together to form the response (see Chapter 16). R treats binary data by assuming that the 0s and 1s come from a binomial trial with sample size 1. If the probability that an individual is dead is p, then the probability of obtaining y (where ...
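To illustrate the single-column response and the binomial trial with sample size 1, here is a small sketch in R. The vector `y` and the value of `p` are made up, and `p` is taken here to be the probability of a 1, which is an assumption of this sketch. The function `dbinom` with `size = 1` gives the probability of each outcome from one such trial.

```r
# Hypothetical binary response: a single column of 0s and 1s.
y <- c(0, 1, 1, 0, 1)

# Assumed probability that a single trial yields a 1 (illustration only).
p <- 0.3

# Probability of each possible outcome from a binomial trial of size 1:
# 1 - p for an outcome of 0, and p for an outcome of 1.
probs <- dbinom(c(0, 1), size = 1, prob = p)
print(probs)

# Log-likelihood of the observed 0s and 1s under this value of p.
loglik <- sum(dbinom(y, size = 1, prob = p, log = TRUE))
print(loglik)
```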
