Chapter 16 Categorical Data Analysis

Package(s): gdata

Dataset(s): UCBAdmissions, Titanic, HairEyeColor, VADeaths, faithful, atombomb, Filariasistype

16.1 Introduction

Discrete data may be classified into two forms: (i) nominal data, and (ii) ordinal data. Nominal data consists of variables whose values are labels. For example, the variable gender consists of two labels, male and female. Though we may denote males by 0 and females by 1, it is not the case here that 1 is greater than 0, and hence such a variable is called a nominal variable. On the other hand, if we consider the rank of a student on the basis of marks, the first rank signifies more value than the second rank. Such variables are called ordinal variables. Categorical data analysis is concerned with the analysis of these kinds of variables.
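The distinction between nominal and ordinal variables can be sketched in R with factors. The example below is only illustrative: the vectors gender and grade and their values are made up for this sketch and do not come from any of the chapter's datasets.

```r
# Nominal variable: the levels are labels with no ordering among them
gender <- factor(c("male", "female", "female", "male"))

# Ordinal variable: the levels carry an order (hypothetical ranks)
grade <- factor(c("second", "first", "third", "first"),
                levels = c("third", "second", "first"),
                ordered = TRUE)

is.ordered(gender)    # FALSE
is.ordered(grade)     # TRUE

# Comparisons are meaningful only for the ordered factor
grade[2] > grade[1]   # "first" versus "second": TRUE

# Frequency counts work for both kinds of variables
table(gender)
table(grade)
```

Declaring the order through the levels argument, rather than coding the ranks as numbers, keeps the labels readable while still allowing comparisons such as < and >.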

Categorical data analysis, abbreviated as CDA, requires the data to be in a specific format, viz., contingency tables. In particular, in R, the data has to be read in a table format. Some of the standard datasets for CDA shipped with R include UCBAdmissions, Titanic, HairEyeColor, and VADeaths. Note that the datasets used earlier, such as iris, are of the class data.frame. The above-mentioned datasets are of the class table or matrix, as can be verified in the next (small) program.

> class(UCBAdmissions);class(Titanic);class(HairEyeColor)
[1] "table"
[1] "table"
[1] "table"
> class(VADeaths)
[1] "matrix"
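Since data is often available as a data.frame rather than a table, a short sketch of moving between the two forms may help. It uses only base R functions (as.data.frame, xtabs, margin.table); the object names ucb_df and ucb_tab are chosen here for illustration. Note also that in R 4.0.0 and later, class() applied to a matrix returns both "matrix" and "array", so checks such as is.matrix() or inherits() tend to be more robust than comparing class strings.

```r
# UCBAdmissions is a three-way table: Admit x Gender x Dept
dim(UCBAdmissions)

# Flatten the table into a data frame: one row per cell plus a Freq column
ucb_df <- as.data.frame(UCBAdmissions)
class(ucb_df)                  # "data.frame"
head(ucb_df)

# Rebuild a contingency table from the frequency-form data frame
ucb_tab <- xtabs(Freq ~ Admit + Gender + Dept, data = ucb_df)
inherits(ucb_tab, "table")     # TRUE

# Collapse over Dept to obtain the two-way Admit x Gender table
margin.table(UCBAdmissions, c(1, 2))

# Version-independent check that VADeaths is a matrix
is.matrix(VADeaths)            # TRUE
```

The round trip shows that a table and a frequency-form data frame hold the same information, so either may be used as a starting point as long as the analysis function receives the form it expects.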

We will begin with graphical methods for the categorical ...
