5 Tests for categorical data

5.1 Introduction

In real applications, many phenomena are represented by categorical variables. When the data are not numerical, specific testing problems arise, and the categorical nature of the data calls for specific testing procedures.

In addition to the classical classification of problems according to the number of compared samples and the presence of one or more dependent response variables, we can also distinguish between problems for binary and nonbinary variables, depending on whether the number of observable categories is two or more than two. Hence the data of the problems presented in this chapter can be represented in the following ways:

  • {Xi; i = 1, …, n} (univariate one-sample problem where n is the sample size);
  • {Xji; i = 1, …, nj; j = 1, 2} (univariate two-sample problem where nj is the size of the jth sample);
  • {Xih; i = 1, …, n; h = 1, …, q} (q-variate one-sample problem where n is the sample size).

The support of the variables is always represented by a set of two or more categories. When the categories are ordered (e.g. categorical judgments, educational level, age groups, etc.) the responses are said to be ordered categorical; otherwise they are said to be nominal categorical.
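The practical difference between the two kinds of response can be illustrated with a minimal sketch. The data, the variable names and the assumed ordering of the educational levels below are hypothetical and serve only as an illustration: for ordered categories, positional summaries such as the median category are meaningful, whereas for nominal categories only frequencies (and hence the mode) can be used.

```python
from collections import Counter

# Hypothetical ordered categorical responses (educational level).
# The ordering low -> high is an assumption made for this illustration.
LEVELS = ["primary", "secondary", "tertiary"]
education = ["secondary", "primary", "tertiary", "secondary", "secondary"]

# Ordered categories can be mapped to integer positions that respect the order,
# so order-based summaries such as the median category are well defined.
position = {level: pos for pos, level in enumerate(LEVELS)}
codes = sorted(position[x] for x in education)
median_level = LEVELS[codes[len(codes) // 2]]  # middle value (n is odd here)

# Hypothetical nominal responses: the categories have no natural order,
# so only the frequencies and the modal category are meaningful.
colours = ["red", "blue", "red", "green", "red"]
counts = Counter(colours)
modal_colour = counts.most_common(1)[0][0]

print(median_level, modal_colour)
```

Sorting the integer codes is legitimate only because the ordering of LEVELS is part of the data definition; applying the same step to the nominal variable would impose an arbitrary order.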

In multisample and/or bivariate problems, categorical data can be presented through contingency tables because the sample observations can be classified according to two factors. The levels of one factor correspond to the rows of the table; the levels ...
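As a rough sketch of the cross-classification mentioned above, the following code builds a contingency table for a two-sample problem from raw observations. The data are hypothetical, and placing the sample label on the rows and the response category on the columns is an assumption made only for this illustration.

```python
from collections import Counter

# Hypothetical two-sample data: each observation is a pair
# (sample label j, observed category).
observations = [
    (1, "low"), (1, "low"), (1, "high"), (1, "medium"),
    (2, "high"), (2, "medium"), (2, "high"),
]

# Assumed layout: samples on the rows, response categories on the columns.
row_levels = [1, 2]
col_levels = ["low", "medium", "high"]

# Count how many observations fall in each (row level, column level) cell.
cell_counts = Counter(observations)
table = [[cell_counts[(r, c)] for c in col_levels] for r in row_levels]

# Marginal totals: row totals are the sample sizes n1 and n2.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

for level, row, total in zip(row_levels, table, row_totals):
    print(level, row, total)
print("column totals", col_totals)
```

In this layout the row totals reproduce the sample sizes nj, and the column totals give the pooled frequency of each category.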

