The classic model that underlies reliability stipulates that the jth rating of unit i, i = 1, 2, …, can be expressed as X_ij = ξ_i + ε_ij, where ξ_i is that unit's "true value" (i.e., the value free of rater error) and ε_ij is the error made by the jth independent rater sampled from the population of raters. The interrater reliability coefficient is defined as

ρ = σ_ξ² / σ_X²,
where σ_ξ² is the variance of the ξ_i in the population of interest and σ_X² is the variance of a single observation per subject, so that σ_X² = σ_ξ² + σ_ε². Thus, in a sense, reliability relates to the signal-to-noise ratio: σ_ξ² represents "signal," and σ_X² combines "signal" and "noise."
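This variance decomposition can be illustrated numerically. The sketch below (with hypothetical variance values, not taken from the text) computes ρ directly from assumed σ_ξ² and σ_ε², then simulates ratings X_ij = ξ_i + ε_ij and recovers ρ with the standard one-way ANOVA (ICC(1)) estimator; the estimator choice is an assumption, since the passage does not name one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: signal and rater-error variances.
sigma2_xi = 4.0    # variance of true values xi_i ("signal")
sigma2_eps = 1.0   # variance of rater error eps_ij ("noise")

# A single rating has variance sigma_X^2 = sigma_xi^2 + sigma_eps^2,
# so rho is the signal share of the total variance.
rho = sigma2_xi / (sigma2_xi + sigma2_eps)
print(rho)  # 0.8

# Simulate ratings X_ij = xi_i + eps_ij: n subjects (rows), k raters (columns).
n, k = 500, 3
xi = rng.normal(0.0, np.sqrt(sigma2_xi), size=(n, 1))
X = xi + rng.normal(0.0, np.sqrt(sigma2_eps), size=(n, k))

# One-way ANOVA estimate of rho (ICC(1)).
msb = k * X.mean(axis=1).var(ddof=1)  # between-subject mean square
msw = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
rho_hat = (msb - msw) / (msb + (k - 1) * msw)
print(round(rho_hat, 2))  # close to the population value 0.8
```

With these variances, 80% of the variance of a single rating is signal, and the simulation-based estimate recovers that proportion up to sampling error.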
According to this definition, the reliability coefficient is zero if and only if subjects in the population are homogeneous in whatever X measures. This situation should almost never pertain when considering measures for use in randomized clinical trials. Consequently, testing the null hypothesis that ρ = 0 is virtually never of interest, although such tests are admittedly common in the research literature. Instead, the tasks of greatest interest to clinical research are (1) obtaining a confidence interval for ρ, (2) judging the adequacy of ρ, and (3) considering how to improve ρ.
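A confidence interval for ρ, rather than a test of ρ = 0, can be sketched as follows. This is a minimal illustration, assuming the one-way ANOVA (ICC(1)) framework and the standard exact interval based on the F statistic MSB/MSW; the simulated data and sample sizes are hypothetical, not from the text.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)

# Hypothetical simulated ratings X_ij = xi_i + eps_ij (true rho = 0.8).
n, k = 60, 3
xi = rng.normal(0.0, 2.0, size=(n, 1))      # sigma_xi^2 = 4
X = xi + rng.normal(0.0, 1.0, size=(n, k))  # sigma_eps^2 = 1

# One-way ANOVA mean squares.
msb = k * X.mean(axis=1).var(ddof=1)
msw = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))

# Point estimate and exact 95% CI for ICC(1) from the F statistic.
alpha = 0.05
f_obs = msb / msw
rho_point = (f_obs - 1) / (f_obs + k - 1)
fl = f_obs / f_dist.ppf(1 - alpha / 2, n - 1, n * (k - 1))
fu = f_obs * f_dist.ppf(1 - alpha / 2, n * (k - 1), n - 1)
lower = (fl - 1) / (fl + k - 1)
upper = (fu - 1) / (fu + k - 1)
print(f"ICC(1) estimate {rho_point:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval conveys the precision of the reliability estimate, which is what tasks (1) and (2) require; task (3), improving ρ, typically involves averaging over more raters, which shrinks the error contribution to σ_X².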