Chapter 52

The Kappa Index

Michael A. McIsaac and Richard J. Cook

52.1 Introduction

In medical research it is frequently of interest to examine the extent to which results of a classification procedure concur in successive applications. For example, two psychiatrists may separately examine each member of a group of patients and categorize each one as psychotic, neurotic, suffering from a personality disorder, or healthy. Given the resulting data, questions may then be posed regarding the ratings of the two psychiatrists and their relationship to one another. The psychiatrists would typically be said to exhibit a high degree of agreement if a high percentage of their ratings concurred and poor agreement if they often made different diagnoses. In general, this latter outcome could arise if the categories were ill-defined, the criteria for assessment were different for the two psychiatrists, or their ability to examine these criteria differed sufficiently, possibly as a result of different training or experience. Poor empirical agreement might therefore lead to a review of the category definitions and diagnostic criteria, or possibly retraining with a view to improving agreement and hence consistency of diagnoses and treatment.
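As a minimal sketch of the "percentage of ratings that concur" described above, the following computes the raw proportion of agreement between two raters, together with a cross-tabulation of their paired ratings. The patient data and the names `observed_agreement`, `rater_1`, and `rater_2` are hypothetical and serve only as illustration; they are not taken from the chapter.

```python
from collections import Counter


def observed_agreement(ratings_a, ratings_b):
    """Return the proportion of subjects on which two raters assign the same category."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must rate the same subjects.")
    if not ratings_a:
        raise ValueError("At least one rated subject is required.")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical diagnoses for six patients (illustrative data only).
rater_1 = ["psychotic", "neurotic", "healthy",
           "personality disorder", "healthy", "neurotic"]
rater_2 = ["psychotic", "healthy", "healthy",
           "personality disorder", "neurotic", "neurotic"]

# Raw proportion of concurring ratings.
agreement = observed_agreement(rater_1, rater_2)

# Cross-tabulation: counts of each (rater 1 category, rater 2 category) pair.
crosstab = Counter(zip(rater_1, rater_2))

print(f"Observed agreement: {agreement:.3f}")
for (cat_1, cat_2), count in sorted(crosstab.items()):
    print(f"rater 1 = {cat_1!r}, rater 2 = {cat_2!r}: {count}")
```

In this illustrative data the raters concur on four of the six patients. The off-diagonal entries of the cross-tabulation show which pairs of categories account for the disagreements, which is the kind of information that might prompt a review of category definitions.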

In another context, one might have data from successive applications of a test for dysplasia or cancer from cervical smears. If the test indicates normal, mild, moderate, or severe dysplasia or cancer, and the test is applied at two time points in close proximity, ...

