A confusion (or classification) matrix can help us decide which threshold value to use by comparing the predicted outcomes against the actual outcomes, as follows:
| | Predict y=0 (healthy) | Predict y=1 (disease) |
|---|---|---|
| Actual y=0 (healthy) | True negatives (TN) | False positives (FP) |
| Actual y=1 (disease) | False negatives (FN) | True positives (TP) |
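To make the role of the threshold concrete, here is a minimal sketch in Python of how the four cells could be counted from predicted probabilities. The function name `confusion_counts`, the rule "predict y=1 when the probability is at or above the threshold", and the sample data are all assumptions made for illustration; they do not come from a particular model or library.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count (TN, FP, FN, TP) for binary labels and predicted probabilities.

    Assumption: an observation is predicted as y=1 (disease) when its
    predicted probability is greater than or equal to the threshold.
    """
    tn = fp = fn = tp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 0 and predicted == 0:
            tn += 1  # healthy, predicted healthy
        elif actual == 0 and predicted == 1:
            fp += 1  # healthy, predicted disease
        elif actual == 1 and predicted == 0:
            fn += 1  # disease, predicted healthy
        else:
            tp += 1  # disease, predicted disease
    return tn, fp, fn, tp


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.65, 0.30, 0.80, 0.55, 0.20, 0.90]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, confusion_counts(y_true, y_prob, threshold))
```

On this made-up data, lowering the threshold removes false negatives but adds false positives, and raising it does the opposite. That trade-off is what the matrix lets us inspect.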
Generating a confusion matrix allows us to quantify the accuracy of our model at a given threshold value using the following metrics:
- N = number of observations
- Overall accuracy = (TN + TP) / N
- Overall error rate = (FP + FN) / N
- Sensitivity (True Positive Rate) = TP / (TP + FN)
- Specificity (True Negative Rate) = TN / (TN + FP)
- False positive ...
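
As a rough illustration, the fully stated formulas above (overall accuracy, overall error rate, sensitivity, and specificity) can be computed from the four cell counts as in the Python sketch below. The function name `classification_metrics` and the example counts are hypothetical, and the sketch assumes each actual class has at least one observation so that no denominator is zero.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute summary metrics from confusion-matrix cell counts.

    Assumption: tp + fn > 0 and tn + fp > 0, i.e. both actual classes
    are present, so none of the divisions below is by zero.
    """
    n = tn + fp + fn + tp  # N = number of observations
    return {
        "accuracy": (tn + tp) / n,
        "error_rate": (fp + fn) / n,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tn=50, fp=10, fn=5, tp=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy and error rate always sum to 1, since every observation falls into exactly one of the four cells.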