Counting Missing Values
Problem
A set of observations is incomplete. You want to find out how much so.
Solution
Count the number of
NULL
values in the
set.
Discussion
Values can be missing from a set of observations for any number
of reasons: a test may not yet have been administered, something may
have gone wrong during the test that requires invalidating the
observation, and so forth. You can represent such observations in a
dataset as NULL
values to signify
that they’re missing or otherwise invalid, and then use summary
statements to characterize the completeness of the dataset.
If a table t
contains values
to be summarized along a single dimension, a simple summary will do to
characterize the missing values. Suppose that t
looks like this:
mysql>SELECT subject, score FROM t ORDER BY subject;
+---------+-------+
| subject | score |
+---------+-------+
| 1 | 38 |
| 2 | NULL |
| 3 | 47 |
| 4 | NULL |
| 5 | 37 |
| 6 | 45 |
| 7 | 54 |
| 8 | NULL |
| 9 | 40 |
| 10 | 49 |
+---------+-------+
COUNT(*)
counts the total
number of rows, and COUNT(score)
counts only the number of nonmissing scores. The difference between
the two values is the number of missing scores, and that difference in
relation to the total provides the percentage of missing scores. These
calculations are expressed as follows:
mysql>SELECT COUNT(*) AS 'n (total)',
->COUNT(score) AS 'n (nonmissing)',
->COUNT(*) - COUNT(score) AS 'n (missing)',
->((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'
->FROM t;
+-----------+-----------------+-------------+-----------+ ...
Get MySQL Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.