DATA REVIEW

During the data review, inspect the entire database and generate a series of statistics and graphs.

1. Review quality assurance reports. Follow up on any discrepancies.
2. Calculate the minimum and maximum of all variables and compare against predetermined ranges. (Ideally, this would have been done at the time the data were collected.) Generate box and whisker plots with the same goal in mind.
3. Eliminate duplicates from the database.
4. Verify that data are recorded in correct physical units, and that calibration and dilution factors have been applied.
5. Characterize missing data. Problems arise in either of the following cases:
  • When the frequency of missing data is associated with the specific treatment or process that was employed.
  • When specific demographic(s) fail to complete or return survey forms, so that the remaining sample is no longer representative of the population as a whole.
6. For each variable, (a) compute the serial correlation to check that the observations are independent of one another, and (b) create a four-plot as described in the next section.
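
Several of the checks above (steps 2, 3, 5, and 6a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample values, and the acceptable range are hypothetical and do not come from the text, and missing values are assumed to be stored as None. A lag-1 autocorrelation near zero is consistent with independent observations but does not prove it.

```python
from collections import Counter


def range_violations(values, low, high):
    """Step 2: return (index, value) pairs outside [low, high]; None is skipped."""
    return [(i, v) for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]


def drop_duplicates(records):
    """Step 3: keep the first occurrence of each record (records must be hashable)."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def missing_rate_by_group(groups, values):
    """Step 5: fraction of missing (None) values within each treatment group."""
    total = Counter(groups)
    missing = Counter(g for g, v in zip(groups, values) if v is None)
    return {g: missing[g] / total[g] for g in total}


def lag1_autocorrelation(values):
    """Step 6a: lag-1 sample autocorrelation of a complete numeric series."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((x - mean) ** 2 for x in values)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


# Hypothetical pH readings with an assumed valid range of 0 to 14.
ph = [7.1, 6.9, 14.8, 7.3, None, 7.0]
print(range_violations(ph, 0.0, 14.0))   # [(2, 14.8)]

# Hypothetical (sample id, reading) records containing one repeat.
print(drop_duplicates([("s1", 7.1), ("s2", 6.9), ("s1", 7.1)]))

# Missingness that differs sharply between groups is the warning sign in step 5.
print(missing_rate_by_group(["A", "A", "B", "B", "B", "A"],
                            [1.0, 2.0, None, None, 3.0, 4.0]))

# A steadily increasing series is clearly not a run of independent observations.
print(lag1_autocorrelation([1, 2, 3, 4, 5]))
```

The helpers report problems rather than correct them, so any follow-up (re-measuring, querying the source, documenting the discrepancy) remains a decision for the reviewer.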

OUTLIERS

Outliers (extreme values, either small or large, that are well separated from the main set of observations) are frequently detected during a DQA, as they are easily spotted on a dot chart or a box and whisker plot. But as they are not signs of poor data, they should not be eliminated from the database. Rather, they should be dealt with during the subsequent analyses. ...
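
One way to flag outliers without removing them is the 1.5 × IQR fence rule that box and whisker plots conventionally use. The sketch below assumes that rule, which the text itself does not specify. The function name and the sample values are hypothetical, and the quartiles come from the default method of Python's statistics.quantiles, so other software may place the fences slightly differently.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return indices of values beyond the k * IQR fences; the data are not modified."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low_fence or x > high_fence]


# Hypothetical measurements; the last value is well separated from the rest.
readings = [10, 12, 11, 13, 12, 11, 40]
flagged = flag_outliers(readings)
print(flagged)          # [6]
print(len(readings))    # 7: every observation is still in the data set
```

Returning indices rather than a filtered list keeps every observation in the database, so the flagged points can be handled in the subsequent analyses.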
