Introduction
What is data cleaning? In this book, we define data cleaning to include:
Making sure that the raw data were accurately entered into a computer-readable file.
Checking that character variables contain only valid values.
Checking that numeric values are within predetermined ranges.
Checking whether there are missing values for variables where complete data are necessary.
Checking for and eliminating duplicate data entries.
Checking for uniqueness of certain values, such as patient IDs.
Checking for invalid date values.
Checking that an ID number is present in each of “n” files.
Verifying that more complex multi-file rules have been followed. For example, if an adverse event of type X occurs in one data set, you expect an observation with the same ...
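To make a few of the checks listed above concrete, here is a minimal sketch written in Python rather than SAS, purely as an illustration. The field names (patient_id, gender, heart_rate), the set of valid gender codes, and the heart-rate range are all hypothetical assumptions and do not come from the book's data sets.

```python
# Hypothetical patient records; the field names and values are assumptions
# made up for this illustration.
records = [
    {"patient_id": "001", "gender": "M", "heart_rate": 72},
    {"patient_id": "002", "gender": "X", "heart_rate": 68},
    {"patient_id": "003", "gender": "F", "heart_rate": 900},
    {"patient_id": "004", "gender": "F", "heart_rate": None},
    {"patient_id": "001", "gender": "M", "heart_rate": 72},
]

VALID_GENDERS = {"M", "F"}    # assumed set of valid character values
HEART_RATE_RANGE = (40, 200)  # assumed predetermined numeric range


def find_problems(rows):
    """Return (patient_id, issue) pairs for four of the listed checks."""
    problems = []
    seen_ids = set()
    low, high = HEART_RATE_RANGE
    for row in rows:
        pid = row["patient_id"]

        # Character variable contains only valid values.
        if row["gender"] not in VALID_GENDERS:
            problems.append((pid, "invalid gender"))

        # Missing value, then numeric value within the predetermined range.
        heart_rate = row["heart_rate"]
        if heart_rate is None:
            problems.append((pid, "missing heart_rate"))
        elif not low <= heart_rate <= high:
            problems.append((pid, "heart_rate out of range"))

        # Uniqueness of the patient ID.
        if pid in seen_ids:
            problems.append((pid, "duplicate patient_id"))
        seen_ids.add(pid)
    return problems


for pid, issue in find_problems(records):
    print(pid, issue)
```

The sketch only reports problems and leaves the records unchanged. Deciding whether to correct, drop, or query a flagged value is a separate step that depends on the study's rules.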