Introduction

What is data cleaning? In this book, we define data cleaning to include:

  • Making sure that the raw data were accurately entered into a computer-readable file.

  • Checking that character variables contain only valid values.

  • Checking that numeric values are within predetermined ranges.

  • Checking for missing values in variables where complete data are required.

  • Checking for and eliminating duplicate data entries.

  • Checking for uniqueness of certain values, such as patient IDs.

  • Checking for invalid date values.

  • Checking that an ID number is present in each of “n” files.

  • Verifying that more complex multi-file rules have been followed. For example, if an adverse event of type X occurs in one data set, you expect an observation with the same ...
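
Several of the single-file checks in the list above can be sketched in a few lines of code. The example below is a minimal illustration in Python rather than SAS, and it is not taken from the book. The field names (patno, gender, hr, visit_date), the valid gender codes, the heart-rate range, and the date format are all assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rules for illustration only; none of these come from the book.
VALID_GENDER = {"M", "F"}
HR_RANGE = (40, 100)                 # assumed acceptable heart-rate range
REQUIRED = ("patno", "visit_date")   # assumed fields that must not be missing


def is_valid_date(text, fmt="%m/%d/%Y"):
    """Return True if text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (TypeError, ValueError):
        return False


def check_records(records):
    """Return a list of (row_index, problem) tuples for a list of dict records."""
    problems = []
    # Count how often each non-missing ID occurs, for the uniqueness check.
    id_counts = Counter(r.get("patno") for r in records if r.get("patno"))
    seen_rows = set()
    for i, r in enumerate(records):
        # Missing values in fields where complete data are required
        for field in REQUIRED:
            if not r.get(field):
                problems.append((i, f"missing {field}"))
        # Character variable must contain only valid values
        if r.get("gender") not in VALID_GENDER:
            problems.append((i, "invalid gender"))
        # Numeric value must fall within a predetermined range
        hr = r.get("hr")
        if hr is not None and not (HR_RANGE[0] <= hr <= HR_RANGE[1]):
            problems.append((i, "hr out of range"))
        # Date must be a real calendar date
        if r.get("visit_date") and not is_valid_date(r["visit_date"]):
            problems.append((i, "invalid visit_date"))
        # Exact duplicate of an earlier record
        key = tuple(sorted(r.items()))
        if key in seen_rows:
            problems.append((i, "duplicate record"))
        seen_rows.add(key)
        # ID value appears in more than one record
        if r.get("patno") and id_counts[r["patno"]] > 1:
            problems.append((i, "patno not unique"))
    return problems
```

Calling check_records on a list of dictionaries returns one (row index, message) pair per problem found, so a clean file yields an empty list. The multi-file checks in the list (an ID present in each of "n" files, and cross-file rules) are not covered by this sketch.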
