Checking for Duplicates

In Chapter 5, you used PROC SORT with the NODUP and NODUPKEY options to detect duplicates, as well as a DATA step approach using the FIRST. and LAST. temporary variables. Yes, you guessed it, there is an SQL answer as well. If you have a GROUP BY clause in your PROC SQL query and follow it with a COUNT function, you can count the frequency of each level of the GROUP BY variable. (COUNT is the SQL name for the N and FREQ functions in the SAS System, which count the number of nonmissing arguments.) If you choose patient number (PATNO) as the grouping variable, the COUNT function will tell you how many observations there are per patient. Remember to use a HAVING clause, rather than a WHERE clause, when you select on the value of a summary function such as COUNT. Look at the SQL ...
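The GROUP BY, COUNT, and HAVING combination described above can be sketched as follows. This is only an illustration, not the book's own program: the data set name WORK.PATIENTS, the variable HR, the column alias N_OBS, and the sample values are all made up for the example. Only PATNO comes from the text.

```sas
/* Hypothetical test data: PATNO is the patient ID, HR is a made-up measurement */
data work.patients;
   input patno $ hr;
   datalines;
001 88
002 84
002 84
003 68
003 72
003 70
;
run;

proc sql;
   title "Patient numbers that appear more than once";
   select patno,
          count(patno) as n_obs      /* observations per patient with a nonmissing PATNO */
      from work.patients
      group by patno                 /* one group per patient number */
      having count(patno) > 1;       /* keep only groups with more than one observation */
quit;
```

With this sample data, the query should list patients 002 and 003 and omit 001, which occurs only once. Because COUNT(PATNO) counts nonmissing values only, observations with a missing PATNO would never be flagged; COUNT(*) is an alternative if those should be counted as well.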

