DATA Step Processing Time
DATA step processing time occurs in two stages: the first is the start-up (or compilation)
time, and the second is the execution time. The compilation time is the time that it takes
the SAS compiler to scan the SAS source code and convert it to an executable program.
The execution time is the time that it takes SAS to execute the DATA step for each
observation in a SAS file. The two phases do not occur simultaneously: that is, the
DATA step compiles first and then it executes. For more detailed information about these
two phases, see “The Compilation Phase” on page 414 and “The Execution Phase” on
page 414.
Understanding these processing times and how they relate to the structure of your SAS
programs might be helpful when you are looking for ways to improve performance. In
general, the more statements a DATA step processes, the longer the compilation time.
Conversely, DATA steps that process large numbers of observations tend to have longer
execution times because they are more I/O-intensive.
For example, a very large DATA step job that is not I/O-intensive (that is, it has to
process a relatively small number of observations) might need to be rewritten to reduce
complexity and to eliminate repetitive and unused code. DO loops and user-defined
functions created with PROC FCMP are methods available for reducing compilation
time by decreasing the amount of code that has to be compiled. For more information
about how to improve performance when running CPU-intensive programs, see
“Techniques for Optimizing CPU Performance” on page 204.
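As a minimal sketch of both techniques, the following program replaces a run of near-identical assignment statements with an array and a DO loop, and moves a repeated calculation into a PROC FCMP function. The data set name, the variable names, the WORK.FUNCS library, and the function c_to_f are hypothetical and chosen only for illustration.

```sas
/* Hypothetical function library: WORK.FUNCS, package DEMO. */
proc fcmp outlib=work.funcs.demo;
   function c_to_f(c);
      return (c * 9 / 5 + 32);
   endsub;
run;

/* Tell the DATA step where to find the compiled function. */
options cmplib=work.funcs;

data work.temps;
   array celsius{4} c1-c4 (0 25 50 100);
   array fahr{4} f1-f4;
   /* One loop instead of four separate assignment statements. */
   do i = 1 to dim(celsius);
      fahr{i} = c_to_f(celsius{i});
   end;
   drop i;
run;
```

The loop body is compiled once no matter how many array elements it processes, so the amount of source code stays the same as the number of variables grows.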
If most of the time used by the DATA step is spent processing a large number of observations,
then other techniques designed to optimize I/O might be more useful. For more
information about how to improve performance when running I/O-intensive programs,
see “Techniques for Optimizing I/O” on page 197.
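One common way to reduce I/O, shown here only as a sketch, is to read just the variables and observations that the step needs. The example assumes that the SASHELP.CARS sample data set is available; the output data set name WORK.SUV is hypothetical.

```sas
data work.suv;
   /* KEEP= limits the variables that are read into the program data vector. */
   /* WHERE= filters observations before they reach the DATA step.           */
   set sashelp.cars(keep=Make Model Type MSRP
                    where=(Type = 'SUV'));
run;
```

Whether this helps depends on how many variables and observations the filter actually excludes.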
Several SAS system options provide information that can help you minimize processing
time and optimize performance. For example, the FULLSTIMER option in SAS collects
and displays performance statistics on each DATA step so that you can determine which
resources were used for each step of data processing. For more information about this
option and about optimization in general, see Chapter 12, “Optimizing System
Performance,” on page 195.
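The following sketch turns FULLSTIMER on for a single step and then turns it off again. The data set WORK.DEMO and its contents are hypothetical; the statistics that appear in the log vary by operating environment.

```sas
options fullstimer;      /* request the full list of resource statistics */

data work.demo;
   do i = 1 to 1000000;
      x = sqrt(i);
      output;
   end;
run;

options nofullstimer;    /* restore the shorter default reporting */
```

After the step finishes, the log notes for WORK.DEMO list the resources that the step used.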
The following example shows how to estimate the compilation time for a very large
DATA step job that has a small number of observations. The program calls the
DATETIME function through %QSYSFUNC: a %PUT macro statement writes the
compilation start time to the log, and a %LET statement saves it in the macro variable
startTime. The DATA step then uses the _N_ automatic variable to detect the start of
execution (SAS always sets this variable to 1 at the start of the execution phase) and
records that time with the DATETIME function. The difference between the two times
is the approximate compilation time of the DATA step.
Example Code 20.1 Finding Compilation and Execution Time
options nosource;
/* Macro statements run before the DATA step compiles, so this marks the compilation start. */
%put Starting compilation of DATA step: %QSYSFUNC(DATETIME(), DATETIME20.3);
%let startTime=%QSYSFUNC(DATETIME());
data a;
   /* _N_ is 1 on the first iteration, that is, at the start of execution. */
   if _N_ = 1 then do;
      endTime = datetime();
      put 'Starting execution of DATA step: ' endTime:DATETIME20.3;
      timeDiff = endTime - &startTime;
      put 'The Compile time for this DATA Step is approximately ' timeDiff:time20.6;
   end;
   /* Lots of DATA step code */
run;
Output 20.5 Log Output for Finding Compilation and Execution Time
Note: Macro statements and macro variables are resolved at compilation time and have
no bearing on the time it takes to execute the DATA step. For information about how
SAS processes statements with macro activity, see “Getting Started with the Macro
Facility” in SAS Macro Language: Reference, and “SAS Programs and Macro
Processing” in SAS Macro Language: Reference.
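The ordering described in the note can be seen with a small sketch. The message text is arbitrary. The %PUT statement is handled while the step is still being compiled, so its message is written to the log before the message from the PUT statement, even though the PUT statement appears first in the source.

```sas
data _null_;
   put 'Execution phase: this message is written second.';
   %put Compilation phase: this message is written first.;
run;
```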