Improving Performance of WHERE Processing
Indexing a SAS data set can significantly improve the performance of WHERE
processing. An index is an optional file that you can create for SAS data files in order to
provide direct access to specific observations.
Processing a WHERE expression without an index requires SAS to sequentially read
observations in order to find the ones that match the selection criteria. Without an index,
SAS first checks for the sort indicator, which is stored with the data file from a previous
SORT procedure or SORTEDBY= data set option. If the sort indicator is validated, SAS
takes advantage of it and stops reading the file once it is clear there are no more values
that satisfy the WHERE expression. For example, consider a data set that is sorted by
Age, without an index. To process the expression where age le 25, SAS stops
reading observations after it finds an observation that is greater than 25. Note that while
SAS can determine when to stop reading observations, without an index, there is no
indication where to begin, so SAS always begins with the first observation, which can
require reading a lot of observations.
Having an index enables SAS to determine which observations satisfy the criteria, which
is referred to as optimizing the WHERE expression. However, by default, SAS decides
whether to use the index or read the entire data set sequentially. For details about how
SAS uses an index to process a WHERE expression, see “Using an Index for WHERE
Processing” on page 659.
In addition to creating indexes for the data set, here are some guidelines for writing
efficient WHERE expressions:
Table 11.5 Constructing Efficient WHERE Expressions
Guideline Efficient Inefficient
Avoid using the LIKE
operator that begins with %
or _.
where country like
'A%INA';
where country like
'%INA';
Avoid using arithmetic
expressions.
where salary >
48000;
where salary >
12*4000;
Processing a Segment of Data That Is
Conditionally Selected
Applying FIRSTOBS= and OBS= Options
When you conditionally select a subset of observations with a WHERE expression, you
can also segment that subset by applying FIRSTOBS=, OBS=, or both processing (as
data set options and system options). When used with a WHERE expression,
190 Chapter 11 WHERE-Expression Processing

Get SAS 9.4 Language Reference, 6th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.