Understanding BY Groups
BY Groups with a Single BY Variable
The following figure represents the results of using a single BY variable, zipCode, in a
DATA step. The input data set, zip, contains street names, cities, states, and ZIP codes.
The groups are created by specifying the variable zipCode in the BY statement. The
DATA step treats consecutive observations that have the same zipCode value as one group.
The figure shows the five BY groups that are created by Example Code 22.1 on page 461
and Example Code 22.2 on page 462.
The first BY group contains all observations with the smallest value for the BY variable
zipCode. The second BY group contains all observations with the next smallest value
for the BY variable, and so on.
Figure 22.1 BY Group Using a Single BY Variable (ZipCode)
Example Code 22.1 Create the Zip Data Set
data zip;
input zipCode State $ City $ Street $20-29;
datalines;
85730 AZ Tucson    Domenic Ln
85730 AZ Tucson    Gleeson Pl
33133 FL Miami     Rice St
33133 FL Miami     Thomas Ave
33133 FL Miami     Surrey Dr
33133 FL Miami     Trade Ave
33146 FL Miami     Nervia St
33146 FL Miami     Corsica St
33801 FL Lakeland  French Ave
33809 FL Lakeland  Egret Dr
;
You can then specify the BY variable in the DATA step using the following code:
Example Code 22.2 Sort and Group the zip Data Set by a Single Variable
proc sort data=zip;
by zipcode;
run;
data zip;
set zip;
by zipcode;
run;
proc print data=zip noobs;
title 'BY-Group Using a Single Variable: ZipCode';
run;
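To see how the observations fall into BY groups, one option is to count the observations in each group by using the automatic FIRST.variable and LAST.variable indicators that SAS creates for each BY variable. The following is a minimal sketch and not one of the numbered examples. It assumes that zip has already been sorted by zipCode as in Example Code 22.2. The names zipCounts and nObs are hypothetical.

```sas
/* Sketch: count the observations in each zipCode BY group.     */
/* Assumes zip is already sorted by zipCode.                     */
/* zipCounts and nObs are illustrative names only.               */
data zipCounts (keep=zipCode nObs);
   set zip;
   by zipCode;
   if first.zipCode then nObs = 0;   /* first observation of a BY group */
   nObs + 1;                         /* sum statement: nObs is retained */
   if last.zipCode then output;      /* write one row per BY group      */
run;

proc print data=zipCounts noobs;
   title 'Observations per zipCode BY Group';
run;
```

With the data from Example Code 22.1, zipCounts should contain one observation for each of the five BY groups.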
BY Groups with Multiple BY Variables
The following figure represents the results of processing the zip data set with two BY
variables, State and City. This example uses the same data as Example Code 22.1 on
page 461, arranged in an order that you can use with the following BY statement:
by State City;
The figure shows three BY groups. The data set is shown with the BY variables State
and City printed on the left for easy reading. The position of the BY variables in the
observations does not affect how the values are grouped and ordered.
The observations are arranged so that the observations for Arizona occur first. The
observations within each value of State are arranged in order of the value of City. Each
BY group has a unique combination of values for the variables State and City. For
example, the BY value of the first BY group is AZ Tucson, and the BY value of the
second BY group is FL Lakeland.
462 Chapter 22 BY-Group Processing in the DATA Step
Figure 22.2 BY Groups with Multiple BY Variables (State and City)
Here is the code for creating the output shown in Figure 22.2 on page 463:
Example Code 22.3 Create the Zip Data Set
/* BY Groups with Multiple BY Variables */
data zip;
input State $ City $ Street $13-22 ZipCode ;
datalines;
FL Miami    Nervia St  33146
FL Miami    Rice St    33133
FL Miami    Corsica St 33146
FL Miami    Thomas Ave 33133
FL Miami    Surrey Dr  33133
FL Miami    Trade Ave  33133
FL Lakeland French Ave 33801
FL Lakeland Egret Dr   33809
AZ Tucson   Domenic Ln 85730
AZ Tucson   Gleeson Pl 85730
;
Example Code 22.4 Sort and Group the zip Data Set by Multiple BY Variables
proc sort data=zip;
by State City;
run;
data zip;
set zip;
by State City;
run;
proc print data=zip noobs;
title 'BY Groups with Multiple BY Variables: State City';
run;
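With more than one BY variable, a new BY group begins whenever the value of any BY variable changes. One way to make the group boundaries visible is to number the groups by using FIRST.City, which SAS sets to 1 when City changes or when the higher-level BY variable State changes. The following is a minimal sketch and not one of the numbered examples. It assumes that zip has already been sorted by State and City as in Example Code 22.4. The names zipGroups and groupNum are hypothetical.

```sas
/* Sketch: number each State/City BY group.                     */
/* Assumes zip is already sorted by State and City.             */
/* zipGroups and groupNum are illustrative names only.          */
data zipGroups;
   set zip;
   by State City;
   /* FIRST.City is 1 at the start of each State/City combination */
   if first.City then groupNum + 1;  /* sum statement retains groupNum */
run;

proc print data=zipGroups noobs;
   var groupNum State City Street ZipCode;
   title 'BY Groups Numbered by State and City';
run;
```

With the data from Example Code 22.3, groupNum takes the value 1 for AZ Tucson, 2 for FL Lakeland, and 3 for FL Miami, which matches the three BY groups in the figure.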
