Combining SAS Data Sets: Basic Concepts
What You Need to Know Before Combining Information Stored in
Multiple SAS Data Sets
Many applications require input data to be in a specific format before the data can be
processed to produce meaningful results. The data typically comes from multiple sources
and might be in different formats. Therefore, you often, if not always, have to take
intermediate steps to logically relate and process data before you can analyze it or create
reports from it.
Application requirements vary, but there are common factors for all applications that
access, combine, and process data. Once you have determined what you want the output
to look like, you must perform the following tasks:
Determine how the input data is related.
Ensure that the data is properly sorted or indexed, if necessary.
Select the appropriate access method to process the input data.
Select the appropriate SAS tools to complete the task.
The Four Ways That Data Can Be Related
Data Relationship Categories
Relationships among multiple sources of input data exist when each of the sources
contains common data, either at the physical or logical level. For example, employee
data and department data could be related through an employee ID variable that shares
common values. Another data set could contain numeric sequence numbers whose
partial values logically relate it to a separate data set by observation number.
You must be able to identify the existing relationships in your data. This knowledge is
crucial for understanding how to process input data in order to produce desired results.
All related data fall into one of these four categories, characterized by how observations
relate among the data sets:
one-to-one
one-to-many
many-to-one
many-to-many
To obtain the results that you want, you should understand how each of these methods
combines observations, how each method treats duplicate values of common variables,
and how each method treats missing values or nonmatched values of common variables.
Some of the methods also require that you preprocess your data sets by sorting them or
by creating indexes. See the description of each method in “Combining SAS Data Sets:
Methods” on page 488.
One-to-One Relationship
In a one-to-one relationship, typically a single observation in one data set is related to a
single observation from another based on the values of one or more selected variables. A
478 Chapter 23 Reading, Combining, and Modifying SAS Data Sets
one-to-one relationship implies that each value of the selected variable occurs no more
than once in each data set. When you work with multiple selected variables, this
relationship implies that each combination of values occurs no more than once in each
data set.
In the following example, observations in data sets Salary and Taxes are related by
common values for EmployeeNumber.
Figure 23.1 One-to-One Relationship
SALARY
EmployeeNumber
Salary
TAXES
TaxBracket
1234 55000
1111 0.18
3333 72000
1234 0.28
4876 32000
3333 0.32
5489 17000
4222 0.18
4876 0.24
EmployeeNumber
One-to-Many and Many-to-One Relationships
A one-to-many or many-to-one relationship between input data sets implies that one data
set has at most one observation with a specific value of the selected variable, but the
other input data set can have more than one occurrence of each value. When you work
with multiple selected variables, this relationship implies that each combination of
values occurs no more than once in one data set. However, the combination can occur
more than once in the other data set. The order in which the input data sets are processed
determines whether the relationship is one-to-many or many-to-one.
In the following example, observations in data sets One and Two are related by common
values for variable A. Values of A are unique in data set One but not in data set Two.
Figure 23.2 One-to-Many Relationship
ONE
TWO
1
3
5
3
6
4
1
1
2
3
0
99
1 4 88
1
2
5
1
77
66
2 2 55
3 4 44
A E F
A B C
In the following example, observations in data sets One, Two, and Three are related by
common values for variable ID. Values of ID are unique in data sets One and Three but
not in Two. For values 2 and 3 of ID, a one-to-many relationship exists between
observations in data sets One and Two, and a many-to-one relationship exists between
observations in data sets Two and Three.
Combining SAS Data Sets: Basic Concepts 479

Get SAS 9.4 Language Reference, 6th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.