Binning Data
Another common data transformation is to group a set of observations into bins based on the value of a specific variable. For example, suppose that you had some time series data where time was measured in days, but you wanted to summarize the data by month. There are several functions available for binning numeric data in R.
Shingles
We briefly mentioned shingles earlier, in the section "Shingles." Shingles are a way to represent intervals in R. Unlike ordinary bins, the intervals can overlap, like roof shingles (hence the name). They are used extensively in the lattice package, when you want to use a numeric variable as a conditioning variable.
To create shingles in R, use the shingle function:
shingle(x, intervals=sort(unique(x)))
To specify where to separate the bins, use the intervals argument. You can use a numeric vector to indicate the breaks or a two-column matrix, where each row represents a specific interval.
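As a minimal sketch of the two-column matrix form, the following builds a shingle with three overlapping intervals. The data vector x and the interval bounds are made up for illustration; only shingle, is.shingle, levels, and nlevels come from R and lattice.

```r
library(lattice)

# Hypothetical data: the integers 1 through 10
x <- 1:10

# One row per interval: lower bound, upper bound.
# Adjacent intervals overlap (3 to 4, and 6 to 7).
ivals <- rbind(c(0, 4),
               c(3, 7),
               c(6, 10))

s <- shingle(x, intervals = ivals)

is.shingle(s)   # TRUE
nlevels(s)      # 3, one level per interval
levels(s)       # prints the three intervals
```

Because the intervals overlap, a single observation (for example, 3.5) could belong to more than one level, which is what distinguishes a shingle from a factor.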
To create shingles where the number of observations is the same in each bin, you can use the equal.count function:
equal.count(x, ...)
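Here is a small sketch of equal.count on made-up, skewed data. The number and overlap arguments are not part of the signature shown above; they are passed through the ... argument to co.intervals, which computes the intervals. Setting overlap = 0 is an assumption made here to get non-overlapping bins (the default is 0.5).

```r
library(lattice)

# Hypothetical skewed data: squares of 1 through 100
x <- (1:100)^2

# Four intervals, each holding roughly the same number of observations.
# number and overlap are forwarded to co.intervals().
s <- equal.count(x, number = 4, overlap = 0)

nlevels(s)   # 4
levels(s)    # interval widths differ, because the data are skewed
```

The intervals get wider as the values spread out, since each one is sized to capture about a quarter of the observations rather than a quarter of the range.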
Cut
The function cut is useful for taking a continuous variable and splitting it into discrete pieces. Here is the default form of cut for use with numeric vectors:
# numeric form
cut(x, breaks, labels = NULL, include.lowest = FALSE,
    right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)
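To illustrate the numeric form, here is a short sketch that bins a made-up vector of ages into three labeled groups. The data, break points, and labels are assumptions chosen for the example. It also shows how the right argument decides which bin a value sitting exactly on a break falls into.

```r
# Hypothetical ages
ages <- c(3, 17, 18, 42, 65, 80)

# Default right = TRUE: intervals are (0,18], (18,65], (65,Inf]
groups <- cut(ages, breaks = c(0, 18, 65, Inf),
              labels = c("minor", "adult", "senior"))
as.character(groups)
# "minor" "minor" "minor" "adult" "adult" "senior"

# right = FALSE: intervals are [0,18), [18,65), [65,Inf)
groups_left <- cut(ages, breaks = c(0, 18, 65, Inf),
                   labels = c("minor", "adult", "senior"),
                   right = FALSE)
as.character(groups_left)
# "minor" "minor" "adult" "adult" "senior" "senior"

# The result is a factor, so it can be tabulated directly
table(groups)
```

Note that the values 18 and 65 change groups between the two calls, so it is worth checking the right setting whenever observations can land exactly on a break.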
There is also a version of cut for manipulating Date objects:
# Date form
cut(x, breaks, labels = NULL, start.on.monday = TRUE,
    right = FALSE, ...)
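Returning to the example from the start of this section, the Date form can be used to summarize daily data by month. The sketch below uses made-up dates and values; passing the string "month" as breaks is one of the interval specifications that cut accepts for Date objects.

```r
# Hypothetical daily observations
dates  <- as.Date(c("2024-01-05", "2024-01-20", "2024-02-03", "2024-03-15"))
values <- c(10, 20, 30, 40)

# Each date is assigned to its month; by default the level is
# labeled with the first day of that month
months <- cut(dates, breaks = "month")
as.character(months)
# "2024-01-01" "2024-01-01" "2024-02-01" "2024-03-01"

# Sum the values within each month
totals <- tapply(values, months, sum)
totals[["2024-01-01"]]   # 30
```

Because the result is a factor, it can be handed to tapply, aggregate, or table like any other grouping variable.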
The cut function takes a numeric vector as input and ...