Binning Data
Another common data transformation is to group a set of observations into bins based on the value of a specific variable. For example, suppose you had some time series data where time was measured in days, but you wanted to summarize the data by month. There are several functions available for binning numeric data in R.
Shingles
We briefly mentioned shingles in the section "Shingles". Shingles are a way to represent intervals in R. They can overlap, like roof shingles (hence the name). They are used extensively in the lattice package when you want to use a numeric value as a conditioning variable.
To create shingles in R, use the shingle function:
shingle(x, intervals=sort(unique(x)))
To specify where to separate the bins, use the intervals argument. You can use a numeric vector to indicate the breaks or a two-column matrix, where each row represents a specific interval.
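As a minimal sketch of the two-column matrix form, the example below builds a shingle with two deliberately overlapping intervals. The data vector x and the matrix name bounds are hypothetical, chosen only for illustration; membership is checked by hand with plain comparisons so the overlap is easy to see.

```r
library(lattice)

# Hypothetical data: the integers 1 through 10
x <- 1:10

# Each row is one interval: column 1 is the lower bound, column 2 the upper.
# The two intervals overlap on purpose, so 4 and 5 belong to both.
bounds <- rbind(c(0, 5),
                c(4, 10))

s <- shingle(x, intervals = bounds)
is.shingle(s)

# Membership checked by hand with plain comparisons
in_first  <- x >= bounds[1, 1] & x <= bounds[1, 2]
in_second <- x >= bounds[2, 1] & x <= bounds[2, 2]
both <- x[in_first & in_second]
both
```

Unlike the bins of a factor, an observation here can fall into more than one interval, which is what distinguishes a shingle.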
To create shingles where the number of observations is approximately the same in each bin, you can use the equal.count function:
equal.count(x, ...)
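A short sketch of equal.count on simulated data follows. It assumes the extra arguments number and overlap are passed through to co.intervals (from the graphics package), as the lattice documentation describes; the data and the chosen values are hypothetical.

```r
library(lattice)

# Hypothetical data: 200 uniform random values
set.seed(42)
x <- runif(200)

# Ask for 4 intervals, with adjacent intervals sharing about 25% of their points
s <- equal.count(x, number = 4, overlap = 0.25)
is.shingle(s)

# The same intervals can be inspected directly as a two-column matrix
iv <- co.intervals(x, number = 4, overlap = 0.25)
iv
```

Setting overlap to 0 should give bins that do not share observations, which is closer to ordinary binning.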
Cut
The function cut is useful for taking a continuous variable and splitting it into discrete pieces. Here is the default form of cut for use with numeric vectors:
# numeric form
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
    dig.lab = 3, ordered_result = FALSE, ...)
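To illustrate the numeric form, here is a small sketch that bins a hypothetical vector of ages into three labeled groups and shows how the right argument changes which bin a boundary value lands in. The variable names and break points are made up for the example.

```r
# Hypothetical data: ages that include values exactly on the break points
ages <- c(3, 17, 18, 35, 64, 65, 90)
brks <- c(0, 18, 65, Inf)
labs <- c("minor", "adult", "senior")

# Default right = TRUE: intervals are (0,18], (18,65], (65,Inf]
closed_right <- cut(ages, breaks = brks, labels = labs)
table(closed_right)

# right = FALSE: intervals are [0,18), [18,65), [65,Inf)
closed_left <- cut(ages, breaks = brks, labels = labs, right = FALSE)
table(closed_left)
```

With the default, 18 is counted as "minor" and 65 as "adult"; with right = FALSE they move up to "adult" and "senior" respectively.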
There is also a version of cut for manipulating Date objects:
# Date form
cut(x, breaks, labels = NULL, start.on.monday = TRUE, right = FALSE, ...)
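The Date form fits the daily-to-monthly summary described at the start of this section. The sketch below assumes a hypothetical daily series covering the first quarter of 2024 and uses breaks = "month" to assign each day to its month; the names days, value, and month are illustrative only.

```r
# Hypothetical daily series: one value per day from January through March 2024
days  <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")
value <- seq_along(days)

# Bin each date by calendar month; each level is labeled with the month's first day
month <- cut(days, breaks = "month")
levels(month)

# Number of days per month, then a monthly total of the daily values
table(month)
monthly_total <- tapply(value, month, sum)
monthly_total
```

Other interval specifications such as "week" or "year" are also accepted for breaks, and start.on.monday only matters when binning by week.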
The cut function takes a numeric vector as input and returns ...