Data transformation and discretization

As we know from the previous section, there are always some data formats that are best suited for specific data mining algorithms. Data transformation is an approach to transform the original data to preferable data format for the input of certain data mining algorithms before the processing.

Data transformation

Data transformation routines convert the data into appropriate forms for mining. They're shown as follows:

  • Smoothing: This uses binning, regression, and clustering to remove noise from the data
  • Attribute construction: In this routine, new attributes are constructed and added from the given set of attributes
  • Aggregation: In this summary or aggregation, operations are performed on the data
  • Normalization

Get R: Data Analysis and Visualization now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.