Chapter 7. Can You Simplify That? – Data Reduction Techniques

In this chapter, we will cover:

  • Performing cluster analysis using K-means clustering
  • Performing cluster analysis using hierarchical clustering
  • Reducing dimensionality with principal component analysis

Introduction

When a dataset is large in its number of cases, its number of variables, or both, analysts often look for ways to reduce its complexity. Cluster analysis condenses the cases into a manageable number of representative points, while principal component analysis (PCA) identifies a smaller set of variables or dimensions that captures most of the information content of the larger set of original variables. This chapter will cover ...
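
As a rough illustration of the two kinds of reduction described above, the following minimal sketch uses only base R (`kmeans` and `prcomp` from the `stats` package). The built-in `iris` data, the choice of three clusters, and the decision to keep two components are assumptions made for this example; they are not taken from the chapter's recipes.

```r
# Illustrative data only: the built-in iris measurements (an assumption,
# not a dataset from this chapter).
X <- scale(iris[, 1:4])            # standardize the four numeric variables

# Fewer cases: K-means summarizes the 150 rows by 3 cluster centers.
set.seed(42)                       # K-means starts from random centers
km <- kmeans(X, centers = 3, nstart = 25)
km$centers                         # one representative point per cluster
table(km$cluster)                  # number of cases each center stands for

# Fewer variables: PCA re-expresses the 4 variables as uncorrelated components.
pca <- prcomp(X)                   # X is already centered and scaled
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_share), 3)        # cumulative share of variance explained
reduced <- pca$x[, 1:2]            # keep only the first two components
dim(reduced)                       # 150 cases, 2 dimensions
```

Standardizing first is a design choice: both K-means and PCA are sensitive to the scale of the variables, so unscaled data would let the variable with the largest variance dominate the result.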
