About R

We dabbled in R programming in Chapter 2, Access, Speed, and Storage with Hadoop. In this chapter, we formally introduce R as the tool for our data profiling exercises and for adding perspectives (establishing context) to data that will be used in visualizations.

R is a language and environment that is easy to learn, flexible, and focused on statistical computing. That makes it well suited to manipulating, cleaning, and summarizing data, producing probability statistics, and so on, as well as creating visualizations with your data. It is therefore a great choice for the exercises required for profiling, establishing context, and identifying additional perspectives.

In addition, here are a few more reasons ...
