Advantages and limitations

The R language has long been the lingua franca of data scientists. Its easy-to-understand DataFrame abstraction, expressive APIs, and vibrant package ecosystem are exactly what analysts needed. The main challenge was scalability. SparkR bridges that gap by providing distributed in-memory DataFrames without leaving the R ecosystem. This symbiotic relationship gives users the following benefits:

  • There is no need for the analyst to learn a new language
  • The SparkR APIs are similar to R APIs
  • You can access SparkR from RStudio, with autocomplete support
  • Performing interactive, exploratory analysis of a very large dataset is no longer hindered by memory limitations or long turnaround times ...
