Spark DataFrames within the RStudio IDE

The RStudio IDE offers another simple way to start Spark connections and browse your datasets. After you've installed the sparklyr package, Spark will appear in the top-right part of your RStudio window, close to your R environment. If you aren't connected to Spark, it'll look like the following screenshot. If you are connected, call spark_disconnect_all() before continuing, so that we start from the same state:

Figure 12.1: Spark shown in RStudio IDE
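
For readers who prefer the console, the same reset and a fresh connection can be sketched in a few lines of R. This is only a sketch, not the code the IDE generates: it assumes sparklyr is installed and that a local Spark installation is available (for example, one set up with spark_install()). The variable name sc and the choice of master = "local" are assumptions made for illustration.

```r
library(sparklyr)

# Close any Spark connections that are still open from earlier work.
spark_disconnect_all()

# Open a new connection to a local Spark instance.
# Assumption: Spark is installed locally; "local" is an illustrative master.
sc <- spark_connect(master = "local")

# Check whether the connection is open before browsing any data.
spark_connection_is_open(sc)
```

When you're finished, spark_disconnect(sc) closes this one connection, while spark_disconnect_all() closes every open connection.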

Click on the left arrow to see all connections, then click on the new connection button to establish a connection. A window will pop up where you can connect ...
