Appendix A. Installing Spark

This appendix covers

  • The quickest ways to get started in Spark
  • Using virtual machines (VMs) to run Spark
  • Using Amazon Web Services / Elastic MapReduce to run Spark

Using Spark typically requires two things: (1) a Hadoop installation and (2) a cluster of machines to run both on. The simplest scenario is that you’re doing GraphX work for your employer, which already has a Hadoop/Spark cluster set up that you can use. If that’s not the case, this appendix is for you: it describes several options that don’t necessarily require either Hadoop or a cluster of machines.

The three options described in this appendix are as follows:

1.  On a local virtual machine—Cloudera QuickStart VM (with Hadoop and Spark preinstalled ...
