Spark binaries

Please execute the following shell commands as a local Linux user to extract the Apache Spark binaries. In our case, we will be installing the Spark binaries into /opt:

> tar -xzf spark-2.3.2-bin-hadoop2.7.tgz -C /opt

The resultant Spark parent directory will have the following structure:

  • bin: Shell scripts for local Spark services, such as spark-submit
  • sbin: Shell scripts, including starting and stopping Spark services
  • conf: Spark configuration files
  • jars: Spark library dependencies
  • python: Spark's Python API, called PySpark
  • R: Spark's R API, called SparkR

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.