Chapter 10. Running Spark

This chapter covers

  • Spark runtime components
  • Spark cluster types
  • Job and resource scheduling
  • Configuring Spark
  • Spark web UI
  • Running Spark on the local machine

In previous chapters, we mentioned different ways to run Spark. In this and the next two chapters, we'll discuss how to set up a Spark cluster. A Spark cluster is a set of interconnected processes, usually running in a distributed manner on different machines. The main cluster types Spark runs on are YARN, Mesos, and Spark standalone. Two other runtime options, local mode and local cluster mode, are the easiest and quickest ways to set up Spark, but they're used mainly for testing. Local mode is a pseudo-cluster running on a single machine, with the driver and executor sharing a single JVM.
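As a quick sketch of how these runtime options are selected (the application name and the trivial counting job here are made up for illustration, not taken from the book's listings), the master URL you pass to SparkConf is what determines where Spark runs:

  import org.apache.spark.{SparkConf, SparkContext}

  object LocalModeExample {
    def main(args: Array[String]): Unit = {
      // "local[*]" starts a pseudo-cluster in a single JVM using all
      // available cores. Other master URLs select the cluster types
      // discussed in this chapter: "yarn", "mesos://host:port", or
      // "spark://host:port" for Spark standalone.
      val conf = new SparkConf()
        .setAppName("local-mode-example")   // hypothetical app name
        .setMaster("local[*]")
      val sc = new SparkContext(conf)

      // A trivial job to confirm the pseudo-cluster works
      val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
      println(s"Even numbers: $evens")

      sc.stop()
    }
  }

Swapping in a different master URL (for example, spark://host:port for a standalone cluster) changes where the job runs without changing the application code, which is why local mode is so convenient for testing.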
