Apache Spark

Apache Spark is a fast, general-purpose engine for large-scale data processing. It was originally developed in 2009 at UC Berkeley's AMPLab and open-sourced in 2010.

The main features of Spark are as follows:

  • Speed: Spark enables applications in Hadoop clusters to run up to 100x faster than Hadoop MapReduce when data is held in memory, and up to 10x faster even when running on disk.
  • Ease of use: Spark lets you quickly write applications in Java, Scala, or Python. You can use it interactively to query big datasets from the Scala and Python shells.
  • Runs everywhere: Spark runs on Hadoop, Mesos, in standalone mode, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3. You can run Spark readily using its standalone cluster mode, on EC2, or run it on ...
