Spark for data analytics

After the Spark project proved successful at the AMPLab, it was open sourced in 2010 and transferred to the Apache Software Foundation in 2013. Its development is currently led by Databricks.

Spark offers many distinct advantages over other distributed computing platforms, such as:

  • A faster execution platform for both iterative machine learning and interactive data analysis
  • Single stack for batch processing, SQL queries, real-time stream processing, graph processing, and complex data analytics
  • A high-level API for developing a diverse range of distributed applications, which hides the complexities of distributed programming
  • Seamless support for various data sources such as RDBMS, HBase, Cassandra, Parquet, MongoDB, HDFS, Amazon ...
