Spark development status

By the end of 2015, Apache Spark had become the most active project in the Hadoop ecosystem in terms of number of contributors. Spark started as a research project at UC Berkeley's AMPLab in 2009, so it is still relatively young compared to projects such as Apache Hadoop, and it remains in active development. There were three releases in 2015, versions 1.3 through 1.5, which introduced the DataFrames API, SparkR, and Project Tungsten, respectively. Version 1.6 was released in early 2016 and added the new Dataset API along with expanded data science functionality. Spark 2.0 was released in July 2016. As a major release, it brings many new features and enhancements that deserve a section ...
