Get started with Spark for data processing and data science
Spark is one of the most widely used engines for fast, large-scale data processing. Its framework provides tools that are equally useful to application developers and data scientists.
This Learning Path begins with an introduction to Apache Spark. We first cover the basics of Spark and introduce SparkR, then look at Python's charting and plotting features in conjunction with Spark data processing, and finally turn to Spark's data processing libraries. We then develop a real-world Spark application. Next, we help you become comfortable and confident working with Spark for data science by exploring Spark's data science libraries on a dataset of tweets.
Begin your journey into fast, large-scale, and distributed data processing using Spark with this Learning Path.
Prerequisites: Basic knowledge of either Python or R
Resources: Code downloads and errata
This path navigates across the following products (in sequential order):
Apache Spark 2 for Beginners (5h 38m)
Data Science with Spark (3h 20m)