Apache Spark with Scala

Video Description

Get to grips with the fundamentals of Apache Spark for real-time Big Data processing

About This Video

  • Understand the fundamentals of Scala and the Apache Spark ecosystem

  • Handle large streams of data with Spark Streaming and perform Machine Learning in real time with Spark MLlib

  • Comprehensive tutorial packed with practical examples to help you develop real-world Big Data applications using Spark with Scala

In Detail

With the rise in popularity of the term ‘Big Data’, there is an increasing need to process large amounts of data in real time, with maximum efficiency. This has led to Apache Spark gaining popularity in the Big Data market very quickly. The Spark ecosystem allows you to process large streams of data in real time. As Spark is built on Scala, knowledge of both has become vital for data scientists and data analysts today. This comprehensive 7-hour course will empower you to build efficient Spark applications to fulfill your Big Data needs.

You will start by quickly learning the basics of Scala, then set up the development environment for Apache Spark and Scala for Big Data processing. You will understand the different modules of Spark, such as Spark SQL, Spark Streaming, and GraphX, along with when and how to use them. While doing so, you will build practical, real-world Spark applications in Scala and see how you can deploy them on the cloud. You will also learn how to perform machine learning in real time using Spark’s MLlib module. Finally, you will learn how to run Spark on Hadoop clusters, along with best practices and troubleshooting techniques.

With over 20 carefully selected examples and thorough explanations of even the most difficult concepts, this course will ensure your success in taming your Big Data challenges using Spark with Scala.

Table of Contents

    1. Chapter 1 : Getting Started
      1. Introduction and Getting Set Up 00:14:30
      2. [Activity] Create a Histogram of Real Movie Ratings with Spark! 00:12:58
    2. Chapter 2 : Scala Crash Course
      1. [Activity] Scala Basics, Part 1 00:12:52
      2. [Exercise] Scala Basics, Part 2 00:09:41
      3. [Exercise] Flow Control in Scala 00:07:18
      4. [Exercise] Functions in Scala 00:08:47
      5. [Exercise] Data Structures in Scala 00:16:38
    3. Chapter 3 : Spark Basics and Simple Examples
      1. Introduction to Spark 00:08:40
      2. The Resilient Distributed Dataset 00:11:04
      3. Ratings Histogram Walkthrough 00:07:33
      4. Spark Internals 00:04:42
      5. Key/Value RDDs and the Average Friends by Age Example 00:12:21
      6. [Activity] Running the Average Friends by Age Example 00:07:58
      7. Filtering RDDs and the Minimum Temperature by Location Example 00:06:43
      8. [Activity] Running the Minimum Temperature Example and Modifying It for Maximum Temperature 00:10:11
      9. [Activity] Counting Word Occurrences Using flatMap() 00:08:59
      10. [Activity] Improving the Word Count Script with Regular Expressions 00:06:42
      11. [Activity] Sorting the Word Count Results 00:08:11
      12. [Exercise] Finding the Total Amount Spent by Customer 00:03:38
      13. [Exercise] Check Your Results, and Sort Them by Total Amount Spent 00:04:26
      14. Check Your Results and Implementation against Mine 00:03:26
    4. Chapter 4 : Advanced Examples of Spark Programs
      1. [Activity] Find the Most Popular Movie 00:04:30
      2. [Activity] Use Broadcast Variables to Display Movie Names 00:08:53
      3. [Activity] Find the Most Popular Superhero in a Social Graph 00:14:10
      4. Superhero Degrees of Separation – Introducing Breadth-First Search 00:06:53
      5. Superhero Degrees of Separation – Accumulators and Implementing BFS in Spark 00:05:54
      6. Superhero Degrees of Separation – Review the Code, and Run It! 00:10:42
      7. Item-Based Collaborative Filtering in Spark, cache(), and persist() 00:08:17
      8. [Activity] Running the Similar Movies Script using Spark's Cluster Manager 00:14:13
      9. [Exercise] Improve the Quality of Similar Movies 00:02:42
    5. Chapter 5 : Running Spark on a Cluster
      1. [Activity] Using spark-submit to Run Spark Driver Scripts 00:06:59
      2. [Activity] Packaging Driver Scripts with SBT 00:14:07
      3. Introducing Amazon Elastic MapReduce 00:07:12
      4. Creating Similar Movies from One Million Ratings on EMR 00:12:47
      5. Partitioning 00:05:07
      6. Best Practices for Running on a Cluster 00:05:31
      7. Troubleshooting and Managing Dependencies 00:09:08
    6. Chapter 6 : SparkSQL, DataFrames, and DataSets
      1. Introduction to SparkSQL 00:07:08
      2. [Activity] Using SparkSQL 00:07:01
      3. [Activity] Using DataFrames and DataSets 00:06:38
      4. [Activity] Using DataSets Instead of RDDs 00:07:24
    7. Chapter 7 : Machine Learning with MLlib
      1. Introducing MLlib 00:07:38
      2. [Activity] Using MLlib to Produce Movie Recommendations 00:07:23
      3. [Activity] Linear Regression with MLlib 00:11:37
      4. [Activity] Using DataFrames with MLlib 00:10:05
    8. Chapter 8 : Intro to Spark Streaming
      1. Spark Streaming Overview 00:09:54
      2. [Activity] Set Up a Twitter Developer Account, and Stream Tweets 00:12:13
      3. Structured Streaming 00:04:01
    9. Chapter 9 : Intro to GraphX
      1. GraphX, Pregel, and Breadth-First Search with Pregel 00:10:39
      2. [Activity] Superhero Degrees of Separation using GraphX 00:08:59
    10. Chapter 10 : You Made It! Where to Go from Here?
      1. Learning More, and Career Tips 00:04:15