Taming Big Data with Spark Streaming and Scala – Hands On!

Video Description

Process large amounts of data in real time using Spark Streaming

About This Video

  • Process streams of real-time data from various sources with Spark Streaming

  • Query your streaming data in real time using Spark SQL

  • A comprehensive tutorial with practical examples to help you develop real-time Spark applications

In Detail

    Businesses increasingly need continuous, real-time analysis of large volumes of data, along with insights that inform business decisions. Apache Spark has become one of the most popular tools for efficient real-time analytics of Big Data. Spanning over 5 hours, this course teaches you the basics of Apache Spark and how to use Spark Streaming, the Spark module for processing Big Data in real time. You will learn how to create Spark applications with Scala to process streams of real-time data. Whether you want to analyze continuously incoming website traffic, analyze real-time streams of Twitter feeds, or query your streaming data in real time, this course has you covered. You will also learn how to use Spark's MLlib module to train machine learning models on streaming data and use those models to make real-time predictions. The course assumes some programming experience and uses Scala to develop Spark applications; it includes a crash course in the Scala programming language in case you're new to it.

Table of Contents

    1. Chapter 1: Getting Started
      1. Introduction and Getting Set Up 00:16:06
      2. Stream Live Tweets with Spark Streaming! 00:10:13
    2. Chapter 2: A Crash Course in Scala
      1. Scala Basics – Part 1 00:11:27
      2. Scala Basics – Part 2 00:09:41
      3. Flow Control in Scala 00:07:18
      4. Functions in Scala 00:08:47
      5. Data Structures in Scala 00:16:38
    3. Chapter 3: Spark Streaming Concepts
      1. Introduction to Spark 00:07:06
      2. The Resilient Distributed Dataset (RDD) 00:10:40
      3. RDDs in Action – Simple Word Count Application 00:08:17
      4. Introduction to Spark Streaming 00:06:32
      5. Revisiting the PrintTweets Application 00:05:10
      6. Windowing – Aggregating Data over Longer Time Spans 00:05:00
      7. Fault Tolerance in Spark Streaming 00:06:06
    4. Chapter 4: Spark Streaming Examples with Twitter
      1. Saving Tweets to Disk 00:13:24
      2. Tracking the Average Tweet Length 00:08:23
      3. Tracking the Most Popular Hashtags 00:14:51
    5. Chapter 5: Spark Streaming Examples with Clickstream / Apache Access Log Data
      1. Tracking the Top URLs Requested 00:13:27
      2. Alarming on Log Errors 00:11:56
      3. Integrating Spark Streaming with Spark SQL 00:08:27
      4. Intro to Structured Streaming in Spark 2 00:11:24
      5. Analyzing Apache Log Files with Structured Streaming 00:09:05
    6. Chapter 6: Integrating with Other Systems
      1. Integrating with Apache Kafka 00:12:20
      2. Integrating with Apache Flume 00:08:51
      3. Integrating with Amazon Kinesis 00:05:30
      4. Writing Custom Data Receivers 00:06:56
      5. Integrating with Cassandra 00:07:35
    7. Chapter 7: Advanced Spark Streaming Examples
      1. Stateful Information in Spark Streams 00:15:07
      2. Streaming K-Means Clustering 00:15:36
      3. Streaming Linear Regression 00:11:50
    8. Chapter 8: Spark Streaming in Production
      1. Running with spark-submit 00:10:47
      2. Packaging Your Code with SBT 00:10:49
      3. Running on a Real Hadoop Cluster with EMR 00:13:14
      4. Troubleshooting and Tuning Spark Jobs 00:12:35
    9. Chapter 9: You Made It!
      1. Learning More 00:03:45