Video description
Spark is one of today’s most popular distributed computation engines for processing and analyzing big data. This course gives data engineers, data scientists, and data analysts interested in data streaming technology practical experience in using Spark. You’ll learn about the Spark Structured Streaming API, the powerful Catalyst query optimizer, the Tungsten execution engine, and more in this hands-on course, where you’ll build several small applications that leverage all of these aspects of Spark 2.0. While not a requirement, the course works best for those with some Scala experience.
- Understand the main features of Spark and its advantages over existing systems
- Learn the basics of parallelism, streaming computation, and Spark Streaming
- Explore the distinctions between Spark Structured Streaming and legacy DStream APIs
- Understand how to write applications using the Spark Structured Streaming API
- Learn about the new Catalyst query optimizer and the Tungsten execution engine
- Discover how Scala and Spark Structured Streaming simplify distributed streaming tasks
- Gain hands-on experience building applications using Spark 2.0
Michael Li is the founder of The Data Incubator, which provides big data corporate training and a selective eight-week fellowship for PhDs transitioning into industry. Previously, he worked as a data scientist, software engineer, and researcher at Foursquare, Google, Andreessen Horowitz, J.P. Morgan, and NASA. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review. Michael earned his Ph.D. at Princeton and was a Marshall Scholar in Cambridge.
Table of contents
- Overview
- Spark Datasets and Structured Streaming
- Spark Structured Streaming
- Spark Structured Streaming
- Netcat Socket Structured Streaming Example
- Socket Structured Streaming Example
- Spark Structured Streaming Parsing Data
- Constructing Columns in Structured Streaming
- Selecting and Filtering Columns Using Structured Streaming
- GroupBy and Aggregation in Structured Streaming
- Joining Structured Stream with Datasets
- SQL Queries in Spark Structured Streaming
- DStream Comparison
- Comparing Structured Streaming with DStream
- Custom Receivers in Spark DStream
- Iterative Wordcount Using Spark DStream
- Cumulative Wordcount using Spark DStream
- Benefits of Spark Tungsten
- Tungsten Performance Benefit Demonstration
- Benefits of Spark Catalyst
- Viewing Query Plans in Spark Shell
- Visualizing Query Stages in Spark UI Viewer
- Viewing Spark Catalyst-Optimized Physical Plans
- Standalone Spark Streaming Applications
Product information
- Title: Mastering Spark for Structured Streaming
- Author(s): Michael Li
- Release date: November 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491974438