Live Online Training

Modern streaming architectures

Getting to real-time streaming value with Kafka, Spark Streaming, Flink, Beam, and microservices

Ted Malaska

This course is designed to bring you up to speed on the most popular open-source options for stream processing: Kafka, Spark Streaming, Flink, Beam, and microservices. We will break down each of these technologies, then walk through real-world streaming use cases to see which tools fit best.

The course includes live examples of multiple use cases, complete with code samples and walk-throughs, for all of the streaming frameworks above.

What you'll learn and how you can apply it

At the end of this live, online course, you'll understand:

  • how the following technologies work and how they compare to each other:
      • Spark Streaming
      • Flink
      • Beam
      • Microservices
  • IoT use cases, including enrichment, windowing, and machine learning

And you'll be able to:

  • work with any of the above streaming frameworks
  • build real prototypes
  • map IoT use cases to proven architectures
  • make architecture decisions for IoT solutions

This training course is for you because...

You are a:

  • Data engineer seeking to widen your streaming options
  • Data architect building real-time data systems at scale
  • Product manager looking into IoT strategies and technical paths

Prerequisites

  • Programming experience in Python, Java, or another data-analysis language
  • Knowledge of the fundamentals of data storage and access patterns
  • Interest in IoT use cases

Recommended preparation

Mastering Spark for Structured Streaming (video)

Taming Big Data with Spark Streaming and Scala – Hands On! (video)

Kafka (Learning Path)

Data Science With Apache Spark 2 (Learning Path)

Kafka: The Definitive Guide (book)

About your instructor

Ted works on the Battle.net team at Blizzard, helping support great titles such as World of Warcraft, Overwatch, Hearthstone, and more. Previously, he was a Principal Solutions Architect at Cloudera, helping clients succeed with Hadoop and the Hadoop ecosystem, and before that a Lead Architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a co-author of O'Reilly's "Hadoop Application Architectures," a frequent conference speaker, and a frequent blogger on data architectures.

Schedule

The time frames are only estimates and may vary depending on how the class progresses.

DAY ONE

  • What is streaming? (10 minutes)
  • Breakdown of streaming use cases (30 minutes)
  • Latencies (10 minutes)
  • Presenting results (10 minutes)
  • Break (10 minutes)
  • Understanding how Kafka works (15 minutes)
  • Understanding Kafka's place in a streaming framework (15 minutes)
  • Breakdown of the components of a streaming framework (10 minutes)
  • High-level breakdown of Spark Streaming (12 minutes)
  • Break (10 minutes)
  • High-level breakdown of Spark Structured Streaming (12 minutes)
  • High-level breakdown of Flink (12 minutes)
  • High-level breakdown of Beam (12 minutes)
  • High-level breakdown of microservices (12 minutes)

DAY TWO

  • Use case: Enrichment, implemented with Kafka, Spark Streaming, Flink, Beam, and microservices (50 minutes)
  • Break (10 minutes)
  • Use case: Windowing, implemented with Kafka, Spark Streaming, Flink, Beam, and microservices (50 minutes)
  • Break (10 minutes)
  • Use case: Machine learning, implemented with Kafka, Spark Streaming, Flink, Beam, and microservices (50 minutes)