Structured Streaming

Structured Streaming, on the other hand, is a newer, highly optimized stream processing engine built on the Spark SQL engine, in which streaming data can be stored and processed using Spark's Dataset/DataFrame API (see Chapter 1, The Big Data Ecosystem). As of Spark 2.3, Structured Streaming can process data streams using both micro-batch processing, with latencies as low as 100 milliseconds, and continuous processing (introduced in that release as an experimental mode), with latencies as low as 1 millisecond, thereby providing true real-time processing. Structured Streaming works by modelling a data stream as an unbounded table to which new rows are continuously appended. When a transformation or other type of query is processed on this unbounded table, a results ...
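The unbounded-table idea described above can be illustrated with a small plain-Python toy. This is only a sketch of the concept and does not use Spark: the `UnboundedTable` class, the `word_counts` query and the sample micro-batches are all hypothetical names and data invented for illustration. It also re-scans the whole table after every append for simplicity, whereas a real engine would be expected to update its results incrementally.

```python
from collections import Counter
from typing import Dict, Iterable, List


class UnboundedTable:
    """Hypothetical stand-in for a stream modelled as an ever-growing table."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def append_batch(self, batch: Iterable[str]) -> None:
        # New stream records only ever add rows; existing rows never change.
        self.rows.extend(batch)


def word_counts(table: UnboundedTable) -> Counter:
    """The 'query': count words across every row seen so far."""
    counts: Counter = Counter()
    for line in table.rows:
        counts.update(line.split())
    return counts


table = UnboundedTable()
snapshots: List[Dict[str, int]] = []

# Two made-up micro-batches arriving one after the other.
for micro_batch in (["spark streaming"], ["spark sql", "streaming"]):
    table.append_batch(micro_batch)
    # Re-evaluate the query over the whole table after each append.
    snapshots.append(dict(word_counts(table)))

for i, snapshot in enumerate(snapshots, start=1):
    print(f"after batch {i}: {snapshot}")
```

Each snapshot reflects all rows received up to that point, which is the sense in which the stream behaves like a table that keeps growing.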
