Streaming Systems by Reuven Lax, Slava Chernyak, Tyler Akidau

Chapter 3. Watermarks

So far, we have been looking at stream processing from the perspective of the pipeline author or data scientist. Chapter 2 introduced watermarks as part of the answer to two fundamental questions: where in event time processing is taking place, and when in processing time results are materialized. In this chapter, we approach the same questions from the perspective of the underlying mechanics of the stream processing system. Looking at these mechanics will help us motivate, understand, and apply the concepts around watermarks. We will discuss how watermarks are created at the point of data ingress, how they propagate through a data processing pipeline, and how they affect output timestamps. We will also demonstrate how watermarks preserve the guarantees that are necessary for answering those two questions while dealing with unbounded data.

Definition

Consider any pipeline that ingests data and outputs results continuously. We wish to solve the general problem of when it is safe to call an event-time window closed, meaning that the window does not expect any more data. To do so, we would like to characterize the progress that the pipeline is making relative to its unbounded input.

One naive approach to solving the event-time windowing problem would be to simply base our event-time windows on the current processing time. As we saw in Chapter 1, we quickly run into trouble - ...
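
As a rough illustration of why the naive approach misbehaves, the following minimal sketch groups the same events into fixed windows twice: once by processing time and once by event time. Everything in it is an assumption made for illustration and does not come from the chapter: the `Event` type, the `window_start` and `count_per_window` helpers, the 60-second window length, and the sample timestamps.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW_SECONDS = 60  # hypothetical fixed window length


class Event(NamedTuple):
    key: str
    event_time: int       # seconds; when the event actually occurred
    processing_time: int  # seconds; when the pipeline observed it


def window_start(timestamp: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def count_per_window(events, use_event_time: bool) -> dict:
    """Count events per fixed window, keyed by window start time."""
    counts = defaultdict(int)
    for e in events:
        ts = e.event_time if use_event_time else e.processing_time
        counts[window_start(ts)] += 1
    return dict(counts)


# Hypothetical input: event "c" occurred at t=55 but arrived 75 seconds late.
events = [
    Event("a", event_time=5, processing_time=7),
    Event("b", event_time=50, processing_time=58),
    Event("c", event_time=55, processing_time=130),
    Event("d", event_time=70, processing_time=75),
]

by_processing_time = count_per_window(events, use_event_time=False)
by_event_time = count_per_window(events, use_event_time=True)
```

In this sketch, event "c" belongs to the event-time window starting at 0, yet grouping by processing time places it in the window starting at 120. A pipeline that declared the first window closed as soon as processing time reached 60 would have produced a count of 2 for that window rather than 3.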
