Streaming Systems by Tyler Akidau, Reuven Lax, Slava Chernyak

Chapter 5. Exactly-once & side effects

We will now shift from discussing programming models and APIs to the systems that implement them. A model and API allow users to describe what they want to compute. Actually running the computation accurately at scale requires a system, usually a distributed system.

In this chapter we will focus on how a system can correctly implement the Beam model to produce accurate results. Streaming systems often talk about “exactly-once processing”: ensuring that every record is processed exactly one time. We will explain what we mean by this, and how it might be implemented.

Google Cloud Dataflow is a hosted service that runs Beam pipelines. In this chapter, we will examine some techniques used by Dataflow to efficiently and accurately process records through Beam pipelines exactly once.

Why exactly once matters

It almost goes without saying that for many users, any risk of dropped records or data loss in their data-processing pipelines is unacceptable. Even so, historically many general-purpose streaming systems made no guarantees about record processing: all processing was “best effort” only. Other systems provided at-least-once guarantees, ensuring that records were always processed at least once, but records might be duplicated (and thus result in inaccurate aggregations); in practice, many such at-least-once systems performed aggregations in memory, and thus their aggregations could still be lost when machines crashed.
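To make the duplication problem concrete, here is a minimal Python sketch of how at-least-once delivery can inflate an aggregation. It is only an illustration and does not describe how Dataflow or any other system works. The names (at_least_once_delivery, naive_sum, deduplicated_sum) and the record layout with a unique "id" field are assumptions made for this example.

```python
from collections import defaultdict

def at_least_once_delivery(records, redeliver_ids):
    """Yield every record once, plus a second copy for ids in redeliver_ids.

    Hypothetical stand-in for a transport that retries a record when an
    acknowledgement is lost.
    """
    for record in records:
        yield record
        if record["id"] in redeliver_ids:
            yield record  # simulated retry: the same record arrives again

def naive_sum(stream):
    """Sum values per key, counting every delivery (duplicates included)."""
    totals = defaultdict(int)
    for record in stream:
        totals[record["key"]] += record["value"]
    return dict(totals)

def deduplicated_sum(stream):
    """Sum values per key, skipping any record id that was already seen."""
    seen_ids = set()
    totals = defaultdict(int)
    for record in stream:
        if record["id"] in seen_ids:
            continue  # duplicate delivery: ignore it
        seen_ids.add(record["id"])
        totals[record["key"]] += record["value"]
    return dict(totals)

records = [
    {"id": "a", "key": "clicks", "value": 1},
    {"id": "b", "key": "clicks", "value": 1},
    {"id": "c", "key": "clicks", "value": 1},
]

# Record "b" is delivered twice in both runs.
naive = naive_sum(at_least_once_delivery(records, {"b"}))
deduped = deduplicated_sum(at_least_once_delivery(records, {"b"}))

print(naive)    # {'clicks': 4}  (one click counted twice)
print(deduped)  # {'clicks': 3}
```

Both functions keep their totals in process memory, so this sketch also shares the second weakness mentioned above: if the process crashed, the totals (and the set of seen ids) would be lost.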

As a result, ...
