O'Reilly logo

Streaming Systems by Reuven Lax, Slava Chernyak, Tyler Akidau

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 5. Exactly-once & side effects

We will now shift from discussing programming models and APIs to the systems that implement them. A model and API allows users to describe what they want to compute. Actually running the computation accurately at scale requires a system - usually a distributed system.

In this chapter we will focus on how an implementing system can correctly implement the Beam model to produce accurate results. Streaming systems often talk about “exactly-once processing” - ensuring that every record is processed exactly one time. We will explain what we mean by this, and how it might be implemented.

As a motivating example, this chapter will focus on techniques used by Google Cloud Dataflow to efficiently guarantee exactly-once processing of record. Towards the end of the chapter, we will also look at techniques used by some other popular streaming systems to guarantee exactly once.

Why exactly once matters

It almost goes without saying that for many users, any risk of dropped records or data loss in their data-processing pipelines unacceptable. Even so, historically many general-purpose streaming systems made no guarantees about record processing - all processing was “best effort” only. Other systems provided at-least-once guarantees, ensuring that records were always processed at least once, but records might be duplicated (and thus result in inaccurate aggregations); in practice, many such at-least-once systems performed aggregations in memory, and thus ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required