The Cloud Dataflow service runs data processing jobs that are created with the Dataflow SDK, whose programming model simplifies large-scale data processing.
The programming model is divided into four major components:
- Pipelines: Represent a single, repeatable job from start to finish
- PCollections: Represent a set of data in your pipeline
- Transforms: Perform processing on the elements of a PCollection
- I/O Sources and Sinks: Provide data source and data sink APIs for pipeline I/O
Let's discuss each of them in the following sections, starting with a short sketch of how they fit together.
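To make the four components concrete, here is a minimal word-count sketch written with the Apache Beam Java SDK, the successor to the original Dataflow SDK. It is illustrative rather than production code: the bucket paths `gs://my-bucket/...` and the class name `WordCountSketch` are hypothetical placeholders.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WordCountSketch {
  public static void main(String[] args) {
    // Pipeline: a single, repeatable job from start to finish.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // I/O source: read lines of text into a PCollection.
    PCollection<String> lines =
        pipeline.apply("ReadLines", TextIO.read().from("gs://my-bucket/input.txt"));

    // Transforms: split each line into words, count occurrences of
    // each word, and format the results as strings.
    PCollection<String> counts =
        lines
            .apply("SplitWords", FlatMapElements
                .into(TypeDescriptors.strings())
                .via(line -> Arrays.asList(line.split("\\s+"))))
            .apply("CountWords", Count.perElement())
            .apply("FormatResults", MapElements
                .into(TypeDescriptors.strings())
                .via(kv -> kv.getKey() + ": " + kv.getValue()));

    // I/O sink: write the formatted counts back out.
    counts.apply("WriteCounts", TextIO.write().to("gs://my-bucket/output"));

    pipeline.run().waitUntilFinish();
  }
}
```

Run locally, this executes on the bundled direct runner; to execute on the Cloud Dataflow service instead, you would pass options such as `--runner=DataflowRunner` together with a project and staging location.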