A pipeline in Cloud Dataflow represents a data processing job, encapsulating an entire series of computations. A pipeline reads input data from one or more external sources, transforms that data, and writes output data. Output is typically written to an external data sink, which can be one of many GCP data storage services.
Dataflow can also convert data from one format to another along the way. A pipeline is built by writing a program with the Dataflow SDK.
A pipeline consists of two parts:
- Data: specialized collection classes called PCollections
- Transforms: the steps in your pipeline, each one a data processing operation