Pipelines

Pipelines in Cloud Dataflow represent a data processing job, encapsulating an entire series of computations. A pipeline accepts input data from one or more external sources, transforms that data, and writes output data. Output is typically written to an external data sink, which can be one of the many GCP data storage services.

Dataflow can also convert data from one format to another. You build a pipeline by writing a program with the Dataflow SDK.
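The read-transform-write shape of a pipeline can be sketched in plain Python. This is an illustrative toy, not the Dataflow SDK: the function name `run_pipeline` and the CSV-to-JSON conversion are assumptions chosen only to show the source → transform → sink flow.

```python
import csv
import io
import json

def run_pipeline(csv_text):
    """Toy pipeline (not the Dataflow SDK): read CSV, transform, emit JSON lines."""
    reader = csv.DictReader(io.StringIO(csv_text))      # read from a "source"
    records = [dict(row) for row in reader]             # materialize input data
    # transform step: uppercase each name field
    transformed = [{**r, "name": r["name"].upper()} for r in records]
    # write step: serialize to a JSON-lines "sink"
    return "\n".join(json.dumps(r) for r in transformed)

csv_input = "name,score\nada,90\ngrace,95\n"
print(run_pipeline(csv_input))
```

In a real Dataflow pipeline, the read, transform, and write steps would each be SDK-provided operations running on the managed service rather than local function calls.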

A pipeline consists of two parts:

  • Data: Specialized collection classes called PCollections
  • Transforms: The steps in your pipeline; each transform is a data processing operation
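The two parts above can be modeled with a toy class. This is a hypothetical sketch of the concept only; the class and method names mirror the SDK's vocabulary (`PCollection`, `apply`) but the real Dataflow SDK API differs.

```python
class PCollection:
    """Toy stand-in for a Dataflow PCollection: an immutable bag of elements."""
    def __init__(self, elements):
        self.elements = list(elements)

    def apply(self, transform):
        # A transform is one step in the pipeline: it takes the elements of
        # one PCollection and produces a new PCollection.
        return PCollection(transform(self.elements))

# Two illustrative transforms (assumed names, not SDK operations)
def to_upper(elements):
    return [e.upper() for e in elements]

def dedupe(elements):
    return sorted(set(elements))

result = PCollection(["a", "b", "a"]).apply(to_upper).apply(dedupe)
print(result.elements)  # → ['A', 'B']
```

Chaining `apply` calls mirrors how a pipeline is expressed as a series of steps, with each transform producing a new collection rather than mutating the input.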
