The main purpose of Cloud Dataflow is to transform and enrich data in real time. It also reduces the complexity of building data pipelines. Like the other serverless components, it is fully managed.
When to use:
If you want to build a pipeline that transforms data (streaming or batch) before delivering it to Cloud Pub/Sub, BigQuery, or a machine-learning model.
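As a concrete illustration of the "transform and enrich" step such a pipeline performs, here is a minimal sketch of per-element enrichment logic in Python. The event fields, the lookup table, and the `enrich` function are hypothetical examples, not part of any Google Cloud API; in a real Dataflow job this function would be applied element-wise with the Apache Beam SDK (e.g. `beam.Map(enrich)`).

```python
import json
from datetime import datetime, timezone

# Hypothetical reference data used to enrich each incoming event.
COUNTRY_BY_CODE = {"US": "United States", "IN": "India", "DE": "Germany"}

def enrich(raw_event: str) -> dict:
    """Parse a raw JSON event, add a human-readable country name,
    and stamp it with a processing time before it is written to a
    sink such as BigQuery."""
    event = json.loads(raw_event)
    event["country"] = COUNTRY_BY_CODE.get(event.get("country_code"), "Unknown")
    event["processed_at"] = datetime.now(timezone.utc).isoformat()
    return event

# In an actual Beam pipeline this would look roughly like:
#   events | beam.Map(enrich) | beam.io.WriteToBigQuery(...)
record = enrich('{"user": "alice", "country_code": "IN"}')
print(record["country"])  # → India
```

The same element-wise function works unchanged whether the pipeline runs in batch or streaming mode, which is the point of Dataflow's unified programming model.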
Special features:
- Horizontal autoscaling
- Unified programming model
- Automated resource management
Costing:
Please refer to the following table for Cloud Dataflow pricing in the Iowa region:
| Cloud Dataflow Worker Type | vCPU ($/hr) | Memory ($ per GB/hr) | Standard Persistent Disk ($ per GB/hr) | SSD Persistent Disk ($ per GB/hr) |
|---|---|---|---|---|