Use the following flowchart when choosing between Dataproc and Dataflow:
The following table compares Dataproc and Dataflow by workload:
| Workload | Cloud Dataproc | Cloud Dataflow |
|---|---|---|
| Stream processing (ETL) | No | Yes |
| Batch processing (ETL) | Yes | Yes |
| Iterative processing and notebooks | Yes | No |
| Machine learning with Spark ML | Yes | No |
| Preprocessing for machine learning | No | Yes (with Cloud ML Engine) |
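
The comparison above can also be read as a small lookup: given the workloads a project needs, which service does the table mark "Yes" for all of them? The following is a minimal sketch of that idea, not an official selection tool. The names `SUPPORT` and `candidate_services` are hypothetical and introduced only for illustration, and the values are transcribed directly from the table.

```python
# Support matrix transcribed from the comparison table above.
# True means the table says "Yes", False means "No".
# Note: the table qualifies Dataflow's preprocessing entry with
# "(with Cloud ML Engine)"; that qualifier is not modeled here.
SUPPORT = {
    "Stream processing (ETL)": {"Cloud Dataproc": False, "Cloud Dataflow": True},
    "Batch processing (ETL)": {"Cloud Dataproc": True, "Cloud Dataflow": True},
    "Iterative processing and notebooks": {"Cloud Dataproc": True, "Cloud Dataflow": False},
    "Machine learning with Spark ML": {"Cloud Dataproc": True, "Cloud Dataflow": False},
    "Preprocessing for machine learning": {"Cloud Dataproc": False, "Cloud Dataflow": True},
}


def candidate_services(workloads):
    """Return the services the table marks 'Yes' for every requested workload."""
    services = ["Cloud Dataproc", "Cloud Dataflow"]
    return [s for s in services if all(SUPPORT[w][s] for w in workloads)]


# Batch ETL alone is supported by both services.
print(candidate_services(["Batch processing (ETL)"]))

# Adding stream processing narrows the choice to Dataflow.
print(candidate_services(["Stream processing (ETL)", "Batch processing (ETL)"]))

# Per the table, no single service covers both streaming and Spark ML.
print(candidate_services(["Stream processing (ETL)", "Machine learning with Spark ML"]))
```

An empty result means the table lists no single service for that combination, in which case the workloads would have to be split across both services.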