[[Google Cloud Platform|GCP]] offers a variety of managed services for data pipelines.
- **Dataprep** is a visual tool for preparing and cleaning data before analytics and visualization. Think of it like Tableau Prep.
- **Dataflow** is used for batch and streaming data processing in the middle of a data pipeline. Pipelines are written with the Apache [[Beam]] SDK and executed by Dataflow as a managed service (see the Beam sketch after this list).
- **Dataproc** is used for big data processing on managed compute clusters running Apache [[Spark]] and [[Hadoop]]. Dataproc integrates with [[Vertex AI]] and common interfaces such as [[Jupyter Notebook]] (job-submission sketch below).
- **Pub/Sub** is used for streaming analytics and data integration pipelines to ingest and distribute data. Pub/Sub is commonly used to distribute change events between databases (publisher sketch below).
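
A minimal sketch of a Dataflow pipeline written with the Apache Beam Python SDK. The project ID, region, and bucket paths are placeholders, not real resources; swapping the runner for `DirectRunner` runs the same pipeline locally before submitting it to Dataflow.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Placeholder project, region, and bucket; use "DirectRunner" to test locally.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
            | "LineLength" >> beam.Map(lambda line: len(line))   # simple per-element transform
            | "Format" >> beam.Map(str)
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output/lengths")
        )


if __name__ == "__main__":
    run()
```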
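
A sketch of submitting a PySpark job to an existing Dataproc cluster with the `google-cloud-dataproc` client library. The project ID, region, cluster name, and script URI are assumptions for illustration; the cluster must already be running.

```python
from google.cloud import dataproc_v1


def submit_pyspark_job(project_id: str, region: str, cluster_name: str, script_uri: str):
    # The regional endpoint must match the cluster's region.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": cluster_name},
        "pyspark_job": {"main_python_file_uri": script_uri},
    }
    # Submit the job and block until it finishes.
    operation = client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    )
    result = operation.result()
    print(f"Job finished: {result.reference.job_id}")


if __name__ == "__main__":
    # Placeholder project, cluster, and GCS script path.
    submit_pyspark_job("my-project", "us-central1", "my-cluster", "gs://my-bucket/jobs/wordcount.py")
```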
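
A sketch of publishing a database change event to a Pub/Sub topic with the `google-cloud-pubsub` client. The project name, topic name, and event payload are placeholders; a subscriber (for example, a Dataflow streaming job) would consume the same topic downstream.

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Placeholder project and topic.
topic_path = publisher.topic_path("my-project", "db-change-events")

# Illustrative change event; attributes carry arbitrary key/value metadata.
event = {"table": "orders", "op": "UPDATE", "id": 42}
future = publisher.publish(
    topic_path,
    data=json.dumps(event).encode("utf-8"),
    source="orders-db",
)
print(f"Published message {future.result()}")  # result() returns the message ID
```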