[[Google Cloud Platform|GCP]] offers a variety of managed services for data pipelines.

- **Dataprep** is used for data preparation and cleaning prior to analytics and visualization tasks. Think of it like Tableau Prep.
- **Dataflow** is used for batch and streaming data processing in the middle of a data pipeline; pipelines are written with the Apache Beam model and run on Dataflow as a managed runner (see the pipeline sketch after this list).
- **Dataproc** is used for big data processing on managed compute clusters with Apache [[Spark]] and [[Hadoop]]. Dataproc integrates with [[Vertex AI]] and common interfaces such as [[Jupyter Notebook]].
- **Pub/Sub** is used for streaming analytics and data integration pipelines to ingest and distribute data. It is commonly used to distribute change events between databases (see the publisher sketch after this list).
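
A minimal Dataflow-style sketch using the Apache Beam Python SDK: the bucket paths and step names are placeholders, and the pipeline runs locally with `DirectRunner`; swapping in `runner="DataflowRunner"` (plus project, region, and temp_location options) would submit the same code to Dataflow.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# DirectRunner executes locally; DataflowRunner would run the same
# pipeline as a managed Dataflow job. Paths below are placeholders.
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
        | "ParseAmount" >> beam.Map(lambda line: float(line.split(",")[1]))
        | "Sum" >> beam.CombineGlobally(sum)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/total")
    )
```

And a minimal Pub/Sub publisher sketch, assuming the `google-cloud-pubsub` client library; the project ID, topic name, and change-event payload are hypothetical, chosen to illustrate the "distribute change events" use case above.

```python
from google.cloud import pubsub_v1

# Hypothetical project and topic names for illustration.
PROJECT_ID = "my-project"
TOPIC_ID = "db-change-events"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

# Publish a change event as a bytes payload; extra keyword arguments
# become message attributes that subscribers can filter on.
future = publisher.publish(
    topic_path,
    data=b'{"table": "orders", "op": "UPDATE", "id": 42}',
    source="orders-db",
)
print(f"Published message ID: {future.result()}")
```

Subscribers on the receiving side attach to a subscription on the same topic and acknowledge each message after applying the change, which is what makes Pub/Sub a convenient fan-out layer between databases.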