Dataproc Templates (Jupyter Notebooks)

Vertex AI Pipelines - PySpark

Notebooks in this folder demonstrate how to run Dataproc Templates from Jupyter Notebooks using Vertex AI.

Overview

Serverless Spark supports serverless interactive development through Dataproc Sessions in Jupyter notebooks, which are natively integrated with Vertex AI Workbench.

Additionally, a data scientist can automate the execution of a Dataproc Template with Vertex AI Pipelines and the Serverless Spark Kubeflow components.

Deploying Dataproc Templates to Vertex AI

The best way to get started is to clone the Dataproc Templates repository to your Jupyter environment in Vertex AI, and run the notebook.

1. Enable the Compute Engine API, Dataproc API, Vertex AI API, and Notebooks API in your GCP project.
2. Create a User-Managed Notebook in Vertex AI Workbench. In this example, the User-Managed Notebook is created using the Compute Engine default service account.
3. Open the created notebook, clone the Dataproc Templates GitHub repository, and run the desired notebook located in the /notebooks folder.
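
The command-line parts of the steps above can be sketched as follows. This is a minimal illustration, not an official setup script: it only builds and prints the `gcloud` and `git` commands so that they can be reviewed before being run. The service identifiers are the standard names for the four APIs, while the repository URL, the helper function names, and the `my-project` project ID are assumptions made for this example.

```python
import shlex

# Service identifiers for the APIs listed in step 1.
REQUIRED_SERVICES = [
    "compute.googleapis.com",     # Compute Engine API
    "dataproc.googleapis.com",    # Dataproc API
    "aiplatform.googleapis.com",  # Vertex AI API
    "notebooks.googleapis.com",   # Notebooks API
]

# Assumed repository URL; verify it before cloning.
REPO_URL = "https://github.com/GoogleCloudPlatform/dataproc-templates.git"


def enable_apis_command(project_id):
    """Return the argv for `gcloud services enable` on the required APIs."""
    return [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES,
        f"--project={project_id}",
    ]


def clone_command(repo_url=REPO_URL):
    """Return the argv for cloning the templates repository."""
    return ["git", "clone", repo_url]


def setup_commands(project_id):
    """Return the commands in the order the steps describe them."""
    return [enable_apis_command(project_id), clone_command()]


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for argv in setup_commands("my-project"):
        print(shlex.join(argv))
```

The first command would typically be run from a shell with the Google Cloud CLI authenticated against the project, and the second from a terminal inside the notebook instance. Creating the User-Managed Notebook itself (step 2) is left to the Vertex AI Workbench console here.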
