Notebooks in this folder demonstrate how to run Dataproc Templates from Jupyter Notebooks using Vertex AI.
Serverless Spark now also supports interactive development through Dataproc Sessions in Jupyter notebooks, natively integrated with Vertex AI Workbench.
Additionally, a data scientist can automate a Dataproc Template execution with Vertex AI Pipelines and Serverless Spark Kubeflow components.
The best way to get started is to clone the Dataproc Templates repository into your Jupyter environment in Vertex AI and run the notebook you need.
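As a minimal sketch of the clone step, the snippet below builds the `git clone` command that could be run from a notebook cell or terminal. The repository URL, the destination folder name, and the helper name `clone_command` are assumptions for illustration; they are not stated in this README, so verify the URL before use.

```python
import shlex

# Assumption: public GitHub location of the Dataproc Templates repository.
REPO_URL = "https://github.com/GoogleCloudPlatform/dataproc-templates.git"


def clone_command(repo_url=REPO_URL, dest="dataproc-templates"):
    """Return the git command (as an argument list) that clones the repo into `dest`."""
    # A shallow clone is enough to browse and run the notebooks.
    return ["git", "clone", "--depth", "1", repo_url, dest]


if __name__ == "__main__":
    # Print the command so it can be reviewed and pasted into a terminal.
    print(shlex.join(clone_command()))
```

Returning the command as a list keeps it easy to inspect first and to pass to `subprocess.run(..., check=True)` once you are happy with it.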
- Enable the Compute Engine API, Dataproc API, Vertex AI API, and Vertex Notebooks API in your GCP project.
- Create a User-Managed Notebook in Vertex AI Workbench.
  In this example, a User-Managed notebook is created using the Compute Engine default service account.
- Open the created notebook, clone the Dataproc Templates GitHub repository, and run the desired notebook located in the /notebooks folder.
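The API-enablement step above can also be scripted. The sketch below assembles a `gcloud services enable` command for the four APIs. The service identifiers in `REQUIRED_SERVICES` are assumed mappings from the API names listed above, and `enable_apis` and `my-project` are hypothetical names; check the identifiers against your project's API library before running for real.

```python
import shlex
import subprocess

# Assumption: service identifiers corresponding to the APIs named in the steps above.
REQUIRED_SERVICES = [
    "compute.googleapis.com",     # Compute Engine API
    "dataproc.googleapis.com",    # Dataproc API
    "aiplatform.googleapis.com",  # Vertex AI API
    "notebooks.googleapis.com",   # Notebooks API
]


def enable_apis_command(project_id):
    """Return the gcloud command (as an argument list) that enables the required APIs."""
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES, "--project", project_id]


def enable_apis(project_id, dry_run=True):
    """Print the command when dry_run is True; otherwise execute it with gcloud."""
    cmd = enable_apis_command(project_id)
    if dry_run:
        print(shlex.join(cmd))
    else:
        # Requires an installed and authenticated gcloud CLI.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `enable_apis("my-project")` only prints the command; pass `dry_run=False` to execute it.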