Dataproc Templates (Jupyter Notebooks)

Notebooks in this folder demonstrate how to run Dataproc Templates from Jupyter Notebooks using Vertex AI.

Overview

Google recently extended Serverless Spark with serverless interactive development through Dataproc Sessions in Jupyter notebooks, natively integrated with Vertex AI Workbench.

Additionally, a data scientist can automate the execution of a Dataproc Template with Vertex AI Pipelines and the Serverless Spark Kubeflow components.

Deploying Dataproc Templates to Vertex AI

The best way to get started is to clone the Dataproc Templates repository into your Jupyter environment in Vertex AI and run the notebook you need.

  1. Enable the Compute Engine API, Dataproc API, Vertex AI API, and Vertex Notebooks API in your GCP project.

  2. Create a User-Managed Notebook in Vertex AI Workbench.

    [Image: workbench]

    In this example, the User-Managed Notebook is created using the Compute Engine default service account.

  3. Open the notebook you created, clone the Dataproc Templates GitHub repository, and run the desired notebook from the /notebooks folder.

    [Image: clone]
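
Steps 1 and 3 above can also be driven from a terminal instead of the Cloud Console. The following is a minimal sketch, not part of the original instructions: it only builds and prints the `gcloud services enable` and `git clone` command lines so you can review them before running anything. The project ID `my-project` is a placeholder, the helper function names are hypothetical, and the repository URL is assumed to be the public GoogleCloudPlatform/dataproc-templates repository on GitHub.

```python
import shlex

# Service names for the four APIs listed in step 1.
REQUIRED_APIS = [
    "compute.googleapis.com",     # Compute Engine API
    "dataproc.googleapis.com",    # Dataproc API
    "aiplatform.googleapis.com",  # Vertex AI API
    "notebooks.googleapis.com",   # Notebooks API (the "Vertex Notebooks API" in step 1)
]

# Assumption: the Dataproc Templates repository referred to in step 3.
REPO_URL = "https://github.com/GoogleCloudPlatform/dataproc-templates.git"


def enable_apis_command(project_id: str) -> list[str]:
    """Build the gcloud command that enables the required APIs (step 1)."""
    return ["gcloud", "services", "enable", *REQUIRED_APIS, "--project", project_id]


def clone_command(repo_url: str = REPO_URL) -> list[str]:
    """Build the git command that clones the repository (step 3)."""
    return ["git", "clone", repo_url]


if __name__ == "__main__":
    # "my-project" is a placeholder; replace it with your own GCP project ID.
    for cmd in (enable_apis_command("my-project"), clone_command()):
        print(shlex.join(cmd))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)`. Run the `gcloud` command wherever the gcloud CLI is authenticated against your project, and the `git clone` command from a terminal inside the notebook instance.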