Add jupyter #87
Merged
Changes from all commits (14 commits)
- 9b77097: Add jupyter example - armada spark + client static mode (sudiptob2)
- 9cfbf88: refactor jupyter example (sudiptob2)
- 3ed2ae6: docs (sudiptob2)
- ef8c670: auth token support (sudiptob2)
- c45c27b: force setting SPARK_DRIVER_HOST (sudiptob2)
- 6a07e0f: cleanup examples (sudiptob2)
- ee4a157: auto create queue (sudiptob2)
- cfb95c2: copy notebooks to workspace (sudiptob2)
- 8d14f3e: conditional python installation in dockerfile (sudiptob2)
- 6fe37b0: fail with message when jupyter not included (sudiptob2)
- d09141c: autometically load image in kind (sudiptob2)
- 96b71b9: docs (sudiptob2)
- aeb8dd3: copy notebook examples only if it does not exists in the workspace (sudiptob2)
- de5b7a6: fix env variables for auth (sudiptob2)
```diff
@@ -80,3 +80,6 @@ scripts/armadactl
 e2e-test.log
 extraJars/*.jar
 scripts/.tmp/
+
+# Jupyter
+example/jupyter/workspace/
```
```diff
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+cd /home/spark/workspace
+
+exec jupyter notebook \
+    --ip=0.0.0.0 \
+    --port=8888 \
+    --no-browser \
+    --NotebookApp.token='' \
+    --NotebookApp.password='' \
+    --NotebookApp.notebook_dir=/home/spark/workspace
```
> **Collaborator comment:** in our internal cluster, i had to make the following mods to the default notebook:

New file (diff header `@@ -0,0 +1,236 @@`):

```json
{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "introduction",
      "metadata": {},
      "source": [
        "# Armada Spark Example\n",
        "\n",
        "This notebook demonstrates how to run Spark jobs on Armada using PySpark in client mode."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "imports",
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import glob\n",
        "import subprocess\n",
        "import random\n",
        "from pyspark.sql import SparkSession\n",
        "from pyspark import SparkConf"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "setup-section",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "Clean up any existing Spark context and configure the environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "stop-existing-context",
      "metadata": {},
      "outputs": [],
      "source": [
        "try:\n",
        "    from pyspark import SparkContext\n",
        "    if SparkContext._active_spark_context:\n",
        "        SparkContext._active_spark_context.stop()\n",
        "except:\n",
        "    pass"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "config-section",
      "metadata": {},
      "source": [
        "## Configuration\n",
        "\n",
        "Set up connection parameters and locate the Armada Spark JAR file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "configuration",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Configuration\n",
        "auth_token = os.environ.get('ARMADA_AUTH_TOKEN')\n",
        "auth_script_path = os.environ.get('ARMADA_AUTH_SCRIPT_PATH')\n",
        "driver_host = os.environ.get('SPARK_DRIVER_HOST')\n",
        "driver_port = os.environ.get('SPARK_DRIVER_PORT', '7078')\n",
        "block_manager_port = os.environ.get('SPARK_BLOCK_MANAGER_PORT', '10061')\n",
        "armada_master = os.environ.get('ARMADA_MASTER', 'local://armada://host.docker.internal:30002')\n",
        "armada_queue = os.environ.get('ARMADA_QUEUE', 'default')\n",
        "armada_namespace = os.environ.get('ARMADA_NAMESPACE', 'default')\n",
        "image_name = os.environ.get('IMAGE_NAME', 'spark:armada')\n",
        "event_watcher_use_tls = os.environ.get('ARMADA_EVENT_WATCHER_USE_TLS', 'false')\n",
        "\n",
        "# Find JAR - try common Scala versions (2.12, 2.13)\n",
        "jar_paths = glob.glob('/opt/spark/jars/armada-cluster-manager_2.1*-*-all.jar')\n",
        "if not jar_paths:\n",
        "    raise FileNotFoundError(\"Armada Spark JAR not found!\")\n",
        "armada_jar = jar_paths[0]\n",
        "\n",
        "# Generate app ID, required for client mode\n",
        "app_id = f\"jupyter-spark-{subprocess.check_output(['openssl', 'rand', '-hex', '3']).decode().strip()}\""
      ]
    },
    {
      "cell_type": "markdown",
      "id": "spark-config-section",
      "metadata": {},
      "source": [
        "## Spark Configuration\n",
        "\n",
        "Configure Spark to use Armada as the cluster manager in client mode."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "spark-config",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Spark Configuration\n",
        "conf = SparkConf()\n",
        "if auth_token:\n",
        "    conf.set(\"spark.armada.auth.token\", auth_token)\n",
        "if auth_script_path:\n",
        "    conf.set(\"spark.armada.auth.script.path\", auth_script_path)\n",
        "if not driver_host:\n",
        "    raise ValueError(\n",
        "        \"SPARK_DRIVER_HOST environment variable is required. \"\n",
        "    )\n",
        "conf.set(\"spark.master\", armada_master)\n",
        "conf.set(\"spark.submit.deployMode\", \"client\")\n",
        "conf.set(\"spark.app.id\", app_id)\n",
        "conf.set(\"spark.app.name\", \"jupyter-spark-pi\")\n",
        "conf.set(\"spark.driver.bindAddress\", \"0.0.0.0\")\n",
        "conf.set(\"spark.driver.host\", driver_host)\n",
        "conf.set(\"spark.driver.port\", driver_port)\n",
        "conf.set(\"spark.driver.blockManager.port\", block_manager_port)\n",
        "conf.set(\"spark.home\", \"/opt/spark\")\n",
        "conf.set(\"spark.armada.container.image\", image_name)\n",
        "conf.set(\"spark.armada.queue\", armada_queue)\n",
        "conf.set(\"spark.armada.scheduling.namespace\", armada_namespace)\n",
        "conf.set(\"spark.armada.eventWatcher.useTls\", event_watcher_use_tls)\n",
        "conf.set(\"spark.kubernetes.file.upload.path\", \"/tmp\")\n",
        "conf.set(\"spark.kubernetes.executor.disableConfigMap\", \"true\")\n",
        "conf.set(\"spark.local.dir\", \"/tmp\")\n",
        "conf.set(\"spark.jars\", armada_jar)\n",
        "\n",
        "# Network timeouts\n",
        "conf.set(\"spark.network.timeout\", \"800s\")\n",
        "conf.set(\"spark.executor.heartbeatInterval\", \"60s\")\n",
        "\n",
        "# Static mode - tune these values for your environment\n",
        "conf.set(\"spark.executor.instances\", \"2\")\n",
        "conf.set(\"spark.armada.driver.limit.memory\", \"1Gi\")\n",
        "conf.set(\"spark.armada.driver.request.memory\", \"1Gi\")\n",
        "conf.set(\"spark.armada.executor.limit.memory\", \"1Gi\")\n",
        "conf.set(\"spark.armada.executor.request.memory\", \"1Gi\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "create-spark-session",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Create SparkSession\n",
        "spark = SparkSession.builder.config(conf=conf).getOrCreate()\n",
        "print(f\"SparkSession created\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "examples-section",
      "metadata": {},
      "source": [
        "## Examples\n",
        "\n",
        "Run Spark computations on the Armada cluster."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "spark-pi-calculation",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Spark Pi calculation\n",
        "print(f\"Running Spark Pi calculation...\")\n",
        "n = 10000\n",
        "\n",
        "def inside(p):\n",
        "    x, y = random.random(), random.random()\n",
        "    return x*x + y*y < 1\n",
        "\n",
        "count = spark.sparkContext.parallelize(range(0, n)).filter(inside).count()\n",
        "pi = 4.0 * count / n\n",
        "print(f\" Pi is approximately: {pi}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "cleanup-section",
      "metadata": {},
      "source": [
        "## Cleanup\n",
        "\n",
        "Stop the Spark context to release resources. This will stop the executors in Armada."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "stop-spark-context",
      "metadata": {},
      "outputs": [],
      "source": [
        "# Stop Spark context\n",
        "print(\"Stopping Spark context...\")\n",
        "spark.stop()\n",
        "print(\"Spark context stopped successfully\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
```
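The notebook's Pi example is a standard Monte Carlo estimate: sample uniform points in the unit square and count the fraction landing inside the quarter-circle. The same sampling logic can be sketched in plain Python (no Spark required) to sanity-check it locally; the `estimate_pi` helper name is illustrative, not part of the PR:

```python
import random

def estimate_pi(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi: 4 times the fraction of uniform
    points (x, y) in [0, 1)^2 that fall inside the unit quarter-circle."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = sum(
        1 for _ in range(n)
        if rng.random() ** 2 + rng.random() ** 2 < 1
    )
    return 4.0 * inside / n

print(f"Pi is approximately: {estimate_pi(100_000):.3f}")
```

With 100,000 samples the estimate typically lands within a few hundredths of the true value; the Spark version simply distributes the same per-point test across executors with `parallelize(...).filter(inside).count()`.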
> **Review comment:** env var?
> **Reply:** The host port is already configurable using the `JUPYTER_PORT` env variable. This is the internal Jupyter port, don't think we need to make this configurable.
>
> (references `armada-spark/scripts/runJupyter.sh`, lines 54 to 56 in `779b8d6`)
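As a side note on the notebook's configuration cell: it shells out to `openssl rand -hex 3` to build a unique `spark.app.id` suffix. A dependency-free equivalent using Python's stdlib `secrets` module could look like the sketch below (the `make_app_id` helper is hypothetical, not part of the PR):

```python
import secrets

def make_app_id(prefix: str = "jupyter-spark") -> str:
    # token_hex(3) yields 6 lowercase hex chars,
    # the same shape as `openssl rand -hex 3`
    return f"{prefix}-{secrets.token_hex(3)}"

print(make_app_id())
```

Avoiding the subprocess call removes the runtime dependency on the `openssl` binary inside the container.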