Add jupyter #87
base: master
Conversation
I rushed you to create this PR so we could demo it, and it was great for that. Now that the demo is over, though, I think we need to make it more user-friendly. I made the following comments with that in mind.
@@ -0,0 +1,254 @@
In our internal cluster, I had to make the following modifications to the default notebook:
{
"cell_type": "code",
"id": "configuration",
"metadata": {},
"outputs": [],
"source": [
"# Configuration\n",
- "driver_host = os.environ.get('SPARK_DRIVER_HOST', '10.0.0.80')\n",
- "driver_port = os.environ.get('SPARK_DRIVER_PORT', '7078')\n",
+ "\n",
+ "auth_token = os.environ.get('ARMADA_AUTH_TOKEN')\n",
+ "driver_host = os.environ.get('SPARK_DRIVER_HOST', '11.2.208.34')\n",
+ "driver_port = os.environ.get('SPARK_DRIVER_PORT', '10060')\n",
"block_manager_port = os.environ.get('SPARK_BLOCK_MANAGER_PORT', '10061')\n",
- "armada_master = os.environ.get('ARMADA_MASTER', 'local://armada://host.docker.internal:30002')\n",
- "armada_queue = os.environ.get('ARMADA_QUEUE', 'test')\n",
- "image_name = os.environ.get('IMAGE_NAME', 'spark:armada')\n",
+ "armada_master = os.environ.get('ARMADA_MASTER', 'armada://XXX:443')\n",
+ "armada_queue = os.environ.get('ARMADA_QUEUE', 'XXX')\n",
+ "image_name = os.environ.get('IMAGE_NAME', 'XXX/spark:armada')\n",
"\n",
"# Find JAR\n",
- "jar_paths = glob.glob('/opt/spark/jars/armada-cluster-manager_2.13-*-all.jar')\n",
+ "jar_paths = glob.glob('/opt/spark/jars/armada-cluster-manager_2.12-*-all.jar')\n",
"if not jar_paths:\n",
" raise FileNotFoundError(\"Armada Spark JAR not found!\")\n",
"armada_jar = jar_paths[0]\n",
"\n",
"# Generate app ID, required for client mode\n",
- "app_id = f\"armada-spark-{subprocess.check_output(['openssl', 'rand', '-hex', '3']).decode().strip()}\""
+ "app_id = f\"jupyter-spark-{subprocess.check_output(['openssl', 'rand', '-hex', '3']).decode().strip()}\""
]
},
{
@@ -97,7 +99,8 @@
},
"source": [
"# Spark Configuration\n",
"conf = SparkConf()\n",
+ "conf.set(\"spark.armada.auth.token\", auth_token)\n",
"conf.set(\"spark.master\", armada_master)\n",
"conf.set(\"spark.submit.deployMode\", \"client\")\n",
"conf.set(\"spark.app.id\", app_id)\n",
@@ -114,7 +128,6 @@
"conf.set(\"spark.driver.blockManager.port\", block_manager_port)\n",
"conf.set(\"spark.home\", \"/opt/spark\")\n",
"conf.set(\"spark.armada.container.image\", image_name)\n",
- "conf.set(\"spark.armada.scheduling.nodeUniformity\", \"armada-spark\")\n",
"conf.set(\"spark.armada.queue\", armada_queue)\n",
"conf.set(\"spark.kubernetes.file.upload.path\", \"/tmp\")\n",
"conf.set(\"spark.kubernetes.executor.disableConfigMap\", \"true\")\n",
@@ -127,15 +140,18 @@
"\n",
"# Static mode\n",
"conf.set(\"spark.executor.instances\", \"2\")\n",
- "conf.set(\"spark.armada.executor.limit.memory\", \"1Gi\")\n",
- "conf.set(\"spark.armada.executor.request.memory\", \"1Gi\")"
+ "conf.set(\"spark.armada.driver.limit.memory\", \"10Gi\")\n",
+ "conf.set(\"spark.armada.driver.request.memory\", \"10Gi\")\n",
+ "conf.set(\"spark.armada.executor.limit.memory\", \"60Gi\")\n",
+ "conf.set(\"spark.armada.executor.request.memory\", \"60Gi\")\n",
+ "#print (conf.get(\"spark.armada.auth.token\"))"
]
},
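Put together, the modified configuration cell boils down to roughly the following plain Python (a sketch extracted from the notebook JSON above; the XXX placeholders are left opaque, and `secrets.token_hex` is a stdlib stand-in of mine for the notebook's openssl subprocess call):

```python
import glob
import os
import secrets

# Environment-driven configuration, mirroring the notebook cell above.
auth_token = os.environ.get("ARMADA_AUTH_TOKEN")  # may be None when unset
driver_host = os.environ.get("SPARK_DRIVER_HOST", "11.2.208.34")
driver_port = os.environ.get("SPARK_DRIVER_PORT", "10060")
block_manager_port = os.environ.get("SPARK_BLOCK_MANAGER_PORT", "10061")

# Find the Armada cluster-manager JAR (Scala 2.12 build, per the diff).
# The notebook raises FileNotFoundError when this list is empty.
jar_paths = glob.glob("/opt/spark/jars/armada-cluster-manager_2.12-*-all.jar")

# Generate an app ID, required for client mode.
app_id = f"jupyter-spark-{secrets.token_hex(3)}"

# Only forward the auth token when it is actually set; the diff sets it
# unconditionally, which passes None to SparkConf.set() when the env var
# is missing.
spark_settings = {}
if auth_token:
    spark_settings["spark.armada.auth.token"] = auth_token

print(app_id)
```

One side note: guarding the auth-token setting (as in the sketch) would also make the commented-out debug print of the token unnecessary.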
exec jupyter notebook \
  --ip=0.0.0.0 \
  --port=8888 \
env var?
The host port is already configurable via the JUPYTER_PORT env variable. This is the internal Jupyter port; I don't think we need to make it configurable.
armada-spark/scripts/runJupyter.sh
Lines 54 to 56 in 779b8d6
docker run -d \
  --name armada-jupyter \
  -p ${JUPYTER_PORT}:8888 \
docker/Dockerfile
Outdated
ENV SPARK_DIST_CLASSPATH=/opt/spark/coreJars/*

# Install Jupyter, PySpark, and Python dependencies
The Dockerfile receives the include_python arg. We probably shouldn't install Jupyter if Python is not included.
docker/Dockerfile
Outdated
ENV SPARK_HOME=/opt/spark
ENV PYSPARK_PYTHON=python3
ENV PYSPARK_DRIVER_PYTHON=python3
ENV PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.9.7-src.zip
Where does that version string come from? Can we just do py4j-*src.zip?
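For illustration, here is how a runtime glob could resolve the py4j zip instead of pinning the version (a sketch; `resolve_py4j` is a hypothetical helper, not something in this repo):

```python
import glob
import os
import pathlib
import tempfile
from typing import Optional

def resolve_py4j(spark_home: str) -> Optional[str]:
    """Return the first py4j source zip under SPARK_HOME's python/lib, if any."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None

# Demonstrate against a throwaway directory that mimics a Spark layout.
with tempfile.TemporaryDirectory() as home:
    lib = pathlib.Path(home) / "python" / "lib"
    lib.mkdir(parents=True)
    (lib / "py4j-0.10.9.7-src.zip").touch()
    found = resolve_py4j(home)
    print(found)
```

Note that a literal glob inside a Dockerfile ENV line would not expand anyway; the pattern has to be resolved by a shell or by Python at runtime, as above.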
scripts/runJupyter.sh
Outdated
  -e ARMADA_QUEUE=${ARMADA_QUEUE} \
  -e IMAGE_NAME=${IMAGE_NAME} \
  -e ARMADA_AUTH_TOKEN=${ARMADA_AUTH_TOKEN:-} \
  -v "$notebooks_dir:/home/spark/workspace/notebooks:ro" \
This looks like the notebook will be read-only.
What I was thinking would be good is a workspace directory with no checked-in code.
If it doesn't contain the notebook file, this script copies it in, but leaves it untouched if it already exists.
Then that directory gets mounted in Docker.
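A minimal sketch of that copy-if-missing behavior, in Python for brevity (`seed_notebooks` and both directory names are hypothetical; the real script is shell):

```python
import pathlib
import shutil
import tempfile

def seed_notebooks(template_dir: pathlib.Path, workspace_dir: pathlib.Path) -> list:
    """Copy template notebooks into the workspace unless they already exist.

    Files already present are never overwritten, so user edits survive
    container restarts. Returns the names of notebooks that were copied.
    """
    workspace_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(template_dir.glob("*.ipynb")):
        dst = workspace_dir / src.name
        if not dst.exists():
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied

# Demo: the second call leaves the user's edit alone.
root = pathlib.Path(tempfile.mkdtemp())
templates = root / "templates"
templates.mkdir()
(templates / "demo.ipynb").write_text("{}")
workspace = root / "workspace"
first = seed_notebooks(templates, workspace)
(workspace / "demo.ipynb").write_text("edited")
second = seed_notebooks(templates, workspace)
```

The Docker mount would then point at the workspace directory without the :ro flag, so edits made inside the container persist.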
After your changes stabilize, I'd also like this branch tested against the internal cluster.
Signed-off-by: Sudipto Baral <[email protected]>
Force-pushed from 3e3909e to aeb8dd3.
GeorgeJahad
left a comment
lgtm
Thanks @sudiptob2!
Closes G-Research/spark#128