
Conversation

@sudiptob2 sudiptob2 marked this pull request as ready for review December 24, 2025 00:22
@GeorgeJahad (Collaborator):

I rushed you to create this PR so we could demo it, and it worked great for that, but now that the demo is over I think we need to make it more user-friendly. I made the following comments with that in mind.

@@ -0,0 +1,254 @@
{
Collaborator:

In our internal cluster, I had to make the following modifications to the default notebook:

   {
    "cell_type": "code",
    "id": "configuration",
    "metadata": {},
    "outputs": [],
    "source": [
     "# Configuration\n",
-    "driver_host = os.environ.get('SPARK_DRIVER_HOST', '10.0.0.80')\n",
-    "driver_port = os.environ.get('SPARK_DRIVER_PORT', '7078')\n",
+    "\n",
+    "auth_token = os.environ.get('ARMADA_AUTH_TOKEN')\n",
+    "driver_host = os.environ.get('SPARK_DRIVER_HOST', '11.2.208.34')\n",
+    "driver_port = os.environ.get('SPARK_DRIVER_PORT', '10060')\n",
     "block_manager_port = os.environ.get('SPARK_BLOCK_MANAGER_PORT', '10061')\n",
-    "armada_master = os.environ.get('ARMADA_MASTER', 'local://armada://host.docker.internal:30002')\n",
-    "armada_queue = os.environ.get('ARMADA_QUEUE', 'test')\n",
-    "image_name = os.environ.get('IMAGE_NAME', 'spark:armada')\n",
+    "armada_master = os.environ.get('ARMADA_MASTER', 'armada://XXX:443')\n",
+    "armada_queue = os.environ.get('ARMADA_QUEUE', 'XXX')\n",
+    "image_name = os.environ.get('IMAGE_NAME', 'XXX/spark:armada')\n",
     "\n",
     "# Find JAR\n",
-    "jar_paths = glob.glob('/opt/spark/jars/armada-cluster-manager_2.13-*-all.jar')\n",
+    "jar_paths = glob.glob('/opt/spark/jars/armada-cluster-manager_2.12-*-all.jar')\n",
     "if not jar_paths:\n",
     "    raise FileNotFoundError(\"Armada Spark JAR not found!\")\n",
     "armada_jar = jar_paths[0]\n",
     "\n",
     "# Generate app ID, required for client mode\n",
-    "app_id = f\"armada-spark-{subprocess.check_output(['openssl', 'rand', '-hex', '3']).decode().strip()}\""
+    "app_id = f\"jupyter-spark-{subprocess.check_output(['openssl', 'rand', '-hex', '3']).decode().strip()}\""
    ]
   },
   {
@@ -97,7 +99,8 @@
   },
    "source": [
     "# Spark Configuration\n",
     "conf = SparkConf()\n",
+    "conf.set(\"spark.armada.auth.token\", auth_token)\n",
     "conf.set(\"spark.master\", armada_master)\n",
     "conf.set(\"spark.submit.deployMode\", \"client\")\n",
     "conf.set(\"spark.app.id\", app_id)\n",
@@ -114,7 +128,6 @@
     "conf.set(\"spark.driver.blockManager.port\", block_manager_port)\n",
     "conf.set(\"spark.home\", \"/opt/spark\")\n",
     "conf.set(\"spark.armada.container.image\", image_name)\n",
-    "conf.set(\"spark.armada.scheduling.nodeUniformity\", \"armada-spark\")\n",
     "conf.set(\"spark.armada.queue\", armada_queue)\n",
     "conf.set(\"spark.kubernetes.file.upload.path\", \"/tmp\")\n",
     "conf.set(\"spark.kubernetes.executor.disableConfigMap\", \"true\")\n",
@@ -127,15 +140,18 @@
     "\n",
     "# Static mode\n",
     "conf.set(\"spark.executor.instances\", \"2\")\n",
-    "conf.set(\"spark.armada.executor.limit.memory\", \"1Gi\")\n",
-    "conf.set(\"spark.armada.executor.request.memory\", \"1Gi\")"
+    "conf.set(\"spark.armada.driver.limit.memory\", \"10Gi\")\n",
+    "conf.set(\"spark.armada.driver.request.memory\", \"10Gi\")\n",
+    "conf.set(\"spark.armada.executor.limit.memory\", \"60Gi\")\n",
+    "conf.set(\"spark.armada.executor.request.memory\", \"60Gi\")\n",
+    "#print (conf.get(\"spark.armada.auth.token\"))"
    ]
   },
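For reference, the configuration cell after these mods boils down to the following env-driven setup. This is a minimal sketch in plain Python (no pyspark import needed), with `secrets.token_hex` standing in for the `openssl rand -hex 3` subprocess call; the XXX values are cluster-specific placeholders exactly as in the diff:

```python
import glob
import os
import secrets

# Connection settings come from the environment, with per-cluster
# defaults (the XXX values are placeholders that must be replaced).
auth_token = os.environ.get("ARMADA_AUTH_TOKEN")
driver_host = os.environ.get("SPARK_DRIVER_HOST", "11.2.208.34")
driver_port = os.environ.get("SPARK_DRIVER_PORT", "10060")
block_manager_port = os.environ.get("SPARK_BLOCK_MANAGER_PORT", "10061")
armada_master = os.environ.get("ARMADA_MASTER", "armada://XXX:443")
armada_queue = os.environ.get("ARMADA_QUEUE", "XXX")
image_name = os.environ.get("IMAGE_NAME", "XXX/spark:armada")

# Locate the Scala 2.12 build of the Armada cluster-manager JAR.
jar_paths = glob.glob("/opt/spark/jars/armada-cluster-manager_2.12-*-all.jar")

# Unique app id, required for client mode. secrets.token_hex(3) is a
# portable stand-in for shelling out to `openssl rand -hex 3`.
app_id = f"jupyter-spark-{secrets.token_hex(3)}"
```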


exec jupyter notebook \
--ip=0.0.0.0 \
--port=8888 \
Collaborator:

env var?

Collaborator (Author):

The host port is already configurable via the JUPYTER_PORT env variable. This is the internal Jupyter port, so I don't think it needs to be configurable.

docker run -d \
--name armada-jupyter \
-p ${JUPYTER_PORT}:8888 \


ENV SPARK_DIST_CLASSPATH=/opt/spark/coreJars/*

# Install Jupyter, PySpark, and Python dependencies
Collaborator:

The Dockerfile receives the include_python arg. We probably shouldn't install Jupyter if Python is not included.

ENV SPARK_HOME=/opt/spark
ENV PYSPARK_PYTHON=python3
ENV PYSPARK_DRIVER_PYTHON=python3
ENV PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.10.9.7-src.zip
Collaborator:

Where does that version string come from? Can we just do py4j-*src.zip?
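Agreed that globbing avoids hard-coding the version. A hypothetical helper along those lines (py4j_zip is my name, not something in the repo; it uses the py4j-*-src.zip pattern, which matches the zips Spark actually ships):

```python
import glob
import os

def py4j_zip(spark_home):
    """Return the bundled py4j source zip, whatever its version,
    or None if Spark's python/lib directory has no match."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None
```

The result could then feed PYTHONPATH instead of pinning py4j-0.10.9.7-src.zip in the ENV line.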

-e ARMADA_QUEUE=${ARMADA_QUEUE} \
-e IMAGE_NAME=${IMAGE_NAME} \
-e ARMADA_AUTH_TOKEN=${ARMADA_AUTH_TOKEN:-} \
-v "$notebooks_dir:/home/spark/workspace/notebooks:ro" \
@GeorgeJahad (Collaborator), Jan 8, 2026:

This looks like the notebook will be read-only.

What I was thinking would be good is a workspace directory with no checked-in code.

If it doesn't contain the notebook file, this script copies it in, but doesn't touch it if it already exists.

Then that directory gets mounted in Docker.
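The copy-if-missing behaviour described above could be sketched like this (seed_workspace and the default notebook name are hypothetical, not code from this PR):

```python
import shutil
from pathlib import Path

def seed_workspace(workspace, template, name="armada-spark.ipynb"):
    """Copy the template notebook into the workspace directory only if
    it is not already there, so user edits survive container restarts."""
    workspace = Path(workspace)
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / name
    if not target.exists():
        shutil.copy(Path(template), target)
    return target
```

The launch script would call this before docker run, then mount the workspace read-write instead of :ro.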

@GeorgeJahad (Collaborator):

After your changes stabilize I'd also like this branch tested against the internal cluster.

@GeorgeJahad GeorgeJahad changed the title Add jupyter example - armada spark + client static mode Add jupyter Jan 8, 2026
Signed-off-by: Sudipto Baral <[email protected]>
@GeorgeJahad (Collaborator) left a review:

lgtm

@GeorgeJahad (Collaborator):

thanks @sudiptob2 !

Successfully merging this pull request may close these issues.

Enable running from the Spark shell or Jupyter notebooks on a devpod.
