
Integrate Spark operator with Jupyterhub #2180

Open · 1 task done
InzamamAnwar opened this issue Sep 18, 2024 · 1 comment
Labels
question Further information is requested

Comments

@InzamamAnwar

  • ✋ I have searched the open/closed issues and my issue is not listed.

Please describe your question here

How do I integrate JupyterHub with the Spark Operator? I tried installing both JupyterHub and the Spark Operator, but it is not working.

JupyterHub and the Spark Operator are running in the same namespace. I attached the spark-operator service account to JupyterHub so that it can talk to the Kubernetes API. I create a SparkSession with the following code:

import os
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("JupyterApp")
    .master("k8s://https://kubernetes.default.svc:443")
    .config("spark.submit.deployMode", "client")
    .config("spark.executor.instances", "1")
    .config("spark.executor.memory", "1G")
    .config("spark.driver.memory", "1G")
    .config("spark.executor.cores", "1")
    .config("spark.kubernetes.namespace", "spark-operator")
    .config("spark.kubernetes.container.image", "spark:3.5.0")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark-operator")
    .getOrCreate()
)
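One likely cause of this symptom in client mode is that the executors cannot connect back to the driver running inside the JupyterHub pod. A minimal sketch of the extra settings that typically matter (the port numbers below are illustrative assumptions, not values from the original report):

```python
# Hedged sketch: in Spark-on-Kubernetes client mode, executor pods dial
# back to the driver, so the driver must advertise an address the
# executors can reach (the pod IP, or a headless Service name).
import socket

try:
    # Inside a pod this usually resolves to the pod IP.
    driver_host = socket.gethostbyname(socket.gethostname())
except OSError:
    driver_host = "127.0.0.1"  # fallback for illustration only

client_mode_conf = {
    "spark.driver.host": driver_host,       # address executors connect to
    "spark.driver.port": "29413",           # any free, fixed port (assumption)
    "spark.driver.bindAddress": "0.0.0.0",  # listen on all pod interfaces
    "spark.blockManager.port": "29414",     # fixed block-manager port (assumption)
}

# These could then be merged into the builder shown above, e.g.:
# for key, value in client_mode_conf.items():
#     builder = builder.config(key, value)
```

If JupyterHub uses per-user pods, exposing the fixed driver and block-manager ports via a headless Service (or allowing them in any NetworkPolicy) is usually also required.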

The executors are created and then killed right after they start, and the driver pod never appears anywhere.
The error we get with the above is given below:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:1224)
	at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:246)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:694)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:833)

Can anyone please help in this regard?

Provide a link to the example/module related to the question

Additional context

@InzamamAnwar InzamamAnwar added the question Further information is requested label Sep 18, 2024
@ha2hi
Contributor

ha2hi commented Sep 22, 2024

Hi,

Do the Python and Spark versions of the JupyterHub server match those of the "spark:3.5.0" image?

Although they are written in Korean, I recently tested the "spark:3.5.0" image with Jupyter Notebook; the links below cover that setup.
I hope this helps.

[url]
https://github.com/ha2hi/spark-study/tree/main/spark-on-k8s/Jupyter-Notebook
https://github.com/ha2hi/spark-study/tree/main/spark-on-k8s/Jupyter-Hub
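To make the version-matching suggestion concrete, a minimal check on the JupyterHub side might look like the sketch below. The `IMAGE_PYTHON` value is a placeholder assumption for illustration, not the actual version shipped in the image; it should be replaced with whatever `python3 --version` reports inside `spark:3.5.0`.

```python
# Hedged sketch: the driver's Python (and PySpark) version must match the
# executor image, or executors can die right after starting.
import sys

driver_python = sys.version_info[:2]  # (major, minor) of the notebook kernel
print("Driver Python: %d.%d" % driver_python)

# With pyspark installed, the driver-side Spark version can be checked too:
# import pyspark; print("Driver PySpark:", pyspark.__version__)

# Placeholder: fill in the Python version found inside the executor image.
IMAGE_PYTHON = (3, 8)  # assumption for illustration only
compatible = driver_python == IMAGE_PYTHON
```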
