[SPARK-51267][CONNECT] Match local Spark Connect server logic between Python and Scala #50017
Conversation
```scala
// Let the server start. We will directly request to set the configurations
// and this sleep makes less noisy with retries.
Thread.sleep(2000L)
```
This looks fragile; is there a more reliable way to wait for the server to start?
oh it's copied from the previous code...
We could do something like waiting until the log file is created. I will take a separate look; it is the original code in any event. Here I just added `synchronized`.
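The fixed `Thread.sleep(2000L)` could be replaced by polling with a timeout, e.g. for the server's log file as suggested above. A minimal sketch in Python (not the actual launcher code; the path and helper name are hypothetical):

```python
import os
import time

def wait_for_file(path, timeout=30.0, interval=0.1):
    """Poll until `path` exists instead of sleeping a fixed amount.

    Returns True once the file appears, False if `timeout` elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False

# Demo: simulate the server creating its "log file" shortly after startup.
import tempfile, threading
path = os.path.join(tempfile.mkdtemp(), "connect-server.log")
threading.Timer(0.3, lambda: open(path, "w").close()).start()
print(wait_for_file(path, timeout=5.0))  # True once the file shows up
```

This bounds the wait (fast machines do not pay the full two seconds) while still failing loudly on timeout instead of proceeding against a server that never came up.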
here #50039
Do we have a place to document this overall internal workflow? There are multiple components involved here and it's unclear how the interaction happens. My understanding is
This is an internal thing, so it will have to be documented separately. I will take a look.
This is documented here https://github.com/apache/spark/blob/master/docs/app-dev-spark-connect.md#spark-api-mode-spark-client-and-spark-classic
Yea, I know we have a user-facing doc, but I was asking for developer-facing code comments for people to understand how this API mode is implemented.
Let me open a PR tomorrow separately.
Merged to master and branch-4.0.
What changes were proposed in this pull request?

This PR proposes to match the local Spark Connect server logic between Python and Scala. It includes:

1. Synchronize the local server and terminate it on `SparkSession.stop()` in Scala.
2. Remove the internal `SPARK_LOCAL_CONNECT` environment variable and the `spark.local.connect` configuration, handle them in `SparkSubmitCommandBuilder.buildSparkSubmitArgs`, and do not send `spark.remote` and `spark.api.mode` when running the Spark Connect server locally.

Why are the changes needed?

To have consistent behaviour between Python and Scala Spark Connect.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually:

```
./bin/spark-shell --master "local" --conf spark.api.mode=connect
```

```
./bin/spark-shell --remote "local[*]"
```

```
./bin/spark-shell --master "local" --conf spark.api.mode=classic
```

```
git clone https://github.com/HyukjinKwon/spark-connect-example
cd spark-connect-example
build/sbt package
cd ..
git clone https://github.com/apache/spark.git
cd spark
build/sbt package
sbin/start-connect-server.sh
bin/spark-submit --name "testApp" --remote "sc://localhost" --class com.hyukjinkwon.SparkConnectExample ../spark-connect-example/target/scala-2.13/spark-connect-example_2.13-0.0.1.jar
```

```
./bin/pyspark --master "local" --conf spark.api.mode=connect
```

```
./bin/pyspark --remote "local"
```

```
./bin/pyspark --conf spark.api.mode=classic
```

```
./bin/pyspark --conf spark.api.mode=connect
```

There is also an existing unittest with Yarn.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50017 from HyukjinKwon/fix-connect-repl.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 46e12a4)
Signed-off-by: Hyukjin Kwon <[email protected]>
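As an illustration of the second change, the argument-building step can be thought of as filtering out client-only configurations before launching the local server. A rough, hypothetical Python sketch (the real logic lives in the Java `SparkSubmitCommandBuilder`; the helper and constant names here are invented for illustration):

```python
# Hypothetical sketch: drop confs that only make sense on the client side
# before launching the local Spark Connect server. Names mirror the PR
# description, not any real Spark API.
CLIENT_ONLY_CONFS = {"spark.remote", "spark.api.mode", "spark.local.connect"}

def build_local_server_confs(confs):
    """Return the confs to forward to the locally launched server.

    `spark.remote` and `spark.api.mode` tell the client how to connect;
    forwarding them to the server the client is about to start would be
    circular, so they are stripped out.
    """
    return {k: v for k, v in confs.items() if k not in CLIENT_ONLY_CONFS}

confs = {
    "spark.remote": "local[*]",
    "spark.api.mode": "connect",
    "spark.app.name": "testApp",
}
print(build_local_server_confs(confs))  # {'spark.app.name': 'testApp'}
```

Centralizing this in one builder is what lets the Python and Scala clients behave identically, since both go through the same launcher code path.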