[SPARK-53176][DEPLOY] Spark launcher should respect --load-spark-defaults #51905


Closed
pan3793 wants to merge 5 commits into master from pan3793/SPARK-53176

Conversation

@pan3793 pan3793 commented Aug 7, 2025

What changes were proposed in this pull request?

SPARK-48392 introduced `--load-spark-defaults`, but the option is not applied correctly in the Spark launcher process. This mainly affects the driver when Spark runs in local/client mode.

Let's say we have:

```
$ cat > conf/spark-defaults.conf <<EOF
spark.driver.memory=4g
EOF
$ cat > conf/spark-local.conf <<EOF
spark.master=local[4]
EOF
```

```
$ bin/spark-shell --properties-file conf/spark-local.conf --load-spark-defaults
...
scala> spark.sql("SET spark.driver.memory").show()
+-------------------+-----+
|                key|value|
+-------------------+-----+
|spark.driver.memory|   4g|
+-------------------+-----+
```

Even though the Spark conf reports that the driver uses 4 GiB of heap memory, the config does not actually take effect on the Java process; the default 1 GiB is used instead. In local/client mode the driver JVM is already running by the time `SparkConf` is loaded, so `spark.driver.memory` only matters if the launcher sees it when building the `java` command line.

```
$ jinfo <spark-submit-pid>
...
VM Arguments:
jvm_args: -Dscala.usejavacp=true -Xmx1g ...
```
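For context, here is a simplified, hypothetical sketch of why the launcher's view of the config matters (this is not the actual `SparkSubmitCommandBuilder` code; the method and parameter names are invented for illustration). In client mode `-Xmx` is fixed when the driver JVM starts, so the launcher must already see `spark.driver.memory` when it assembles the `java` command line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the launcher sizes the driver JVM from the effective
// config it loaded. If --load-spark-defaults is ignored at this point,
// spark.driver.memory from spark-defaults.conf never reaches -Xmx.
static List<String> buildDriverCommand(Map<String, String> effectiveConfig) {
  String memory = effectiveConfig.getOrDefault("spark.driver.memory", "1g");
  List<String> cmd = new ArrayList<>();
  cmd.add("java");
  cmd.add("-Xmx" + memory); // fixed at process start; SparkConf cannot change it later
  // ... classpath, main class, and application arguments would follow
  return cmd;
}
```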

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

Yes, bug fix.

How was this patch tested?

A unit test is modified to cover the change, plus manual tests for the cases above.

```
$ jinfo <spark-submit-pid>
...
VM Arguments:
jvm_args: -Dscala.usejavacp=true -Xmx4g ...
```

Was this patch authored or co-authored using generative AI tooling?

No.

```
@@ -369,8 +372,13 @@ private void testCmdBuilder(boolean isDriver, File propertiesFile) throws Exception {
String[] cp = findArgValue(cmd, "-cp").split(Pattern.quote(File.pathSeparator));
if (isDriver) {
  assertTrue(contains("/driver", cp), "Driver classpath should contain provided entry.");
  if (propertiesFile == null || loadSparkDefaults) {
    assertTrue(contains("/driver-default", cp),
      "Driver classpath should contain provided entry.");
```
@pan3793 (Member Author):
this assertion fails without this patch.

pan3793 commented Aug 7, 2025

cc @sunchao @viirya @cloud-fan

@sunchao sunchao (Member) left a comment:
LGTM, thanks!


```
for (Map.Entry<Object, Object> entry : props.entrySet()) {
  if (!defaultsProps.containsKey(entry.getKey())) {
    defaultsProps.put(entry.getKey(), entry.getValue());
```
@viirya (Member):
Hmm? I think the logic should be that if the user-specified prop file contains any default entries, they must be overwritten. But here it looks like the user-specified entries are only added when they are not in the default prop file? So it is a merge instead of overwriting? Is that correct?

@pan3793 (Member Author):
Oops, you are right, will fix soon.
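For illustration only, a minimal sketch of the corrected merge direction (hypothetical method and variable names, not the actual patch): the user-specified file is applied first, and spark-defaults.conf only fills in keys the user did not set, so user-specified entries always win.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: user-specified properties take priority;
// spark-defaults.conf only contributes keys the user file leaves unset.
static Properties mergeWithDefaults(Properties userProps, Properties defaultsProps) {
  Properties effective = new Properties();
  effective.putAll(userProps);
  for (Map.Entry<Object, Object> entry : defaultsProps.entrySet()) {
    // a default applies only when the user file did not set this key
    effective.putIfAbsent(entry.getKey(), entry.getValue());
  }
  return effective;
}
```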

@pan3793 (Member Author):
@viirya I fixed the behavior, and also refactored the UT to verify the priority of the configs (sketched below):

  • --conf
  • then the user-specified properties file
  • then spark-defaults.conf if it exists
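As a hedged illustration of that ordering (the values are hypothetical; the `--conf spark.driver.memory=2g` override is invented for the example), later `putAll` calls overwrite earlier ones, so the lowest-priority source is applied first:

```java
import java.util.Properties;

Properties sparkDefaults = new Properties(); // conf/spark-defaults.conf (with --load-spark-defaults)
sparkDefaults.setProperty("spark.driver.memory", "4g");

Properties userFile = new Properties();      // --properties-file conf/spark-local.conf
userFile.setProperty("spark.master", "local[4]");

Properties cliConf = new Properties();       // hypothetical --conf spark.driver.memory=2g
cliConf.setProperty("spark.driver.memory", "2g");

Properties effective = new Properties();
effective.putAll(sparkDefaults); // lowest priority
effective.putAll(userFile);      // overrides spark-defaults.conf
effective.putAll(cliConf);       // --conf wins over both files
// result: spark.driver.memory=2g, spark.master=local[4]
```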

```
  }
}
if (loadSparkDefaults) {
  assertEquals("/driver", effectiveConfig.get(SparkLauncher.DRIVER_EXTRA_CLASSPATH));
```
@viirya (Member):
Only the default props contain DRIVER_EXTRA_CLASSPATH?

```
}

private void doTestGetEffectiveConfig(
    File propertiesFile, boolean loadSparkDefaults, boolean confDriverMemory) throws Exception {
```
@viirya (Member):
Does this test use loadSparkDefaults to conditionally check something? Seems no?

@pan3793 (Member Author):
It's used at line 118.

```
    assertFalse(effectiveConfig.containsKey(SparkLauncher.DRIVER_EXTRA_CLASSPATH));
  }
} else {
  assertEquals("/driver", effectiveConfig.get(SparkLauncher.DRIVER_EXTRA_CLASSPATH));
```
@viirya (Member):
Does this require loadSparkDefaults to be true?

@pan3793 (Member Author):
It's only present in spark-defaults.conf.

@pan3793 pan3793 requested a review from viirya August 9, 2025 09:52
pan3793 commented Aug 11, 2025

@viirya could you take another look?

viirya commented Aug 11, 2025

I will take another look today. Thanks.

viirya commented Aug 11, 2025

Thanks for waiting @pan3793. Looks good to me.

viirya commented Aug 11, 2025

@sunchao Do you want to take another look at the change after your approval?

sunchao commented Aug 11, 2025

Thanks @viirya. It looks good to me too!

@viirya viirya closed this in b82957c Aug 11, 2025
viirya pushed a commit that referenced this pull request Aug 11, 2025
[SPARK-53176][DEPLOY] Spark launcher should respect `--load-spark-defaults`

Closes #51905 from pan3793/SPARK-53176.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit b82957c)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
viirya commented Aug 11, 2025

Merged to master/4.0.

Thanks @pan3793 @sunchao
